PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.65k stars 208 forks source link

sqlite query generation REGEXP doesn't work. #4540

Open ryanhamilton opened 1 month ago

ryanhamilton commented 1 month ago

What's up?

By default PRQL seems to generate sqlite queries with REGEXP. This means code like this doesn't work: https://github.com/timeseries/qstudio/issues/50 If where possible PRQL avoided REGEXP and used LIKE many more queries may work out of the box. REGEXP is also risky to use as the implentation depends on the platform so users may get different results.

max-sixty commented 1 month ago

Is there a minimal example? I see the SQL in the linked issue but can't see the PRQL.

richb-hanover commented 1 month ago

https://github.com/timeseries/qstudio/issues/50 is my report. The underlying conundrum is that SQLite does not ship with a built-in REGEXP facility. When qStudio calls prqlc, even if the query has prql target:sql.sqlite, PRQL still generates the call to a REGEX function, which causes the (standard) SQLite to throw the error shown.

There are a few alternatives:

Update: @ryanhamilton has also raised an issue at https://github.com/nalgeon/sqlean/issues/119

max-sixty commented 1 month ago

OK, thanks.

So it sounds the minimal example:

prql target:sql.sqlite

from foo
derive x ~= 'bar$'

...shouldn't generate...

SELECT
  *,
  x REGEXP 'bar$'
FROM
  foo

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

...I do remember seeing some issues that regex isn't natively implemented by sqlite.


One option would be to inspect the regex and generate a LIKE if it's simple enough — i.e. in the case above it could generate a LIKE "%bar". I think that would be a fairly easy algo to write — basically if there are no special characters, just add % to the start & end unless a ^ or $ exist, and send. (Need to think about case etc too)

richb-hanover commented 1 month ago

Let's hold off to see what Ryan discovers. (I'm not in a rush - the like function https://github.com/PRQL/prql/issues/1123 serves my needs now.)

If it turns out to be straightforward to get more addin functions into the SQLite that's bundled into qStudio, that'll be a win for all.

ryanhamilton commented 1 month ago

Sqlean said now. Last chance is for xerial to create a fat jar: https://github.com/xerial/sqlite-jdbc/issues/1125

richb-hanover commented 1 month ago

Thanks for the update

PrettyWood commented 1 month ago

@ryanhamilton @richb-hanover Doesn't text.contains solve your issue https://prql-lang.org/book/reference/stdlib/text.html? Using regexp is expensive and indeed having starts_with, contains and endswith cover the majority of scenarios

richb-hanover commented 1 month ago

@PrettyWood - that appears to be a very good workaround. Here is the use-case that triggered the original report:

qStudio's SQLite implementation doesn't bundle in the REGEXP addin, and therefore gives an error. @ryanhamilton is looking to see if there is another SQLite JDBC that might bundle it.

You are correct that it covers the majority of situations well. I am content to close this issue as "Solved".