PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.72k stars 209 forks source link

Expand behavior of `std.in` #3461

Open aljazerzen opened 11 months ago

aljazerzen commented 11 months ago

What's up?

std.in currently supports "in range":

from x
filter (y | in 1..5)

We would want to add:

  1. "in array literal":

    from x
    filter (y | in [1, 2, 10, 42])
  2. "in a dynamic array":

    from x
    filter (y | in (from z | select z.some_col))

Point 1 should be easy, point 2 is much harder, as that should translate to column subquery (sublink in Postgres lingo).

max-sixty commented 11 months ago

Great idea!

@snth which was the one that you heard mentioned lots?

I took a look at the code — in didn't move to the prql std lib (i.e. defined in PRQL), so we wouldn't have to hoist it back to the compiler to get (1) working — so it might not be too difficult...

snth commented 11 months ago

Thanks @aljazerzen . It's form 1 that I think is sorely missing. See #993 for previous discussion on this.

In my experience the IN operator is one of the most used ones in SQL. DBeaver will for example reach for it immediately when filtering on a value etc...

I didn't find an example in the dbt docs as quickly as I recalled but here is one in reports/pages/analysis/seasonality-investigation.md:

if(dayname(ordered_at) in ('Sunday', 'Saturday'), 'Weekend', 'Weekday') as day_type,
snth commented 11 months ago

Also previous discussion on Discord: https://discord.com/channels/936728116712316989/1093120488995299378/1093120488995299378

I thought the question had previously also been posed by someone else but couldn't find it now.

snth commented 11 months ago

For form 2, I think it would be ok to support that with s-strings for now since we don't support subqueries (and should think more about whether we should or not).

So that could be

from x
filter (y | in (s"select z.some_col from z"))