Open hsight opened 2 months ago
I really like this idea, it could come in handy!
I wondered if it would be possible to approximate this using a PRQL function; that might allow you to try it out before it's implemented natively within PRQL. I put this function together which works! (sorta... see below)
let sample = func
nrows <float>
tbl <relation>
-> <relation> (
from s"SELECT * FROM {tbl} USING SAMPLE {nrows}"
)
from invoices
sample 20
select { customer_id }
This outputs the following SQL:
WITH table_0 AS (
SELECT
*
FROM
invoices USING SAMPLE 20
)
SELECT
customer_id
FROM
table_0
-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)
However, I start to get strange errors if I modify the pipeline preceding the sample
call. If I move the select { customer_id }
before the sample 20
, I get the error:
from invoices
select { customer_id }
sample 20
Error:
╭─[:9:1]
│
9 │ select { customer_id }
│ ───────────┬──────────
│ ╰──────────── unexpected ``(Select ...)``
│
│ Help: this is probably a 'bad type' error (we are working on that)
───╯
Perhaps I'm using the S-string wrong somehow... does anyone have suggestions?
@kgutwin great use of s-strings!
The one change to fix that error is:
let sample = func
- nrows <float>
+ nrows <int>
tbl <relation>
-> <relation> (
from s"SELECT * FROM {tbl} USING SAMPLE {nrows}"
)
I agree sample
is useful. There are a lot of options for sample
in DuckDB, which makes it a bit harder to add (though still tractable...). Possibly we could add it to the docs as a nice example of s-strings initially.
Would be very open to a PR for either!
We need to do a survey of supported features in these databases and come up with an interface that will allow (eventually) supporting all of those features.
What's up?
It would be great to handle table sampling in PRQL.
Examples: