Open maelp opened 9 months ago
Right now I seem to be "forced" to use an s-string which is not very neat, eg
from s"SELECT * FROM cse(2024, 1, 4)"
aggregate {
count this
}
or something, is there a better way?
similarly if I have a field with a "reserved name" like "timestamp" I cannot directly use it, I need to use s"timestamp", for instance
aggregate {
min_time = min s"timestamp" -- using "min timestamp" would generate an error because "timestamp" is a type
}
isn't there a way to see that what follows is necessarily a value, and should be coerced to a value?
Hi @maelp , good questions, thanks for opening an issue.
Check out https://prql-lang.org/book/reference/syntax/keywords.html — these are solvable with backticks and this.
respectively:
from `cse(2024,1,5)`
aggregate {
sum this.time
}
SELECT
COALESCE(SUM(time), 0)
FROM
"cse(2024,1,5)"
-- Generated by PRQL compiler version:0.11.1 (https://prql-lang.org)
Does that make sense?
Thanks! However if I use this DuckDB MACRO
CREATE MACRO cse(year, weekA, weekB) AS TABLE SELECT * FROM read_parquet(list_transform(
generate_series(weekA, weekB),
week -> format('s3://${BUCKET_PATH}/year={0}/week={1}/0.parquet', year, week)
));
I think it doesn't create a "table" per-se, but should be used as SELECT * FROM cse(...)
do you think I should instead create a table? not sure if I can do this as a macro / function?
When I use your code I get the error:
Catalog Error: Table with name recent_cse(4) does not exist!
The generated SQL is this
WITH table_0 AS (
SELECT
bms_id,
GREATEST(value.temperature_mosfet, value.temperature_ic) AS _expr_0,
GREATEST(value.temperature_front, value.temperature_back) AS _expr_1,
LEAST(value.temperature_front, value.temperature_back) AS _expr_2
FROM
"recent_cse(4)"
)
SELECT
bms_id,
MIN(_expr_2) AS min_temp_cells,
MAX(_expr_1) AS max_temp_cells,
MAX(_expr_0) AS max_temp_bms,
COUNT(*) AS count
FROM
table_0
WHERE
_expr_2 < -10
OR _expr_1 > 40
OR _expr_0 > 70
GROUP BY
bms_id
I guess it should not use the \" character because I want the macro to be run in duckdb no?
I verified that if I run the same query without the quotes, eg not "cse"
but directly cse
it works
eg
WITH table_0 AS (
SELECT
bms_id,
GREATEST(value.temperature_mosfet, value.temperature_ic) AS _expr_0,
GREATEST(value.temperature_front, value.temperature_back) AS _expr_1,
LEAST(value.temperature_front, value.temperature_back) AS _expr_2
FROM
recent_cse(4)
)
SELECT
bms_id,
MIN(_expr_2) AS min_temp_cells,
MAX(_expr_1) AS max_temp_cells,
MAX(_expr_0) AS max_temp_bms,
COUNT(*) AS count
FROM
table_0
WHERE
_expr_2 < -10
OR _expr_1 > 40
OR _expr_0 > 70
GROUP BY
bms_id
is there a way to generate this simply from PRQL?
And if I try
from s"recent_cse(4)"
derive {
min_temp_cells = s"LEAST(value.temperature_front, value.temperature_back)",
max_temp_cells = s"GREATEST(value.temperature_front, value.temperature_back)",
max_temp_bms = s"GREATEST(value.temperature_mosfet, value.temperature_ic)",
}
filter min_temp_cells < -10 || max_temp_cells > 40 || max_temp_bms > 70
group bms_id (
aggregate {
min_temp_cells = min min_temp_cells,
max_temp_cells = max max_temp_cells,
max_temp_bms = max max_temp_bms,
count = count this,
}
)```
it says
s-strings representing a table must start with SELECT
Ah, so we want literally FROM cse(2024,1,5)
and not the quoted FROM "cse(2024,1,5)"
.
In that case, I think we do need to use the from s"SELECT * FROM cse(2024, 1, 4)"
, since currently PRQL expects an identifier.
We do have prior art for something more with the read_parquet
-like functions; so it's not out of the question to do something similar.
This probably isn't going to make this perfect immediately, but it would be possible to make a function so the inline call were elegant:
prql target:sql.duckdb
let cse = x y z -> s"select * from cse({x},{y},{z})"
from (cse 1 2 3)
derive x = 5
WITH table_0 AS (
SELECT
*
from
cse(1, 2, 3)
)
SELECT
*,
5 AS x
FROM
table_0
-- Generated by PRQL compiler version:0.11.1 (https://prql-lang.org)
would be interesting if we could have something like this! Or allow macros and duckdb functions in PRQL or something
BTW is it possible to do a multi-line PRQL function (this does not seem to be shown in the documentation)
I'm trying to do a helper function, then call it like this
let checkTemps = numWeeks minCellTemp maxCellTemp maxBMSTemp -> (from (recent_cse numWeeks)
derive {
min_temp_cells = s"LEAST(value.temperature_front, value.temperature_back)",
max_temp_cells = s"GREATEST(value.temperature_front, value.temperature_back)",
max_temp_bms = s"GREATEST(value.temperature_mosfet, value.temperature_ic)",
}
filter min_temp_cells < minCellTemp || max_temp_cells > maxCellTemp || max_temp_bms > maxBMSTemp
group bms_id (
aggregate {
min_temp_cells = min min_temp_cells,
max_temp_cells = max max_temp_cells,
max_temp_bms = max max_temp_bms,
count = count this,
}
)
)
checkTemps 4 -10 40 70
but it doesn't work...
The -10
needs to be parenthesized...
let checkTemps = numWeeks minCellTemp maxCellTemp maxBMSTemp -> (
from (recent_cse)
derive {
min_temp_cells = s"LEAST(value.temperature_front, value.temperature_back)",
max_temp_cells = s"GREATEST(value.temperature_front, value.temperature_back)",
max_temp_bms = s"GREATEST(value.temperature_mosfet, value.temperature_ic)",
}
filter min_temp_cells < minCellTemp || max_temp_cells > maxCellTemp || max_temp_bms > maxBMSTemp
group bms_id (
aggregate {
min_temp_cells = min min_temp_cells,
max_temp_cells = max max_temp_cells,
max_temp_bms = max max_temp_bms,
count = count this,
}
)
)
checkTemps 4 (-10) 40 70
...works!
(also I changed (recent_cse numWeeks)
as that's not a function in the example)
Parentheses can be a bit confusing in some limited circumstances — explained here: https://prql-lang.org/book/reference/syntax/operators.html#parentheses. The main thing we can do is to make the error messages much much better.
would be interesting if we could have something like this! Or allow macros and duckdb functions in PRQL or something
We could have something very very simple like duckdb_macro cse 1 2 3
. It's not elegant.
How do you / others think about macros vs functions? To the extent they're overlapping, I would deprioritize work here — having a boilerplate function for each macro doesn't seem too bad. To the extent they'll be coexisting lots, we could think about improving the interface.
Would be helpful to have a better help message indeed! I thought the issue was something else
yes the boilerplate can work I agree, so not a big priority. It's just that I want to be able to do queries both in SQL and PRQL so I'd like to have macros available in both to avoid duplicating code
would it be possible to rewrite that macro in pure PRQL (eg do you have access to duckdb generate_series and list_transform in PRQL? possibly in order to have those available, having something like a ::duckdb:generate_series and ::duckdb:list_transform as tokens (or a kind of PRQL "module" that we could import like "import duckdb" and then "duckdb.generate_series", "duckdb.list_transform" and "duckdb.call_macro" could be useful?)
Maybe you want to see this. https://eitsupi.github.io/querying-with-prql/method_chaining/
Sorry to use this issue to ask questions but I find that the documentation doesn't show all the power of PRQL yet, for instance: is there a way to do a ternary operation or an if / then / else in PRQL?
I'd like to do something like
select {
result = someTemp < -10 ? "below" : "above"
}
is that possible?
You can use case https://prql-lang.org/book/reference/syntax/case.html
Thanks!
Last question (and small "bug report")
when I use this it works fine
let check_temperatures = numWeeks minCellTemp maxCellTemp maxBMSTemp -> (
from (recent_cse numWeeks)
derive {
min_temp_cells = s"LEAST(value.temperature_front, value.temperature_back)",
max_temp_cells = s"GREATEST(value.temperature_front, value.temperature_back)",
max_temp_bms = s"GREATEST(value.temperature_mosfet, value.temperature_ic)",
}
filter min_temp_cells < minCellTemp || max_temp_cells > maxCellTemp || max_temp_bms > maxBMSTemp
group bms_id (
aggregate {
min_temp_cells = min min_temp_cells,
max_temp_cells = max max_temp_cells,
max_temp_bms = max max_temp_bms,
count = count this,
}
)
)
But if I'd like to "rephrase" the min_temp, max_temp, etc at the end to only display if they are above / under their respective threshold it fails, and the error message is weird (it complains about the "->" in the function definition)
let check_temperatures = numWeeks minCellTemp maxCellTemp maxBMSTemp -> (
from (recent_cse numWeeks)
derive {
min_temp_cells = s"LEAST(value.temperature_front, value.temperature_back)",
max_temp_cells = s"GREATEST(value.temperature_front, value.temperature_back)",
max_temp_bms = s"GREATEST(value.temperature_mosfet, value.temperature_ic)",
}
filter min_temp_cells < minCellTemp || max_temp_cells > maxCellTemp || max_temp_bms > maxBMSTemp
group bms_id (
aggregate {
min_temp_cells = min min_temp_cells,
max_temp_cells = max max_temp_cells,
max_temp_bms = max max_temp_bms,
count = count this,
}
derive {
min_temp_cells = case [
min_temp_cells <= minCellTemp => min_temp_cells
true => ""
],
max_temp_cells = case [
max_temp_cells >= maxCellTemp => max_temp_cells
true => ""
],
max_temp_cells = case [
max_temp_bms >= maxBMSTemp => max_temp_bms
true => ""
],
}
)
)
(another small nitpick: why not using the same "->" for case and function, and introducing another symbol like "=>" ?)
I think you forgot the comma in arrays.
case [
foo # bad line
bar
]
ah indeed you're right, but the error message is confusing, @max-sixty would there be a way to show the "correct" error in those cases?
@max-sixty would there be a way to show the "correct" error in those cases?
It's definitely possible. I'd say it's harder than I thought it would be when I originally proposed terser syntax...
For example, in that case we need to realize we're in a list, possibly attempt to execute a function bar
with an argument foo
& realize foo
isn't a function — then add those up to say "maybe foo
should have a trailing comma, and is part of a list". And we don't have column names atm.
I would really like to focus on this, and think this will be the focus of our GSOC application.
(another small nitpick: why not using the same "->" for case and function, and introducing another symbol like "=>" ?)
FYI there was some discussion on this in a past issue... I don't think anyone had a really strong view, we ended up favoring consistency with other langs
What happened?
I have a macro defined in DuckDb to access Parquet files on s3
in SQL I can do
SELECT count(*) FROM cse(2024, 1, 5);
but if I try to compile the following PRQL
it gives an error
PRQL input
SQL output
Expected SQL output
No response
MVCE confirmation
Anything else?
No response