Closed (mmaitre314 closed this issue 2 months ago)
Does this help?
https://prql-lang.org/book/reference/data/read-files.html
In particular:
```prql
prql target:sql.duckdb

from `az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet`
take 5
```

This compiles to:

```sql
SELECT
  *
FROM
  "az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet"
LIMIT
  5

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)
```
...and the read_parquet function.
(FYI, the Python releases are a bit out of date, so it's possible they don't include this yet. We're waiting on PyPI but could publish a differently named package in the meantime if needed.)
Setting the target dialect fixed the issue. Thanks for the quick response!
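For anyone landing here later, a minimal Python sketch of the fix: make sure the query starts with the `prql target:sql.duckdb` header before handing it to the compiler. The helper names `with_duckdb_target` and `parquet_query` are made up for illustration and are not part of PRQL or pyprql; the sketch only builds the query string and does not call the compiler.

```python
# Hypothetical helpers (not part of PRQL or pyprql): they only build the
# PRQL text shown earlier in this thread, with the DuckDB target header.

DUCKDB_HEADER = "prql target:sql.duckdb"


def with_duckdb_target(query: str) -> str:
    """Prepend the DuckDB target header unless the query already has a header."""
    if query.lstrip().startswith("prql "):
        return query
    return f"{DUCKDB_HEADER}\n\n{query}"


def parquet_query(path: str, rows: int) -> str:
    """Build a PRQL query that reads `rows` rows from a backtick-quoted path."""
    if "`" in path:
        # A backtick would terminate the quoted identifier early.
        raise ValueError("path must not contain a backtick")
    return with_duckdb_target(f"from `{path}`\ntake {rows}")


if __name__ == "__main__":
    print(parquet_query(
        "az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet",
        5,
    ))
```

The resulting string can then be passed to whichever PRQL compiler binding is installed; whether that binding is recent enough to support reading files this way depends on its version, as noted above.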
I was able to use PRQL to read Parquet data through DuckDB, but this required using an s-string. It would be nice if there were native support for DuckDB's data-import syntax.
What I ended up writing:
What I would have liked to write:
Notebook for reference: https://github.com/mmaitre314/az-duckdb/blob/main/test_pyprql.ipynb