PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0

Better support for DuckDB data-import syntax #4726

Closed by mmaitre314 2 months ago

mmaitre314 commented 2 months ago

I was able to use PRQL to read Parquet data through DuckDB, but this required an s-string. It would be nice to have native support for DuckDB's data-import syntax.

What I ended up writing:

from s"SELECT * FROM 'az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet'"
take 5

What I would have liked to write:

from 'az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet'
take 5

Notebook for reference: https://github.com/mmaitre314/az-duckdb/blob/main/test_pyprql.ipynb

max-sixty commented 2 months ago

Does this help?

https://prql-lang.org/book/reference/data/read-files.html

In particular:

prql target:sql.duckdb

from `az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet`
take 5

This compiles to:

SELECT
  *
FROM
  "az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet"
LIMIT
  5

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)

The same page also covers the read_parquet function.
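A minimal sketch of the read_parquet route, reusing the path from the original report. It assumes the DuckDB target is set and that the string is handed through to DuckDB's own read_parquet, so glob and az:// handling would be up to DuckDB and its extensions. It has not been run against this URL.

```prql
prql target:sql.duckdb

# The path is passed as a plain string argument rather than a backtick-quoted identifier.
from (read_parquet "az://azduckdb.blob.core.windows.net/data/HuggingFaceFW/fineweb/CC-MAIN-2013-20/*.parquet")
take 5
```

Unlike the s-string workaround, this keeps the source as a regular PRQL expression, so later pipeline steps such as take still compile normally.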

(FYI, the Python releases are a bit out of date, so they may not include this yet. We're waiting on PyPI, but could publish a differently named package in the meantime if needed.)

mmaitre314 commented 2 months ago

Setting the target dialect fixed the issue. Thanks for the quick response!