duckdb / pg_duckdb

DuckDB-powered Postgres for high performance apps & analytics.
MIT License

Using container and configure a local directory as volume to read with duckdb #363

Closed lgonzalezsa closed 3 weeks ago

lgonzalezsa commented 3 weeks ago

What happens?

I'm not sure what I am doing wrong. I am trying to access the local filesystem, where I have a set of Parquet files. I defined the volume in the container running PostgreSQL:

postgres=# select * from read_parquet('/mnt/data/*.parquet') as data
limit 10;
ERROR:  a column definition list is required for functions returning "record"
LINE 1: select * from read_parquet('/mnt/data/*.parquet')...
                      ^
Time: 0.227 ms

Not sure if this relates to issue https://github.com/duckdb/pg_duckdb/issues/105

To Reproduce

Copy a Parquet file and make it available to the container in /mnt/data, then run the query below. The container was created using the example from the README:

docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:16-main

select * from read_parquet('/mnt/data/*.parquet') as data
limit 10;

OS:

openSUSE Leap 15.5

pg_duckdb Version:

0.1.0

Postgres Version:

16

Hardware:

No response

Full Name:

Luis Gonzalez Sandoval

Affiliation:

HPE

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

wuputah commented 3 weeks ago

You need to specify a column list for read_parquet. See https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md#read_parquet
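
For illustration, a minimal sketch of the original query with a column definition list added. The column names and types here (id, name, amount) are hypothetical placeholders, not taken from the reporter's files. Replace them with the actual columns of your Parquet data:

```sql
-- Hypothetical columns: id, name, amount are placeholders for illustration only.
-- Postgres requires "name type" pairs for functions returning SETOF record.
SELECT *
FROM read_parquet('/mnt/data/*.parquet')
     AS data(id int, name text, amount double precision)
LIMIT 10;
```

Each entry in the list is written as the column name followed by its type, which is the standard Postgres form for a column definition list.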

lgonzalezsa commented 3 weeks ago

Thank you @wuputah. After passing the column list, I was able to query the Parquet data. Awesome!

anentropic commented 10 hours ago

I get the same error from iceberg_scan

https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md#iceberg_scanpath-text--optional-parameters----setof-record

the example in the docs...

SELECT COUNT(i) FROM iceberg_scan('data/iceberg/table') AS (int i);

...doesn't actually work, it gives error:

mydb=# select count(i) from iceberg_scan('/iceberg/catalog/table/metadata/00693-b272df17-984f-4028-ade1-cb5bef59ef64.metadata.json') as (int i);
ERROR:  type "i" does not exist
LINE 1: ...df17-984f-4028-ade1-cb5bef59ef64.metadata.json') as (int i);
                                                                    ^

I think the syntax is wrong; it should be (i int)

(it looks like the same problem is in all the doc examples)
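
For reference, here is a sketch of the docs example with the column definition reordered. It assumes the standard Postgres column definition list syntax (column name first, then type); the path and the column i are simply the placeholders from the docs example, and this has not been checked against a particular pg_duckdb build:

```sql
-- Same query as the docs example, with the definition written as "name type".
-- 'data/iceberg/table' and column i are the docs' own placeholders.
SELECT COUNT(i)
FROM iceberg_scan('data/iceberg/table') AS (i int);
```

With (int i), Postgres parses "int" as the column name and "i" as the type, which matches the 'type "i" does not exist' error above.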