Using a container and configuring a local directory as a volume to read with DuckDB

lgonzalezsa commented 3 weeks ago

What happens?

I'm not sure what I am doing wrong. I am trying to access a local filesystem directory that holds a set of Parquet files. I defined the volume in the container running PostgreSQL:

postgres=# select * from read_parquet('/mnt/data/*.parquet') as data
limit 10;
ERROR:  a column definition list is required for functions returning "record"
LINE 1: select * from read_parquet('/mnt/data/*.parquet')...
                      ^
Time: 0.227 ms

Not sure if this is related to issue https://github.com/duckdb/pg_duckdb/issues/105

To Reproduce

Copy a Parquet file and make it available to the container in /mnt/data, then run the query below. The container was created using the example from the README:

docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:16-main

select * from read_parquet('/mnt/data/*.parquet') as data
limit 10;

OS:

openSUSE Leap 15.5

pg_duckdb Version:

0.1.0

Postgres Version:

16

Hardware:

No response

Full Name:

Luis Gonzalez Sandoval

Affiliation:

HPE

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

[X] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

[X] Yes, I have

wuputah commented 3 weeks ago

You need to specify a column list for read_parquet. See https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md#read_parquet
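A minimal sketch of the corrected query from the original report, assuming (as in the linked docs example) that the column definition list names columns that exist in the Parquet files. The names `id` and `name` and their types are hypothetical; substitute the real column names and compatible Postgres types from your files.

```sql
-- read_parquet returns SETOF record, so Postgres needs a column definition list.
-- id and name are hypothetical column names; each entry is "name type".
SELECT *
FROM read_parquet('/mnt/data/*.parquet') AS (id bigint, name text)
LIMIT 10;
```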

lgonzalezsa commented 3 weeks ago

Thank you @wuputah! After passing the column list I was able to query the Parquet data. Awesome!

anentropic commented 10 hours ago

I get the same error from iceberg_scan

https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md#iceberg_scanpath-text--optional-parameters----setof-record

The example in the docs...

SELECT COUNT(i) FROM iceberg_scan('data/iceberg/table') AS (int i);

...doesn't actually work. It gives this error:

mydb=# select count(i) from iceberg_scan('/iceberg/catalog/table/metadata/00693-b272df17-984f-4028-ade1-cb5bef59ef64.metadata.json') as (int i);
ERROR:  type "i" does not exist
LINE 1: ...df17-984f-4028-ade1-cb5bef59ef64.metadata.json') as (int i);
                                                                    ^

I think the syntax is wrong; it should be (i int).

(It looks like the same problem exists in all the doc examples.)
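For reference, here is the docs example rewritten with the column name before the type, which is the order Postgres expects in a column definition list (with `int i`, Postgres reads `i` as the type name, hence `type "i" does not exist`). This is a sketch: `data/iceberg/table` is the placeholder path from the docs, and `i` must be an actual column in the Iceberg table.

```sql
-- Column definition list entries are "name type", not "type name".
-- 'data/iceberg/table' is the docs' placeholder path; i must be a real column.
SELECT COUNT(i)
FROM iceberg_scan('data/iceberg/table') AS (i int);
```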
