duckdb / duckdb-wasm

WebAssembly version of DuckDB
https://shell.duckdb.org
MIT License

read_parquet does not work with s3 links containing wildcard #1833

Open · prusswan opened this issue 2 weeks ago

prusswan commented 2 weeks ago

What happens?

This works:

select count(*) from read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=places/type=place/part-00001-93118862-ebe9-4b31-8277-1a87d792bd5d-c000.zstd.parquet');

This does not work:

select count(*) from read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=places/type=place/part*.zstd.parquet');


HEAD and GET requests failing with 404 errors can also be seen in the console.
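For context, S3 has no server-side wildcard matching. To expand a pattern such as `part*.zstd.parquet`, a client has to list the objects under the literal prefix (for example with a ListObjectsV2 request) and then match the returned keys against the pattern itself. The 404 responses suggest, though this is not confirmed, that the pattern is instead being requested as a literal object key. The Python sketch below shows only the matching step. The function names are hypothetical, and the listing call itself is not shown.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = "*?["


def literal_prefix(pattern: str) -> str:
    """Return the part of a key pattern before the first glob character.

    This is the prefix one would pass to an S3 listing request.
    """
    for i, ch in enumerate(pattern):
        if ch in GLOB_CHARS:
            return pattern[:i]
    return pattern


def expand_s3_glob(bucket: str, pattern: str, listed_keys: list[str]) -> list[str]:
    """Filter keys (e.g. from a listing of literal_prefix(pattern)) against
    the wildcard pattern and return sorted s3:// URLs for the matches."""
    prefix = literal_prefix(pattern)
    return sorted(
        f"s3://{bucket}/{key}"
        for key in listed_keys
        if key.startswith(prefix) and fnmatchcase(key, pattern)
    )
```

With the pattern from this report, `literal_prefix` yields `release/2024-08-20.0/theme=places/type=place/part`, so only keys under that prefix need to be listed.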

To Reproduce

LOAD httpfs;
LOAD spatial;

select count(*) from read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=places/type=place/part*.zstd.parquet');
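Because the single-file query works, one possible workaround (untested against DuckDB-Wasm) is to expand the wildcard outside DuckDB and pass the matching files to `read_parquet` as an explicit list, which DuckDB accepts in the form `read_parquet(['a.parquet', 'b.parquet'])`. The helper below only builds the SQL text. Its names are hypothetical, and the URLs would have to come from your own listing of the bucket.

```python
def sql_string_literal(value: str) -> str:
    # SQL string literals escape an embedded single quote by doubling it.
    return "'" + value.replace("'", "''") + "'"


def read_parquet_count_query(urls: list[str]) -> str:
    """Build a count(*) query over an explicit list of Parquet files."""
    if not urls:
        raise ValueError("need at least one Parquet URL")
    file_list = ", ".join(sql_string_literal(u) for u in urls)
    return f"select count(*) from read_parquet([{file_list}]);"
```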

https://shell.duckdb.org/#queries=v0,LOAD-httpfs~%0ALOAD-spatial~%0A%0Aselect-count(*)-from-read_parquet('s3%3A%2F%2Foverturemaps%20us%20west%202%2Frelease%2F2024%2008%2020.0%2Ftheme%3Dplaces%2Ftype%3Dplace%2Fpart*.zstd.parquet')~

Other example queries can be found here: https://docs.overturemaps.org/getting-data/duckdb/#example-queries

Browser/Environment: Firefox 127.0
Device: laptop
DuckDB-Wasm Version: 1.28.1-dev258.0
DuckDB-Wasm Deployment: shell.duckdb.org
Full Name: PW
Affiliation: none

prusswan commented 2 weeks ago

Feel free to close if this is a duplicate of #1040.