michaelzwong opened 4 months ago
I'm currently reading from my Glue-cataloged Iceberg table using the following:

```python
import duckdb

duckdb.sql(
    f"""
    INSTALL httpfs; LOAD httpfs;
    SET s3_region = 'us-west-2';
    SET s3_access_key_id = '{settings.AWS_ACCESS_KEY_ID}';
    SET s3_secret_access_key = '{settings.AWS_SECRET_ACCESS_KEY}';
    INSTALL iceberg; LOAD iceberg;
    """
)
res = duckdb.execute("SELECT * FROM iceberg_scan('s3://foopath') LIMIT 100")
```
The execution is very slow compared to just reading the .parquet files at the same path (e.g., 2 minutes vs. 2 seconds):

```python
res = duckdb.execute("SELECT * FROM parquet_scan('s3://foopath/*.parquet') LIMIT 100")
```
I'd like to know what I'm doing wrong, or whether someone has a solution.
Hi,
I would first suggest executing the query with EXPLAIN ANALYZE and posting the results here. The cause might be issue #2, where more Parquet files are scanned than necessary.