duckdb / duckdb_delta

DuckDB extension for Delta Lake
MIT License

Bug: deletion vectors do not work on AWS S3 #17

Closed: marsupialtail closed this issue 1 month ago

marsupialtail commented 1 month ago

Hi, deletion vectors do not work with Delta tables on S3 because of the way you process the deletion vector paths.

Locally, you would run something like: select * from delta_scan('file:///data/delta-table/'); and the deletion vector paths are formatted correctly for this case. If you instead run: select * from delta_scan('file:///data/delta-table'); (without the / at the end), the deletion vector paths are malformed and the table cannot be read. Fine, everything works as long as I add a slash at the end.

On S3, if you run select * from delta_scan('s3://delta-bucket/delta/'); you will have issues, because the extra slash at the end breaks the Parquet file paths. So you have to run select * from delta_scan('s3://delta-bucket/delta'); instead. But then this breaks the deletion vector paths.
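
To illustrate, here is a minimal sketch of how this kind of trailing-slash conflict can arise. It assumes (I have not confirmed this in the extension source) that data file paths and deletion vector paths are built with two different join strategies. The function names and the file names part-0.parquet and deletion_vector.bin are hypothetical and are not taken from duckdb_delta.

```python
# Hypothetical illustration only: these helpers are NOT the extension's code.

def join_concat(base: str, rel: str) -> str:
    """Naive join: always insert a '/' between the table root and the file."""
    return base + "/" + rel


def join_resolve(base: str, rel: str) -> str:
    """URL-style resolution: keep the base up to its last '/', then append."""
    return base[: base.rfind("/") + 1] + rel


def join_normalized(base: str, rel: str) -> str:
    """Strip trailing slashes from the root, then add exactly one separator."""
    return base.rstrip("/") + "/" + rel


for base in ("s3://delta-bucket/delta", "s3://delta-bucket/delta/"):
    print("table root:      ", base)
    # Breaks WITH a trailing slash (double slash in the object key).
    print("  concat join:   ", join_concat(base, "part-0.parquet"))
    # Breaks WITHOUT a trailing slash (the table directory is dropped).
    print("  resolve join:  ", join_resolve(base, "deletion_vector.bin"))
    # Gives the same result for both spellings of the table root.
    print("  normalized join:", join_normalized(base, "deletion_vector.bin"))
```

If something like this is what is happening, normalizing the table root once before building either kind of path would make both spellings of the table path behave the same.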

What do you think? Is this an easy fix? I can contribute. I really need this feature to work.