apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Support for deletion vector translation #339

Open ashvina opened 7 months ago

ashvina commented 7 months ago

Deletion vectors are an optimization feature that can be enabled on Delta Lake and Iceberg tables. They allow DELETE and UPDATE operations to mark existing rows as removed or changed without rewriting the Parquet file. Hudi may soon support a similar representation.
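To illustrate the idea, here is a minimal Python sketch of how a reader applies a deletion vector. It is not XTable, Delta Lake, or Iceberg code: `DataFile`, `DeletionVector`, and `read_live_rows` are hypothetical names, and the data file is modeled as an in-memory list rather than a real Parquet file.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for an immutable Parquet data file."""
    path: str
    rows: list


@dataclass
class DeletionVector:
    """Hypothetical per-file set of deleted row positions (0-based)."""
    deleted_positions: set = field(default_factory=set)

    def mark_deleted(self, position: int) -> None:
        # A DELETE only records the position; the data file is not rewritten.
        self.deleted_positions.add(position)


def read_live_rows(data_file: DataFile, dv: DeletionVector) -> list:
    """Return the rows a reader should see after applying the deletion vector."""
    return [
        row
        for position, row in enumerate(data_file.rows)
        if position not in dv.deleted_positions
    ]


data_file = DataFile("part-0000.parquet", [{"id": 1}, {"id": 2}, {"id": 3}])
dv = DeletionVector()
dv.mark_deleted(1)  # logically delete the row with id 2

live = read_live_rows(data_file, dv)
```

The data file still holds all three rows after the delete; only the reader's view changes.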

Currently, XTable does not support handling or translating deletion vectors and delete files between formats. This means that XTable cannot preserve the deletion vectors when converting a table from one format to another, resulting in incomplete translation and/or incorrect results. This feature request is to add support for deletion vector translation in XTable.
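The failure mode can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed XTable design: `to_position_deletes` and `scan` are hypothetical helpers, tables are modeled as plain dictionaries, and the `(file_path, pos)` record shape is loosely modeled on Iceberg position delete files.

```python
def to_position_deletes(deletion_vectors: dict) -> list:
    """Flatten per-file deleted positions into sorted (file_path, pos) records.

    Hypothetical helper: the input maps a data file path to a set of deleted
    row positions, as a source format's deletion vectors might.
    """
    return [
        (path, pos)
        for path in sorted(deletion_vectors)
        for pos in sorted(deletion_vectors[path])
    ]


def scan(data_files: dict, position_deletes=()) -> list:
    """Read all rows, skipping any (file_path, pos) listed as deleted."""
    deleted = set(position_deletes)
    return [
        row
        for path in sorted(data_files)
        for pos, row in enumerate(data_files[path])
        if (path, pos) not in deleted
    ]


# Source table: two data files plus deletion vectors (assumed layout).
source_files = {"a.parquet": ["r0", "r1", "r2"], "b.parquet": ["r3", "r4"]}
source_dvs = {"a.parquet": {0, 2}, "b.parquet": {1}}

# Conversion that copies only data file references: deleted rows reappear.
without_translation = scan(source_files)

# Conversion that also carries the deletes over: results match the source.
with_translation = scan(source_files, to_position_deletes(source_dvs))
```

In this sketch the scan without translated deletes returns all five rows, including the three that were deleted in the source table, while the scan with translated deletes returns only `r1` and `r3`.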

The proposed steps to implement the first phase of this feature are:

shabeebrp commented 3 months ago

@ashvina The lack of deletion vector support is a major limitation in XTable, as it means XTable can't support MOR upsert tables. Adding support for deletion vectors / delete files would be extremely useful. Are you currently working on this, and are you looking for collaboration from the community?

Reactor11 commented 1 week ago

Hi @ashvina - I am new to the project and want to start contributing to this feature. At our org, we are trying to implement our own code but want to use XTable.