apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Simplify incremental processing model #135

Open ashvina opened 1 year ago

ashvina commented 1 year ago

Another related question: could we make the type of pendingCommits consistent with the type of commitsToProcess, i.e. use COMMIT instead of Instant? Would it break anything for Hudi? A COMMIT uniquely identifies a completed or inflight transaction and, if needed, can provide the start and end times of a commit. An Instant, on the other hand, is a proxy for identifying a COMMIT.

_Originally posted by @ashvina in https://github.com/onetable-io/onetable/pull/129#discussion_r1375024492_

ashvina commented 1 year ago

@the-other-tim-brown Do instants uniquely identify commits in Hudi? Looking at the code, the instants associated with inFlightCommits are persisted as part of OneTableMetadata. The on-disk representation is a comma-separated string of instants. IMO, OneTable should persist commit-ids instead. This is not a high-priority issue and can be picked up later.
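
For readers following along, here is a rough sketch of what persisting instants as a comma-separated string might look like. It is not the OneTable code: the function names `serialize_instants` and `parse_instants` are hypothetical, and the ISO-8601 encoding of each instant is an assumption, since the actual on-disk format is not shown in this thread.

```python
from datetime import datetime, timezone

# Hypothetical helpers; the names and the ISO-8601 encoding are assumptions
# made for illustration, not the format OneTableMetadata actually uses.

def serialize_instants(instants):
    """Join instants into one comma-separated string, oldest first."""
    ordered = sorted(instants)
    # Normalize to UTC so the stored text does not depend on the local zone.
    return ",".join(i.astimezone(timezone.utc).isoformat() for i in ordered)

def parse_instants(text):
    """Parse the comma-separated string back into a list of instants."""
    if not text:
        return []
    return [datetime.fromisoformat(part) for part in text.split(",")]

if __name__ == "__main__":
    pending = [
        datetime(2023, 10, 27, 12, 0, tzinfo=timezone.utc),
        datetime(2023, 10, 27, 11, 0, tzinfo=timezone.utc),
    ]
    stored = serialize_instants(pending)
    print(stored)
    print(parse_instants(stored))
```

The round trip only recovers timestamps. Mapping each one back to a commit would still need a format-specific lookup, which is why the question of whether an instant uniquely identifies a commit matters.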

the-other-tim-brown commented 1 year ago

We can look up an instant based on the instant. We wanted something that would be common between the formats.

vamshigv commented 1 year ago

@ashvina Like Tim said, the OneTable representation is format-agnostic, and an instant suits that best. Any concerns or further thoughts?