Open OussamaSaoudi-db opened 1 week ago
A couple drive-by comments just from the PR description:
> `TableChanges` is constructed from a `Table`, and performs 2 protocol and metadata scans.
As part of this work, it may be time to officially support incremental snapshot construction, and incremental protocol and metadata (P&M) in particular. The Java kernel already has this, because a big customer was hitting bad performance in their streaming workloads. @scottsand-db has more context there.
Basically, three ideas:
> We also introduce `LogSegment::replay_commits`, which returns an Iterator over each commit. For each commit, there is an iterator for all the actions in the commit. This will be useful for iterating over log files when performing the scan.
We may want a different name? We normally use the term "[log] replay" as a synonym for action reconciliation. AFAIK, this proposed method is just giving back all the raw actions between two versions?
What changes are proposed in this pull request?
This PR introduces the `TableChanges` struct which represents a Change Data Feed scan. `TableChanges` is constructed from a `Table`, and performs 2 protocol and metadata scans. One is for the start version, and ensures that CDF is enabled at the beginning version. The second protocol and metadata scan is for the end version. This one is used to extract the schema at the end version and ensure that the final version has CDF enabled.

We also introduce `LogSegment::replay_commits`, which returns an Iterator over each commit. For each commit, there is an iterator for all the actions in the commit. This will be useful for iterating over log files when performing the scan.

Depends on: #457
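To make the "iterator of iterators" shape concrete, here is a minimal sketch of what a per-commit action iterator could look like. Everything in it is an assumption for illustration: `SketchAction`, `SketchCommit`, `SketchLogSegment`, and the signature of `replay_commits` are hypothetical stand-ins, not the types or signatures in this PR. It only shows raw actions being handed back commit by commit, with no reconciliation of adds against removes.

```rust
// Hypothetical stand-ins for illustration only; these are NOT the kernel's types.
#[derive(Debug, Clone, PartialEq)]
enum SketchAction {
    Add(String),
    Remove(String),
    Cdc(String),
}

/// One commit file: its version plus the raw actions it contains.
struct SketchCommit {
    version: u64,
    actions: Vec<SketchAction>,
}

/// Stand-in for a log segment covering a contiguous range of commits.
struct SketchLogSegment {
    commits: Vec<SketchCommit>,
}

impl SketchLogSegment {
    /// Assumed signature: yields `(version, actions)` for each commit, in the
    /// order the commits are stored. The inner iterator is lazy, so a caller
    /// can stop partway through a commit.
    fn replay_commits(
        &self,
    ) -> impl Iterator<Item = (u64, std::slice::Iter<'_, SketchAction>)> + '_ {
        self.commits.iter().map(|c| (c.version, c.actions.iter()))
    }
}

/// Example consumer: counts the raw actions in each commit. Nothing is
/// reconciled here; a Remove does not cancel an earlier Add.
fn count_actions_per_commit(segment: &SketchLogSegment) -> Vec<(u64, usize)> {
    segment
        .replay_commits()
        .map(|(version, actions)| (version, actions.count()))
        .collect()
}
```

A caller would build a segment for the requested version range and walk the outer iterator one commit at a time. That per-commit grouping is what a CDF scan needs, since each change has to be attributed to the commit version that produced it.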
How was this change tested?