erezsh / reladiff

High-performance diffing of large datasets across databases
https://reladiff.readthedocs.io/en/latest/index.html#
Other
366 stars 9 forks source link

Add support for Snowflake `at timestamp` Time Travel in TableSegment #42

Open alex-mirkin opened 2 months ago

alex-mirkin commented 2 months ago

Is your feature request related to a problem? Please describe.

When comparing tables across databases that are continuously updated, we need the ability to “freeze” tables at a specific point in time to ensure accurate comparisons. In some cases, we can use the where, min_update, and max_update parameters to achieve a similar result, but there are cases when this is not possible - for example, when there is no updated_at column that tracks the last update date in the table.

Describe the solution you'd like

Snowflake offers a Time Travel feature that allows viewing a table as it existed at a specific timestamp. Example:

SELECT *
FROM my_table AT (TIMESTAMP => '2024-09-10 20:52:25'::timestamp);

By combining this with data-freezing methods from other databases (such as the cluster clone feature in AWS Aurora), we can ensure accurate data comparison.

Describe alternatives you've considered

A user-side alternative for Snowflake is to create a view for each table using the Time Travel syntax before running reladeff, then use these views for comparison instead of the original tables.

Additional context The implementation could include a parameter similar to where that restricts the search space. For example, a snowflake_at_timestamp parameter could be introduced, which would accept a timestamp to specify the point in time for the Snowflake Time Travel.

erezsh commented 2 months ago

I think this is a feature that would be relatively easy to add. I think it would only require changes to table_segment.py

Specifically, the table segment class should be modified to also accept an instance of sqeleton.queries.ast_classes.TimeTravel, in place of a TablePath.

Hopefully it won't require any modifications to sqeleton, although perhaps small adjustments would be required.

If you would like to submit a PR for this, I would be happy to guide you in the process. Once we have support for it in the Python API, we can consider if it's worth adding to the CLI too.

alex-mirkin commented 2 months ago

Great, I’ll work on figuring out the necessary changes and submit the initial work. We can then proceed from there.