databrickslabs / remorph

Cross-compiler and Data Reconciler into Databricks Lakehouse

[TECH DEBT]: Data Compare Consolidation #745

Open vijaypavann-db opened 2 months ago

vijaypavann-db commented 2 months ago

Is there an existing issue for this?

Category of feature request

Reconcile

Problem statement

In the current codebase, the reconcile_data and _get_mismatch_data functions are tightly coupled to HASH_COLUMN_NAME when identifying the mismatch, missing_in_src, and missing_in_tgt records. Because of this, they can't be reused for other use cases such as Aggregate Data Reconciliation.

Proposed Solution

These should be refactored into generic, common functions so that they can be reused for other use cases such as Aggregate Data Reconciliation, alongside Hash Data Reconciliation.
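As a rough illustration of the refactoring proposed above, the sketch below shows a comparison function that takes the key columns and the columns to compare as parameters instead of hard-coding a hash column. It is not the remorph implementation: the real functions operate on Spark DataFrames, whereas this uses plain Python dictionaries so it stays self-contained. The names compare_data, key_cols, compare_cols, row_hash, and sum_amount are hypothetical, and the meaning assigned to missing_in_src and missing_in_tgt is an assumption.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def compare_data(
    source: List[Row],
    target: List[Row],
    key_cols: Sequence[str],
    compare_cols: Sequence[str],
) -> Dict[str, List[Row]]:
    """Hypothetical generic comparison; not the remorph API.

    Rows are matched on key_cols and compared on compare_cols, so the
    same function can serve a hash column or aggregate columns.
    Duplicate keys on one side collapse to the last row seen.
    """

    def key_of(row: Row) -> Tuple[Any, ...]:
        return tuple(row[c] for c in key_cols)

    src_by_key = {key_of(r): r for r in source}
    tgt_by_key = {key_of(r): r for r in target}

    # Keys present on both sides whose compared columns differ.
    mismatch = [
        src_by_key[k]
        for k in src_by_key
        if k in tgt_by_key
        and any(src_by_key[k][c] != tgt_by_key[k][c] for c in compare_cols)
    ]
    # Assumed naming: missing_in_src holds rows found only in the target,
    # missing_in_tgt holds rows found only in the source.
    missing_in_src = [r for k, r in tgt_by_key.items() if k not in src_by_key]
    missing_in_tgt = [r for k, r in src_by_key.items() if k not in tgt_by_key]

    return {
        "mismatch": mismatch,
        "missing_in_src": missing_in_src,
        "missing_in_tgt": missing_in_tgt,
    }


# Hash-style call: one precomputed hash column is compared.
# "row_hash" is a placeholder for whatever HASH_COLUMN_NAME refers to.
hash_result = compare_data(
    source=[{"id": 1, "row_hash": "a1"}, {"id": 2, "row_hash": "b2"}],
    target=[{"id": 1, "row_hash": "a1"}, {"id": 3, "row_hash": "c3"}],
    key_cols=["id"],
    compare_cols=["row_hash"],
)

# Aggregate-style call: group-by columns are the keys, aggregates are compared.
agg_result = compare_data(
    source=[{"region": "EU", "sum_amount": 100}, {"region": "US", "sum_amount": 250}],
    target=[{"region": "EU", "sum_amount": 100}, {"region": "US", "sum_amount": 240}],
    key_cols=["region"],
    compare_cols=["sum_amount"],
)
```

Both calls go through the same code path; only the key and compare columns change, which is the kind of reuse the issue asks for.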

Additional Context

For Aggregate Data Reconciliation, two new functions, join_aggregate_data and _get_mismatch_agg_data, are introduced. Once the generic functions are developed, these two functions should be removed and their logic folded into the common functions.

vijaypavann-db commented 2 months ago

Mentioned here: https://github.com/databrickslabs/remorph/pull/740#discussion_r1696806913