databrickslabs / overwatch

Capture deep metrics on one or all assets within a Databricks workspace

Standardize Write Mode to Merge for All Tables to Ensure Data Integrity and Avoid Duplication #1240

Open SouravSaxena3200 opened 3 months ago

SouravSaxena3200 commented 3 months ago

I have noticed that in the consumer database some tables are written in append mode while others are written in merge mode. To reduce data duplication and improve data integrity, could we standardize the write mode to merge for all tables? Alternatively, a configuration flag set for each run could specify whether tables keep their default write mode or are all forced into merge mode, which would give more flexibility and control over data management.
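
To make the difference between the two write modes concrete, here is a minimal in-memory sketch in plain Python. It is not Overwatch code and does not use Spark or Delta Lake. The function names, the `force_merge` flag, and the sample columns (`cluster_id`, `state`) are all hypothetical and only illustrate the request above: append adds every incoming row, while merge replaces rows that share a key.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def append_write(table: List[Row], batch: List[Row]) -> List[Row]:
    """Append mode: add every incoming row without checking keys."""
    return table + batch


def merge_write(table: List[Row], batch: List[Row], keys: Tuple[str, ...]) -> List[Row]:
    """Merge (upsert) mode: replace rows with a matching key, insert the rest."""
    merged = {tuple(row[k] for k in keys): row for row in table}
    for row in batch:
        merged[tuple(row[k] for k in keys)] = row
    return list(merged.values())


def write(table: List[Row], batch: List[Row], keys: Tuple[str, ...],
          default_mode: str, force_merge: bool = False) -> List[Row]:
    """Hypothetical flag: force_merge overrides the table's default write mode."""
    mode = "merge" if force_merge else default_mode
    if mode == "merge":
        return merge_write(table, batch, keys)
    if mode == "append":
        return append_write(table, batch)
    raise ValueError(f"unknown write mode: {mode}")


# Hypothetical sample data: the batch re-delivers cluster "a" and adds cluster "b".
existing = [{"cluster_id": "a", "state": "RUNNING"}]
incoming = [{"cluster_id": "a", "state": "TERMINATED"},
            {"cluster_id": "b", "state": "RUNNING"}]

appended = write(existing, incoming, ("cluster_id",), default_mode="append")
merged = write(existing, incoming, ("cluster_id",), default_mode="append",
               force_merge=True)
```

In this sketch, append only produces a duplicate because the batch re-delivers a key that is already in the table. If every run writes rows that the table has never seen, append and merge give the same result and append avoids the key lookup.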

gueniai commented 3 months ago

Hi @SouravSaxena3200! Thank you for the suggestion! The write mode for each table was chosen deliberately. Using the correct write mode for each table allows us to make the pipelines as performant as possible. The current write modes should not lead to any duplication or data integrity issues. Have you noticed anything of the sort?