TonicAI / condenser

Condenser is a database subsetting tool
https://www.tonic.ai
MIT License

Measuring Loss of Data when cutting an edge to remove a cycle #23

Open raresboza opened 3 years ago

raresboza commented 3 years ago

Greetings, I was reading the following article on subsetting: https://www.tonic.ai/blog/condenser-a-database-subsetting-tool

I don't fully understand the drawbacks of dropping a cycle from a database. Of course, one loses data when doing so, but is the same amount of data lost irrespective of where you cut the cycle? How could one measure that? What are some of the criteria that affect it?

theaeolianmachine commented 2 years ago

Hi @raresboza, I'm not sure I understand your question. Condenser is set up to break dependencies wherever it makes the most sense, but realistically all it does is put NULLs in the column in question. Otherwise, in some cases a cycle would cause all of the data in the tables within the cycle to be grabbed, and we certainly wouldn't be able to perform a true topological sort.
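To make the topological-sort point concrete, here is a minimal sketch. It does not use condenser's own code: the table names, the column names, and the `copy_order` helper are all hypothetical, and it relies only on Python's standard `graphlib` module (Python 3.9+). It shows that a cyclic foreign-key graph has no valid ordering, and that dropping a single edge (the column that would be filled with NULLs) restores one.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
deps = {
    "employees": {"departments"},  # employees.department_id -> departments.id
    "departments": {"employees"},  # departments.manager_id -> employees.id
    "offices": set(),
}


def copy_order(graph):
    """Return an order in which referenced tables come before referencing ones."""
    return list(TopologicalSorter(graph).static_order())


# With the cycle in place, no ordering exists.
try:
    copy_order(deps)
    has_cycle = False
except CycleError:
    has_cycle = True

# Break the cycle by dropping one edge, i.e. treating
# departments.manager_id as a column that will be set to NULL.
broken = {table: set(refs) for table, refs in deps.items()}
broken["departments"].discard("employees")

order = copy_order(broken)
print(has_cycle, order)
```

With the `departments -> employees` edge removed, `departments` can be copied before `employees`, at the cost of losing the `manager_id` values.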

Ultimately, where to make the break comes down to which column you find less valuable.
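One rough way to put a number on "less valuable" is to count how many non-NULL references each candidate column currently holds, since those are the values that would be overwritten. This is only an illustrative proxy and an assumption on my part, not something condenser is known to compute. The schema, the sample rows, and the `references_lost` helper below are hypothetical, and the sketch uses an in-memory SQLite database for convenience.

```python
import sqlite3


def references_lost(conn, table, column):
    """Count rows whose foreign key value would be overwritten with NULL."""
    # Identifiers are interpolated directly, so pass only trusted names.
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NOT NULL'
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (id INTEGER PRIMARY KEY, manager_id INTEGER);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER);
    INSERT INTO departments VALUES (1, 10), (2, NULL);
    INSERT INTO employees VALUES (10, 1), (11, 1), (12, 2);
    """
)

# The two hypothetical columns that form the cycle.
candidates = [("departments", "manager_id"), ("employees", "department_id")]
loss = {f"{t}.{c}": references_lost(conn, t, c) for t, c in candidates}

# The column holding the fewest references is the cheapest one to cut
# under this proxy.
cheapest = min(loss, key=loss.get)
print(loss, cheapest)
```

A raw count ignores how important each reference is to the application, so it is best treated as one input alongside domain knowledge about the column.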