ThoughtWorks-SEA / recce

Server-based database reconciliation tool for developers
Apache License 2.0

Allow deletion of older runs' records for scheduled reconciliation tasks #157

Open jiawen-tw opened 2 years ago

jiawen-tw commented 2 years ago

Context / Goal

Each reconciliation run generates many reconciliation records in the database. Specifically, the reconciliation_record table will have as many rows per run as there are migration keys in the dataset.

With each new reconciliation run, the results of older runs become less meaningful and are less likely to be accessed by users.

For regularly scheduled runs, this accumulates a large amount of stale data in the database, which can incur significant storage costs over time.
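
To put a rough number on this, here is a minimal sketch of the growth, assuming one row per migration key per run (as described above), a constant dataset size, and no cleanup. The function name and the figures in the usage note are hypothetical, not measurements from recce.

```python
def estimated_record_rows(migration_keys: int, runs_per_day: int, days: int) -> int:
    """Estimate rows accumulated in reconciliation_record with no cleanup.

    Assumes every run writes one row per migration key and that the
    number of migration keys stays constant across runs.
    """
    return migration_keys * runs_per_day * days
```

For example, a hypothetical dataset with 1,000,000 migration keys reconciled once a day would accumulate `estimated_record_rows(1_000_000, 1, 30)` = 30,000,000 rows in 30 days.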

Expected Outcome

Out of Scope

Additional context / implementation notes

aditi-agarwal-tw commented 2 years ago

A few questions:

  1. Will X be a timestamp field?
  2. Will older runs be cleaned up only if the current run succeeds? My worry is that with multiple consecutive failed runs, and an X that is not chosen wisely, I could end up with no history of successful runs in my DB.
  3. The schedule config is currently set at the dataset level, so should X also be applied at the dataset level? What does "regardless of dataset" mean?
  4. Given that the schedule is an optional config, if it is applied after some manual runs, will the cleanup also remove the manual runs from before X, given that there is no way to differentiate manual runs from scheduled ones?
  5. Should the cleanup job compare X against completedTime to determine which runs to remove?
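
One possible answer to questions 2, 3 and 5 can be sketched as follows. This is only an illustration under stated assumptions, not recce's implementation: X is read here as a retention window, it is applied per dataset, runs are matched on their completed time, and the most recent successful run of each dataset is always kept so that a streak of failures cannot erase all successful history. The `Run` class, its fields, and `runs_to_delete` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Run:
    # Hypothetical stand-in for a reconciliation run row.
    id: int
    dataset_id: str
    completed_time: Optional[datetime]  # None while the run is still in progress
    successful: bool


def runs_to_delete(runs: List[Run], now: datetime, retention: timedelta) -> List[int]:
    """Return ids of runs completed before (now - retention).

    The latest successful run of each dataset is kept regardless of age.
    Runs without a completed time are never selected.
    """
    cutoff = now - retention

    # Find the most recent successful run per dataset.
    latest_success: Dict[str, Run] = {}
    for run in runs:
        if run.successful and run.completed_time is not None:
            best = latest_success.get(run.dataset_id)
            if best is None or run.completed_time > best.completed_time:
                latest_success[run.dataset_id] = run
    keep_ids = {run.id for run in latest_success.values()}

    return sorted(
        run.id
        for run in runs
        if run.completed_time is not None
        and run.completed_time < cutoff
        and run.id not in keep_ids
    )
```

This sketch does not address question 4: without a field that marks a run as manual or scheduled, manual runs older than the cutoff would be selected as well.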

aditi-agarwal-tw commented 2 years ago

Tasks to be done: