MeltanoLabs / target-postgres

MIT License
11 stars 20 forks source link

fix: Ensure hard-delete mode doesn't delete data it shouldn't when `ACTIVATE_VERSION` messages are processed #391

Closed sebastianswms closed 3 months ago

sebastianswms commented 4 months ago

What hard delete was previously doing wrong

As discussed in https://github.com/MeltanoLabs/target-postgres/issues/340, hard delete deleted data that it shouldn't. This was caused by a <= where there should have been a <.

So why were the tests passing?

All ACTIVATE_VERSION messages were run before any records were synced. This is because RECORD messages were being batched until the end of the sync, whereas ACTIVATE_VERSION messages were processed as they arrived.

The intent of test_activate_version_hard_delete was to:

  1. Sync seven records
  2. Check that seven records were synced
  3. Add two records "manually"
  4. Check that there are nine records in total
  5. Sync the same seven records, with ACTIVATE_VERSION removing the two manual records.
  6. Check that seven records remain

Here's what was really happening:

  1. Sync seven records
  2. Seven records were synced? ✅
  3. Add two records "manually"
  4. There are nine records in total? ✅
  5. ACTIVATE_VERSION deletes all nine records and then syncs the same seven records again.
  6. Seven records remain? ⚠️ Yes, but not for the correct reason

My fix

I don't know if I fully understand the ACTIVATE_VERSION specification, so please correct me, but here's my change.

Drain the sink immediately before an ACTIVATE_VERSION message is processed for a sink. This means:

  1. The target still batches records when possible.
  2. ACTIVATE_VERSION messages are still processed as they arrive.
  3. Importantly, ACTIVATE_VERSION messages correctly apply to all preceding records.

Closes #340

edgarrmondragon commented 4 months ago

I plan to review this on the week of the 29th

edgarrmondragon commented 3 months ago

The intent of test_activate_version_hard_delete was to:

1. Sync seven records

2. Check that seven records were synced

3. Add two records "manually"

4. Check that there are nine records in total

5. Sync the same seven records, with ACTIVATE_VERSION removing the two manual records.

6. Check that seven records remain

Here's what was really happening:

1. Sync seven records

2. Seven records were synced? ✅

3. Add two records "manually"

4. There are nine records in total? ✅

5. ACTIVATE_VERSION deletes all nine records and then syncs the same seven records again.

6. Seven records remain? ⚠️ Yes, but not for the correct reason

This is a great analysis of what's happening under the hood 🙏