dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

Feat/1331 disables deduplication for incremental #1892

Open willi-mueller opened 1 month ago

willi-mueller commented 1 month ago

Description

This PR disables deduplication for the test case described by ingestr here: https://github.com/dlt-hub/dlt/issues/971#issuecomment-1983417044

Related Issues

Questions

@rudolfix

  1. I could not understand your points 2, 3, and 4 in issue #1131. Are they already implemented in this PR?
  2. I am not sure whether the last test, tests/extract/test_incremental.py::test_deduplication_on_write_disposition_not_merge, makes sense at all, because with write_disposition="replace" the table is truncated before the load. I included it because the ticket speaks of the merge write disposition, so I wanted to test the opposite case too. Feel free to drop this commit.
  3. Are the assertions on the incremental's last_value superfluous?
  4. I could not find a way to test that the incremental returns all values without deduplication. Thus, I implemented the tests by making assertions on the loaded data. Is this strategy fine?
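
To make questions 3 and 4 concrete, here is a minimal, self-contained sketch of boundary deduplication on an incremental cursor. It is not dlt's implementation: `filter_incremental`, its parameters, and the row layout are hypothetical names chosen only for illustration. It assumes rows are filtered by `value >= last_value` and that rows equal to `last_value` are dropped when they were already seen in the previous run.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Row = Dict[str, Any]
RowKey = Tuple[Tuple[str, Any], ...]


def filter_incremental(
    rows: List[Row],
    cursor_path: str,
    last_value: Optional[int],
    seen_at_boundary: Set[RowKey],
    deduplicate: bool = True,
) -> Tuple[List[Row], Optional[int], Set[RowKey]]:
    """Hypothetical stand-in for an incremental cursor (assumption, not dlt code).

    Returns the rows to load, the new last value, and the keys of rows
    that sit exactly on the new last value.
    """
    kept: List[Row] = []
    new_last = last_value
    new_seen = set(seen_at_boundary)
    for row in rows:
        value = row[cursor_path]
        # Rows strictly older than the stored cursor are never re-extracted.
        if last_value is not None and value < last_value:
            continue
        key: RowKey = tuple(sorted(row.items()))
        # Boundary rows are skipped only when deduplication is enabled.
        if deduplicate and value == last_value and key in seen_at_boundary:
            continue
        kept.append(row)
        if new_last is None or value > new_last:
            new_last, new_seen = value, {key}
        elif value == new_last:
            new_seen.add(key)
    return kept, new_last, new_seen


rows = [{"id": 1, "ts": 1}, {"id": 2, "ts": 2}]
first, last, seen = filter_incremental(rows, "ts", None, set())
with_dedup, last_a, _ = filter_incremental(rows, "ts", last, seen)
without_dedup, last_b, _ = filter_incremental(rows, "ts", last, seen, deduplicate=False)
```

In this sketch the second run ends with the same last value whether or not deduplication is enabled, so an assertion on the last value alone cannot tell the two modes apart. Only the returned rows differ, and under an append-style load they would show up as a repeated boundary row in the destination table, which is the kind of assertion on loaded data described in question 4.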

TODO after merge

netlify[bot] commented 1 month ago

Deploy Preview for dlt-hub-docs canceled.

| Name | Link |
| --- | --- |
| Latest commit | 4c3bd87789359fd5b38a7eead35f93ed6db4fc15 |
| Latest deploy log | https://app.netlify.com/sites/dlt-hub-docs/deploys/6730c0363e24740008651ae8 |