dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

Wrong Merge Key Not Throwing Error #1463

zem360 closed this issue 1 month ago.

zem360 commented 3 months ago

dlt version

0.4.12

Describe the problem

While working on a community support request about snake_case vs. camel_case column names, @dat-a-man discovered this bug.

A typo in merge_key, or a merge_key that names the wrong column, does not raise an error.

Expected behavior

If merge_key contains a typo or names a column that is not present in the data, an error should be raised. Instead, the code runs normally.

In the snippet under Steps to reproduce, merge_key="mana" should raise an error because there is no column named mana in the data. It does not.

Steps to reproduce

Run the following code:

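import dlt

@dlt.resource(name="table_name", merge_key="mana", write_disposition={"disposition": "merge"})
def func():
    data = [{"id": 1, "NAme": "abcaaaa", "status": "bronze"},
            {"id": 2, "NAme": "deaf", "status": "bronze"}]
    yield data

# "mana" is not a column in the data; the duckdb destination is an assumption (none was reported)
dlt.pipeline(pipeline_name="merge_key_typo", destination="duckdb").run(func())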
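Until dlt validates key hints itself, a user-side guard can approximate the expected behavior. The guard would run on each batch before it is yielded. This is a minimal sketch: check_key_columns is a hypothetical helper and not part of dlt's API. It compares column names exactly as they appear in the raw data and does not apply any naming-convention normalization.

```python
from typing import Any, Iterable, Mapping, Sequence


def check_key_columns(rows: Iterable[Mapping[str, Any]], key_columns: Sequence[str]) -> None:
    """Raise ValueError if a key column appears in none of the rows.

    Hypothetical helper (not a dlt API): it only illustrates the check
    the report expects. Names are compared verbatim, without normalization.
    """
    seen: set[str] = set()
    for row in rows:
        seen.update(row.keys())
    missing = [col for col in key_columns if col not in seen]
    if missing:
        raise ValueError(f"key column(s) {missing} not found in data; columns seen: {sorted(seen)}")


data = [{"id": 1, "NAme": "abcaaaa", "status": "bronze"},
        {"id": 2, "NAme": "deaf", "status": "bronze"}]

try:
    check_key_columns(data, ["mana"])
except ValueError as exc:
    # prints: key column(s) ['mana'] not found in data; columns seen: ['NAme', 'id', 'status']
    print(exc)
```

Inside the resource above, calling check_key_columns(data, ["mana"]) just before yield data would stop extraction with this error.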

Operating system

macOS

Runtime environment

Local

Python version

3.10

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

No response

rudolfix commented 3 months ago

The reason is that merge_key is used only during loading, so no checks are done before that. The same thing happens with any other hint, including primary_key (unless it is part of an incremental, which actually checks the data).

To really fix this issue before loading starts, we'd need to track which columns received data. Currently we track this only at the table level. Maybe we can do that in a separate ticket.

What we can do now:
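Column-level tracking of this kind could look roughly like the code below. Each table gets a running set of the columns that have received a value. At the end of extraction, any hinted column missing from that set can be reported. This is a minimal sketch under stated assumptions: ColumnDataTracker and its methods are hypothetical names and not dlt internals, and treating None as "no data" is an assumption.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping, Sequence


class ColumnDataTracker:
    """Hypothetical sketch (not a dlt internal): per table, record which
    columns have received at least one non-None value."""

    def __init__(self) -> None:
        self._filled: defaultdict[str, set[str]] = defaultdict(set)

    def observe(self, table: str, rows: Iterable[Mapping[str, Any]]) -> None:
        # Called for every batch extracted into `table`; results accumulate across batches.
        for row in rows:
            self._filled[table].update(col for col, value in row.items() if value is not None)

    def columns_without_data(self, table: str, hinted: Sequence[str]) -> list[str]:
        # Hinted columns (e.g. merge_key, primary_key) that never received a value.
        return [col for col in hinted if col not in self._filled[table]]


tracker = ColumnDataTracker()
tracker.observe("table_name", [{"id": 1, "NAme": "abcaaaa", "status": "bronze"}])
tracker.observe("table_name", [{"id": 2, "NAme": "deaf", "status": None}])
print(tracker.columns_without_data("table_name", ["mana", "id"]))  # ['mana']
```

Because the sets accumulate across batches, a column that is empty in one batch but filled in a later one is not reported. Only hints that no batch ever satisfied would raise an error.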