Open pedrosmv opened 2 years ago
Hi @pedrosmv - thank you for reporting this. Confirming we can reproduce this based on your steps. Since an error is still being thrown and no data corruption occurs, we'll keep this open for now for anyone who might like to contribute and prioritize accordingly otherwise. Thanks!
Bug
Describe the problem
We have a merge operation using Delta + pySpark that deals with CDC data, mostly Inserts and Updates. In our testing we found that the behaviour when merging null values into NOT NULL columns is inconsistent: depending on which merge clauses are used, we get different results.
Steps to reproduce
Spark setup:
Create the table:
Load data into the table:
After loading the initial data, we run the merge again, trying to load data with null values:
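For readers without access to the original snippets, the steps above can be approximated with the sketch below. It is not the reporter's code: the table path, the column names (`id`, `name`), and the helper names (`spark_delta_conf`, `merge_condition`, `create_table`, `run_merge`) are all hypothetical. The `with_update_clause` flag switches between the complete statement and the variant that only has `whenNotMatchedInsertAll()`. The Spark and Delta imports are done inside the functions so the pure-Python helpers can be used without a cluster.

```python
# Sketch only. Table layout, path, and helper names are assumptions made
# for illustration; they are not taken from the original report.

def spark_delta_conf():
    """Session settings documented by Delta Lake for enabling Delta in Spark."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def merge_condition(key_columns, target_alias="t", source_alias="s"):
    """Build the merge join condition from the key column names."""
    return " AND ".join(
        f"{target_alias}.{col} = {source_alias}.{col}" for col in key_columns
    )


def create_table(spark, table_path):
    """Create a Delta table whose columns are all NOT NULL (hypothetical schema)."""
    from delta.tables import DeltaTable  # lazy import, needs delta-spark

    (
        DeltaTable.create(spark)
        .location(table_path)
        .addColumn("id", "INT", nullable=False)
        .addColumn("name", "STRING", nullable=False)
        .execute()
    )


def run_merge(spark, table_path, source_df, key_columns, with_update_clause=True):
    """Merge source_df into the Delta table at table_path.

    with_update_clause=True  -> whenMatchedUpdateAll + whenNotMatchedInsertAll
    with_update_clause=False -> whenNotMatchedInsertAll only
    """
    from delta.tables import DeltaTable  # lazy import, needs delta-spark

    builder = (
        DeltaTable.forPath(spark, table_path)
        .alias("t")
        .merge(source_df.alias("s"), merge_condition(key_columns))
    )
    if with_update_clause:
        builder = builder.whenMatchedUpdateAll()
    builder.whenNotMatchedInsertAll().execute()
```

With a live session and delta-spark installed, calling `run_merge(spark, path, df_with_nulls, ["id"])` and then the same call with `with_update_clause=False` should exercise the two cases compared under Observed results.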
Observed results
Running the test with the complete statement, the result is the following:
When we have only the whenNotMatchedInsertAll() clause, the result is the expected one:
Expected results
Our expected result was the InvariantViolationException in both cases.
Further details
Environment information
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?