MrPowers / mack

Delta Lake helper methods in PySpark
https://mrpowers.github.io/mack/
MIT License

Duplicate primary keys after type_2_scd_upsert (maybe) #85

Open v01t6 opened 1 year ago

v01t6 commented 1 year ago

Hello,

We came across what looks like a bug in your code (I think). If you call the function type_2_scd_upsert a second time with the same data, you get incorrect behaviour (in my opinion). It is simple to reproduce. An example is shown below, where INVLDT_LVL_KEY is the primary key:

Original dataframe: image

Updating dataframe: image

After update (bug?): image

Desired outcome? image
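For anyone who wants to check their own table for the symptom in the title, here is a minimal sketch in plain Python. It assumes the table's rows have been collected into a list of dicts and that the "current" flag column is called is_current. The helper name duplicate_current_keys and the sample rows are hypothetical, made up for illustration; they are not our real data and not part of mack. It also assumes the symptom is more than one current row per key.

```python
from collections import Counter


def duplicate_current_keys(rows, key_col="INVLDT_LVL_KEY", current_col="is_current"):
    """Return the primary-key values that have more than one current row.

    Hypothetical helper, not part of mack. `rows` is a list of dicts,
    e.g. what you get from [r.asDict() for r in df.collect()].
    """
    counts = Counter(row[key_col] for row in rows if row[current_col])
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up rows for illustration only.
healthy = [
    {"INVLDT_LVL_KEY": 1, "is_current": False},  # closed historical version
    {"INVLDT_LVL_KEY": 1, "is_current": True},   # single current version
    {"INVLDT_LVL_KEY": 2, "is_current": True},
]
# Same table with one extra current row for key 2.
duplicated = healthy + [{"INVLDT_LVL_KEY": 2, "is_current": True}]

print(duplicate_current_keys(healthy))     # []
print(duplicate_current_keys(duplicated))  # [2]
```

In a type 2 SCD table each key should have at most one current row, so a non-empty result after a repeated upsert would point to the behaviour described here.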

We are using mack 0.2.0.

Thanks for your feedback.