MrPowers / mack

Delta Lake helper methods in PySpark
https://mrpowers.github.io/mack/
MIT License
303 stars 44 forks

Test every edge case #46

Open MrPowers opened 1 year ago

MrPowers commented 1 year ago

We should write tons of tests to explore every nook and cranny of the public-facing APIs. We want to always fail gracefully and give the user a really amazing description of what went wrong.

We should check None input, null columns, null input, empty DataFrames, DataFrames with weird schemas... the common causes of problems.
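As a rough illustration of what "fail gracefully" could look like for these inputs, here is a minimal sketch in plain Python. The helper name `validate_required_columns` and its signature are hypothetical and are not part of mack; the sketch only assumes that a function receives the table's column names and the column names the caller asked for.

```python
from typing import Iterable, Optional, Sequence


def validate_required_columns(
    available_columns: Optional[Sequence[str]],
    required_columns: Optional[Iterable[str]],
) -> None:
    """Hypothetical helper (not a mack API): raise a descriptive error
    when the requested columns cannot be used with the table."""
    # None instead of a list of column names.
    if available_columns is None:
        raise TypeError("Expected the table's column names, but got None.")
    if required_columns is None:
        raise TypeError("Expected a list of column names, but got None.")
    # A bare string would otherwise be iterated character by character.
    if isinstance(required_columns, str):
        raise TypeError(
            f"Expected a list of column names, but got the string "
            f"{required_columns!r}. Did you mean [{required_columns!r}]?"
        )
    required = list(required_columns)
    if not required:
        raise ValueError("At least one column name must be provided.")
    # None or blank entries inside the list.
    invalid = [c for c in required if not isinstance(c, str) or not c.strip()]
    if invalid:
        raise TypeError(
            f"Column names must be non-empty strings, but got: {invalid!r}"
        )
    # Columns that do not exist in the table.
    missing = [c for c in required if c not in available_columns]
    if missing:
        raise ValueError(
            f"Columns {missing!r} were not found. "
            f"Available columns: {list(available_columns)!r}"
        )
```

Each edge-case test could then assert on both the exception type and the message, so a regression in the wording of an error is caught as well.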

danielbeach commented 1 year ago

@MrPowers you still want this worked on? I can take the first shot at this.

MrPowers commented 1 year ago

@danielbeach - yep, this one is still open. Try to break stuff ;)

For example, mack.drop_duplicates_pkey(delta_table=deltaTable, primary_key="col1", duplication_columns=["col2", "col3"]) assumes that col1 is in fact a unique primary key. What if it's not unique? What's the best user experience?
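One possible answer to the question above is to check uniqueness up front and raise an error that names the offending values. The sketch below shows the idea in plain Python, with a list of dicts standing in for the table's rows. The names `find_duplicate_keys` and `assert_unique_primary_key` are hypothetical and not part of mack; against a real Delta table the same check would presumably be a group-by count on the primary key column.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List


def find_duplicate_keys(
    rows: List[Dict[str, Any]], primary_key: str
) -> Dict[Hashable, int]:
    """Hypothetical helper: map each repeated key value to its row count."""
    counts: Counter = Counter()
    for row in rows:
        if primary_key not in row:
            raise ValueError(
                f"Column {primary_key!r} is missing from row {row!r}."
            )
        counts[row[primary_key]] += 1
    # Keep only the values that occur more than once.
    return {value: n for value, n in counts.items() if n > 1}


def assert_unique_primary_key(
    rows: List[Dict[str, Any]], primary_key: str
) -> None:
    """Hypothetical guard: fail with a descriptive message if the
    supposed primary key is not actually unique."""
    duplicates = find_duplicate_keys(rows, primary_key)
    if duplicates:
        sample = list(duplicates.items())[:5]  # show at most five offenders
        raise ValueError(
            f"Column {primary_key!r} is not a unique primary key: "
            f"{len(duplicates)} value(s) appear more than once. "
            f"Examples as (value, count): {sample!r}"
        )
```

Whether a non-unique key should raise, warn, or be deduplicated anyway is a design choice for the maintainers; the sketch only covers the "raise with a clear message" option.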

I'll assign you to the issue & thanks in advance!