MrPowers / chispa

PySpark test helper methods with beautiful error messages
https://mrpowers.github.io/chispa/
MIT License
595 stars, 65 forks

Add support to ignore field metadata when comparing schemas/dataframes #64

Closed: khaledh closed this 11 months ago

khaledh commented 1 year ago

Spark supports field metadata: a dictionary of key-value pairs that can be attached to any field in a schema. We have a use case where we need to compare two schemas (or DataFrames) that differ only in their field metadata and have them treated as equal. That comparison currently fails, because the schema comparer always includes metadata when checking fields. This PR adds an ignore_metadata flag to the relevant functions (particularly assert_df_equality and assert_schema_equality); setting it to True makes the comparison ignore differences in field metadata.
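To make the intended behavior concrete, here is a minimal, dependency-free sketch of the comparison logic. It is not chispa's implementation: the `Field` class is a hypothetical stand-in for PySpark's `StructField` (which carries a name, data type, nullable flag, and metadata), and `fields_equal` and `schemas_equal` are illustrative helper names. Only the `ignore_metadata` flag name comes from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Field:
    """Hypothetical stand-in for pyspark.sql.types.StructField."""
    name: str
    data_type: str
    nullable: bool = True
    metadata: Dict[str, Any] = field(default_factory=dict)


def fields_equal(a: Field, b: Field, ignore_metadata: bool = False) -> bool:
    """Compare two fields, optionally skipping their metadata dictionaries."""
    same_core = (a.name, a.data_type, a.nullable) == (b.name, b.data_type, b.nullable)
    return same_core and (ignore_metadata or a.metadata == b.metadata)


def schemas_equal(s1: List[Field], s2: List[Field], ignore_metadata: bool = False) -> bool:
    """Compare two schemas field by field, in order (illustrative helper)."""
    return len(s1) == len(s2) and all(
        fields_equal(a, b, ignore_metadata) for a, b in zip(s1, s2)
    )


# Two schemas that differ only in the metadata attached to "id".
with_meta = [Field("id", "long", False, {"comment": "primary key"}), Field("name", "string")]
without_meta = [Field("id", "long", False), Field("name", "string")]

strict_result = schemas_equal(with_meta, without_meta)
relaxed_result = schemas_equal(with_meta, without_meta, ignore_metadata=True)
```

With the flag left at its default, the two schemas compare as different; with `ignore_metadata=True` they compare as equal, while differences in name, type, or nullability are still detected.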