MrPowers / chispa

PySpark test helper methods with beautiful error messages
https://mrpowers.github.io/chispa/
MIT License

Handle nested nullability #44

Open machielg opened 2 years ago

machielg commented 2 years ago

When using ignore_nullable=True, chispa still reports a difference for ArrayType columns, because the nullability of the inner element type differs:

StructField(my_arr_col,ArrayType(StringType,false),false)
StructField(my_arr_col,ArrayType(StringType,true),true)

etlundquist commented 2 years ago

Yeah, I'm having the same problem. I've had to abandon this library when testing DataFrame equality with nested/complex data types.

orcascope commented 1 year ago

@machielg Is this still an issue? I see that the below test shows both schemas are equal, returning true

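from chispa.schema_comparer import are_schemas_equal_ignore_nullable
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

as1 = StructType([StructField("ar", ArrayType(StringType(), False), False)])
as2 = StructType([StructField("ar", ArrayType(StringType(), True), True)])
print(are_schemas_equal_ignore_nullable(as1, as2))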

True

orcascope commented 1 year ago

However, if the elementType of the ArrayType is itself a StructType, nullability differences inside that struct are still taken into account. The check below prints False.

from chispa.schema_comparer import are_schemas_equal_ignore_nullable
from pyspark.sql.types import ArrayType, IntegerType, StructField, StructType


def test_schema_nullability_insensitive_comparisons_with_arrays():
    # s1 and s2 differ only in the nullability of the fields nested inside f2's element struct.
    s1 = StructType([
        StructField("f1", ArrayType(IntegerType(), True), True),
        StructField("f2", ArrayType(StructType([
            StructField("latlong", IntegerType(), False),
            StructField("price", ArrayType(IntegerType(), False), False),
        ]), True), True),
    ])

    s2 = StructType([
        StructField("f1", ArrayType(IntegerType(), True), True),
        StructField("f2", ArrayType(StructType([
            StructField("latlong", IntegerType(), True),
            StructField("price", ArrayType(IntegerType(), True), True),
        ]), True), True),
    ])

    print(are_schemas_equal_ignore_nullable(s1, s2))

@MrPowers Please check if the string equality check can be used in this case as in https://github.com/orcascope/chispa/commit/ea5d61cf01c44bc6a9f7436bc4c54ae6d622dcd2
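
For anyone following along, here is a rough sketch of one way to ignore nullability at every nesting level. It is not chispa's implementation and not necessarily what the linked commit does. It assumes schemas in the plain dict form that PySpark's StructType.jsonValue() returns, so it runs without a Spark session. The names strip_nullability, schemas_equal_ignoring_nullability, and the small builder helpers are hypothetical and only used for illustration.

```python
# Keys that carry nullability in the dict form of a Spark schema
# (assumed to match the shape returned by StructType.jsonValue()).
NULLABILITY_KEYS = {"nullable", "containsNull", "valueContainsNull"}


def strip_nullability(node):
    """Return a copy of a schema dict with every nullability flag removed."""
    if isinstance(node, dict):
        return {
            # Field metadata is user data, so it is copied through untouched.
            key: value if key == "metadata" else strip_nullability(value)
            for key, value in node.items()
            if key not in NULLABILITY_KEYS
        }
    if isinstance(node, list):
        return [strip_nullability(item) for item in node]
    return node  # primitive type names such as "integer" or "string"


def schemas_equal_ignoring_nullability(left, right):
    """Compare two schema dicts after dropping nullability at all depths."""
    return strip_nullability(left) == strip_nullability(right)


# Hypothetical builders that mimic the jsonValue() layout.
def array_of(element_type, contains_null):
    return {"type": "array", "elementType": element_type, "containsNull": contains_null}


def field(name, data_type, nullable):
    return {"name": name, "type": data_type, "nullable": nullable, "metadata": {}}


def struct(*fields):
    return {"type": "struct", "fields": list(fields)}


def example_schema(inner_nullable):
    """Same shape as the s1/s2 schemas above; only the inner flags vary."""
    inner = struct(
        field("latlong", "integer", inner_nullable),
        field("price", array_of("integer", inner_nullable), inner_nullable),
    )
    return struct(
        field("f1", array_of("integer", True), True),
        field("f2", array_of(inner, True), True),
    )


s1 = example_schema(False)
s2 = example_schema(True)
print(s1 == s2)                                    # False
print(schemas_equal_ignoring_nullability(s1, s2))  # True
```

With real PySpark schemas, the same idea would presumably be applied as schemas_equal_ignoring_nullability(s1.jsonValue(), s2.jsonValue()). Because the flags are stripped recursively, structs inside arrays (and arrays inside those structs) are handled the same way as top-level fields.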

zeotuan commented 5 months ago

This issue seems to be resolved already. I tested it with are_schemas_equal_ignore_nullable and assert_schema_equality.