machielg opened this issue 2 years ago
Yeah, I'm having the same problem. I've had to abandon this library when testing DataFrame equality with nested/complex data types.
@machielg Is this still an issue? The test below reports both schemas as equal (it returns `True`):
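```python
from pyspark.sql.types import ArrayType, StringType, StructField, StructType
from chispa.schema_comparer import are_schemas_equal_ignore_nullable

as1 = StructType([StructField("ar", ArrayType(StringType(), False), False)])
as2 = StructType([StructField("ar", ArrayType(StringType(), True), True)])
print(are_schemas_equal_ignore_nullable(as1, as2))  # True
```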
However, if the `elementType` of the `ArrayType` is a complex `StructType`, nullability differences are still taken into account. The check below returns `False`:
```python
from pyspark.sql.types import ArrayType, IntegerType, StructField, StructType
from chispa.schema_comparer import are_schemas_equal_ignore_nullable


def test_schema_nullability_insensitive_comparisons_with_arrays():
    s1 = StructType([
        StructField("f1", ArrayType(IntegerType(), True), True),
        StructField("f2", ArrayType(
            StructType([StructField("latlong", IntegerType(), False),
                        StructField("price", ArrayType(IntegerType(), False), False)]), True), True)])
    s2 = StructType([
        StructField("f1", ArrayType(IntegerType(), True), True),
        StructField("f2", ArrayType(
            StructType([StructField("latlong", IntegerType(), True),
                        StructField("price", ArrayType(IntegerType(), True), True)]), True), True)])
    # Printed False when this comment was written.
    print(are_schemas_equal_ignore_nullable(s1, s2))
```
@MrPowers Could you check whether a string-equality check can be used in this case, as in https://github.com/orcascope/chispa/commit/ea5d61cf01c44bc6a9f7436bc4c54ae6d622dcd2?
This issue seems to be resolved already. I tested it with `are_schemas_equal_ignore_nullable` and `assert_schema_equality`.
When using `ignore_nullable=True`, chispa still reports a difference for an `ArrayType` column because the nullability of the inner type differs:

```
StructField(my_arr_col,ArrayType(StringType,false),false)
StructField(my_arr_col,ArrayType(StringType,true),true)
```