When the underline_cells flag is used in assert_df_equality and the dataframes have different numbers of rows, the assertion function throws an exception.
from decimal import Decimal

from pyspark.sql import SparkSession
from pyspark.sql import types as T
from chispa.dataframe_comparer import assert_df_equality

spark = SparkSession.builder.getOrCreate()

schema = T.StructType(
    [
        T.StructField("id", T.StringType(), nullable=False),
        T.StructField("balance", T.DecimalType(38, 6), nullable=True),
    ]
)

df1 = spark.createDataFrame(
    [
        [1, None],
        [2, Decimal(1.0)],
    ],
    schema=schema,
)

df2 = spark.createDataFrame(
    [
        [1, None],
        [2, Decimal(1.0)],
        [3, Decimal(100)],
    ],
    schema=schema,
)

This gives two dataframes with different row counts: df1 has two rows and df2 has three.
Calling assert_df_equality(df1, df2) on its own gives the expected comparison, but adding underline_cells=True raises an exception.
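
For context, here is a minimal pure-Python sketch of one way a cell-by-cell comparison can break when the row counts differ. This is an assumption for illustration only, not taken from chispa's source: the helper names (mismatched_cells_naive, mismatched_cells_guarded, diff_rows) are hypothetical, and plain tuples stand in for Spark rows. If rows are paired with itertools.zip_longest, the shorter side is padded with None, and any per-cell logic that iterates over both rows fails on that None unless it is guarded.

```python
from itertools import zip_longest

# Plain tuples stand in for Spark rows; this is NOT chispa's code.
rows1 = [("1", None), ("2", "1.000000")]
rows2 = [("1", None), ("2", "1.000000"), ("3", "100.000000")]


def mismatched_cells_naive(r1, r2):
    """Return indexes of differing cells, assuming both rows exist."""
    return [i for i, (a, b) in enumerate(zip(r1, r2)) if a != b]


def mismatched_cells_guarded(r1, r2):
    """Same diff, but a missing row marks every cell of the other row."""
    if r1 is None or r2 is None:
        present = r1 if r1 is not None else r2
        return list(range(len(present)))
    return mismatched_cells_naive(r1, r2)


def diff_rows(rows_a, rows_b, cell_diff):
    # zip_longest pads the shorter list with None
    return [cell_diff(r1, r2) for r1, r2 in zip_longest(rows_a, rows_b)]


# The naive version fails on the padded None row.
try:
    diff_rows(rows1, rows2, mismatched_cells_naive)
    naive_error = None
except TypeError as exc:
    naive_error = exc

# The guarded version reports the extra row as fully mismatched.
guarded = diff_rows(rows1, rows2, mismatched_cells_guarded)
```

Whether chispa fails for this reason has not been confirmed here; the sketch only shows why a guard for missing rows may be needed wherever individual cells are compared.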