Closed by pangjac 3 months ago
Hey @pangjac first off thank you for supporting the package!
`sample_mismatch` doesn't exist for the `SparkCompare` class in that version of datacompy. We have a branch awaiting review in which we are shifting to pandas on PySpark, if you are OK using that instead. v0.8.4 is fairly old, so I'd highly recommend upgrading if you are able to. That old version of `SparkCompare` doesn't inherit from the base class because it was built separately from it. This has been bugging me, hence the new branch awaiting review and the deprecation of the old Spark class.
If you look at the new implementation (which aligns better with the pandas, polars, and fugue logic), we will have that function natively for Spark.
Alternatively, I wonder if the internal dataframe `_all_rows_mismatched` would give you what you need. You can filter on the column you are interested in, since it's just a Spark DataFrame.
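For anyone landing here, the filtering suggested above could be sketched roughly as follows. This is not datacompy's own `sample_mismatch`; the helper names are hypothetical, and it assumes the mismatch dataframe exposes each compared column as `<column>_base` and `<column>_compare`, which is worth confirming with `df.columns` on your version before relying on it.

```python
def mismatch_filter_expr(column, base_suffix="_base", compare_suffix="_compare"):
    """Build a Spark SQL predicate that is true where the two sides differ.

    The suffixes are an assumption about how the mismatch dataframe names
    its columns; adjust them if df.columns shows something different.
    """
    base_col = f"`{column}{base_suffix}`"
    compare_col = f"`{column}{compare_suffix}`"
    # <=> is Spark SQL's null-safe equality, so NULL vs NULL counts as equal
    return f"NOT ({base_col} <=> {compare_col})"


def sample_mismatch(mismatched_df, column, join_columns, sample_count=10):
    """Return up to sample_count rows where `column` differs (hypothetical helper).

    mismatched_df is expected to be a Spark DataFrame; only filter(),
    select() and limit() are called on it.
    """
    wanted = list(join_columns) + [f"{column}_base", f"{column}_compare"]
    return (
        mismatched_df.filter(mismatch_filter_expr(column))
        .select(*wanted)
        .limit(sample_count)
    )
```

Usage would look something like `sample_mismatch(comparison._all_rows_mismatched, "dollar_amt", ["acct_id"]).show()`, where `comparison`, `dollar_amt`, and `acct_id` are placeholder names. Note that `limit` takes the first rows rather than a random sample, and that the exact inequality used here ignores any tolerances or known differences configured on the comparison.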
@pangjac Just wanted to follow up and see if this was solved for you? Thanks!
Hi,
I am currently using 0.8.4. For a certain column, I am trying to print a `sample_mismatch` to check which values differ for this column between two PySpark dataframes. It seems the `SparkCompare` object has no attribute `sample_mismatch`? I am wondering whether this is a version issue. However, the latest documentation does not list `sample_mismatch` in the `datacompy.spark` module either. If confirmed, could you give a quick pointer on why this method is not inherited? If there are no specific blockers, I'd be happy to contribute this method to the spark module.
Thanks for this wonderful package!