capitalone / datacompy

Pandas, Polars, and Spark DataFrame comparison for humans and more!
https://capitalone.github.io/datacompy/
Apache License 2.0
420 stars, 124 forks

Datatype standardization before comparing for dataframes from DASK or Pyspark #273

Closed xs005 closed 2 months ago

xs005 commented 3 months ago

In most cases, datacompy works very well for cell-level comparison. But I have met a few cases where, after converting the dataframe from Dask or PySpark dataframes, I need to apply a data type schema to get a correct comparison result, especially when one side holds long float64 values and is compared against float32. The default data types from a Dask or PySpark dataframe may not be correct. I will provide an example so that you can reproduce the issue.

fdosani commented 3 months ago

I think a minimal example to show what issue you are facing would be very helpful for the team to understand the depth of the problem. So yes please.

fdosani commented 2 months ago

@xs005 Just wanted to follow up: do you have a minimal example that would help us understand the issue you are facing?

fdosani commented 2 months ago

Issue is stale. Closing. Please reopen if you have any updates.
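
No reproduction was ever posted in this thread, so the following is only a minimal sketch of the float64-versus-float32 situation described in the original report, not the reporter's actual case. It uses plain pandas and NumPy; the helper `standardize_float_dtypes` and the sample frames `left` and `right` are hypothetical names introduced for illustration, and the choice of `float32` as the common dtype is an assumption.

```python
import numpy as np
import pandas as pd


def standardize_float_dtypes(df: pd.DataFrame, dtype: str = "float32") -> pd.DataFrame:
    """Return a copy of df with every floating-point column cast to one dtype.

    Hypothetical helper: not part of datacompy.
    """
    float_cols = df.select_dtypes(include=[np.floating]).columns
    return df.astype({col: dtype for col in float_cols})


# Two frames holding "the same" numbers at different precisions, as might
# happen when one side comes from a source that defaults to float32.
left = pd.DataFrame(
    {"id": [1, 2, 3], "value": np.array([0.1, 0.2, 0.3], dtype="float64")}
)
right = pd.DataFrame(
    {"id": [1, 2, 3], "value": np.array([0.1, 0.2, 0.3], dtype="float32")}
)

# Exact comparison: float32 values are widened to float64, and the widened
# values differ from the float64 originals in the low-order bits.
raw_equal = bool((left["value"] == right["value"]).all())

# Casting both sides to the same dtype first makes the exact comparison agree.
left_std = standardize_float_dtypes(left)
right_std = standardize_float_dtypes(right)
standardized_equal = bool((left_std["value"] == right_std["value"]).all())

print("raw equal:", raw_equal)
print("standardized equal:", standardized_equal)
```

Casting to the narrower dtype discards precision on the float64 side, so it only makes sense when float32 precision is acceptable for the comparison. An alternative that avoids casting is to compare with a tolerance; datacompy's `Compare` accepts `abs_tol` and `rel_tol` arguments for that purpose.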