Closed kformanowicz-dotdata closed 3 months ago
Script to reproduce:

```python
from pyspark.sql import SparkSession
import datacompy

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame(
    [
        (1, "foo"),
        (2, "bar"),
    ],
    ["id", "例"],
)

df2 = spark.createDataFrame(
    [
        (1, "foo"),
        (2, "baz"),
    ],
    ["id", "例"],
)

comp = datacompy.SparkCompare(spark, df1, df2, join_columns=["例"])
comp.report()
```
It seems that Unicode characters are not escaped correctly when building the SQL query for the comparison.
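One common pattern for avoiding this class of bug (a sketch only, not datacompy's actual implementation; `quote_identifier` is a hypothetical helper) is to backtick-quote every column name before interpolating it into the join SQL, which Spark SQL accepts for non-ASCII identifiers:

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any embedded backticks,
    so it is safe to splice into a Spark SQL statement."""
    return "`" + name.replace("`", "``") + "`"

# Hypothetical join-clause construction for join_columns=["例"]
join_cols = ["例"]
on_clause = " AND ".join(
    f"A.{quote_identifier(c)} = B.{quote_identifier(c)}" for c in join_cols
)
# on_clause is now: A.`例` = B.`例`
```

With unquoted identifiers, `A.例 = B.例` can fail to parse; the backtick-quoted form is valid Spark SQL regardless of the characters in the column name.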
Thanks for reporting. I'll take a look at this shortly.
@kformanowicz-dotdata I have a fix for our new pyspark implementation here.
I'll work on getting a legacy fix as well. Just an FYI: the legacy Spark implementation will eventually be deprecated in favour of the new one, to align on the above.