Open shri-0509 opened 3 months ago
Looks like Databricks does not like how we extend DataFrame with .diff.
You can diff as follows:
diff(left, right)
Maybe spark.createDataFrame does not return a pyspark.sql.dataframe.DataFrame but some Databricks DataFrame.
Could you please execute the following on your side and share the output?
print(type(left))
Yes, it results in a different DataFrame type: <class 'pyspark.sql.connect.dataframe.DataFrame'>.
I managed to reproduce the issue with a local Spark Connect server. Looks like the diffing does not work with Spark Connect. Will investigate a fix.
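For anyone else hitting this, here is a small sketch of how to check which DataFrame flavor a session returns without importing any Spark Connect modules. The helper name is_spark_connect_dataframe is made up for illustration. It assumes Spark Connect DataFrame classes live under the pyspark.sql.connect package, as the type output above suggests.

```python
def is_spark_connect_dataframe(df) -> bool:
    """Return True if df's class is a DataFrame defined under pyspark.sql.connect.

    Hypothetical helper: it only inspects the class name and module path,
    so it does not need pyspark to be importable.
    """
    cls = type(df)
    return (
        cls.__name__ == "DataFrame"
        and cls.__module__.startswith("pyspark.sql.connect")
    )
```

Calling is_spark_connect_dataframe(left) on the DataFrame from this issue should return True, while a classic pyspark.sql.dataframe.DataFrame should give False.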
Sure, thank you. Right now I am doing a left_anti join to get added and deleted rows, and an inner join to get modified and unchanged rows. I am thinking of using this library to do the same.
here is the code
I have added the Maven library: uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5
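As a stopgap until diffing works with Spark Connect, the manual approach described above (left_anti joins for added and deleted rows, an inner join for modified and unchanged rows) could look roughly like the sketch below. This is not the code from the comment above and not how the library does it. The function name manual_diff and the "_right" column suffix are made up for illustration. It assumes both DataFrames share the same schema, that id_cols identifies a row uniquely, and that there is at least one non-id column. It only calls DataFrame and Column methods, so it should not depend on which DataFrame class the session returns.

```python
from functools import reduce


def manual_diff(left, right, id_cols):
    """Split two same-schema DataFrames into (added, deleted, modified, unchanged).

    Hypothetical helper; assumes id_cols is a list of column names that
    uniquely identify a row and that at least one non-id column exists.
    """
    value_cols = [c for c in left.columns if c not in id_cols]

    # Rows whose id exists on one side only.
    added = right.join(left, on=id_cols, how="left_anti")
    deleted = left.join(right, on=id_cols, how="left_anti")

    # Rename the right-hand value columns so they do not clash after the join.
    renamed = right
    for c in value_cols:
        renamed = renamed.withColumnRenamed(c, c + "_right")

    # Rows whose id exists on both sides.
    both = left.join(renamed, on=id_cols, how="inner")

    # Null-safe equality across all value columns.
    same = reduce(
        lambda a, b: a & b,
        [both[c].eqNullSafe(both[c + "_right"]) for c in value_cols],
    )
    modified = both.where(~same)
    unchanged = both.where(same)
    return added, deleted, modified, unchanged
```

For example, added, deleted, modified, unchanged = manual_diff(left, right, ["id"]) would split the two DataFrames on a single id column. The modified and unchanged results carry both the left values and the "_right" values side by side.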