VinothKanna007 closed 7 months ago
Now I have switched to a different version of the jar, and I'm getting an error in the `diff_with_options` method. The full stacktrace is attached below.
Spark Version: 3.1.1-amzn-0
Python Version: 3.6.10 | Anaconda, Inc.
Jar: uk.co.gresearch.spark:spark-extension_2.13:2.7.0-3.4
The `invalid syntax` error reported in the description is due to using unsupported Python 3.6. Please use Python 3.7 or above.
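Because the `invalid syntax` error only shows up once the package is imported on Python 3.6, a small guard at the top of a driver script can fail fast with a clearer message. This is a minimal sketch: the helper name `check_python_version` is hypothetical and the `(3, 7)` minimum is taken from the advice above, not from spark-extension itself.

```python
import sys

# Minimum interpreter version recommended above (an assumption of this sketch).
MIN_PYTHON = (3, 7)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if version_info (default: the running interpreter) meets minimum.

    Hypothetical helper, not part of spark-extension; pass a tuple such as
    (3, 6, 10) to check other versions.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not check_python_version():
        raise SystemExit(
            "Python %d.%d detected; please use Python %d.%d or above."
            % (sys.version_info[0], sys.version_info[1], MIN_PYTHON[0], MIN_PYTHON[1])
        )
```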
The `NoClassDefFoundError` is due to using the Scala 2.13 version with PySpark, which uses Scala 2.12. Please use `spark-extension_2.12` instead.
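The fix comes down to the Scala suffix in the artifact name, so it can help to assemble the Maven coordinate from its parts instead of copying it by hand. A minimal sketch, assuming the `group:artifact_scala:release-spark` layout seen in the coordinates in this thread; the helper name and the example versions are illustrative, and you should check which spark-extension releases exist for your own Spark version.

```python
# Group ID and artifact base name as they appear in the coordinates in this thread.
GROUP_ID = "uk.co.gresearch.spark"
ARTIFACT = "spark-extension"


def spark_extension_coordinate(scala_version, release, spark_version):
    """Build a coordinate such as uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5.

    scala_version must match the Scala binary version PySpark was built with
    (2.12 per the answer above); a 2.13 artifact on a 2.12 runtime is what
    produced the NoClassDefFoundError. Hypothetical helper for illustration.
    """
    # Reduce e.g. "3.1.1-amzn-0" to "3.1", the major.minor part used as the version suffix.
    spark_minor = ".".join(spark_version.split(".")[:2])
    return "%s:%s_%s:%s-%s" % (GROUP_ID, ARTIFACT, scala_version, release, spark_minor)
```

The resulting string can be passed to `spark-submit --packages` or set as the `spark.jars.packages` configuration when building the `SparkSession`.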
Thanks! It works.
One more question: is there any option to ignore the matching rows when displaying the diff? I'm not interested in matching records; basically, I only want the records whose difference exceeds the `epsilon` value.
Reason: I checked the query plan (lots of CASE statements), and it takes more time when I'm dealing with large DataFrames. I want to find only my mismatched records, in minimal time.
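To make that requirement concrete: a row should be reported only if at least one of its values differs by more than epsilon, and fully matching rows are dropped. The following is a plain-Python sketch of that rule over two keyed record sets; it is not spark-extension's implementation, and the function name `mismatches` and the dict-of-dicts row layout are assumptions for illustration.

```python
def mismatches(left, right, epsilon):
    """Return {key: {column: (left_value, right_value)}} for rows that differ.

    left and right map a row key to a dict of numeric column values (assumed layout).
    A column counts as changed when |left - right| > epsilon; rows with no changed
    column are dropped. A row or column present on only one side is always reported,
    with None for the missing side.
    """
    result = {}
    for key in left.keys() | right.keys():
        l_row = left.get(key, {})
        r_row = right.get(key, {})
        changed = {}
        for col in l_row.keys() | r_row.keys():
            l_val = l_row.get(col)
            r_val = r_row.get(col)
            if l_val is None or r_val is None:
                # Missing on exactly one side counts as a mismatch.
                is_changed = l_val is not r_val
            else:
                is_changed = abs(l_val - r_val) > epsilon
            if is_changed:
                changed[col] = (l_val, r_val)
        if changed:
            result[key] = changed
    return result
```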
Sure, use the sparse mode: https://github.com/G-Research/spark-extension/blob/master/DIFF.md#sparse-mode
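As described in the linked DIFF.md section, sparse mode shows only the values that changed, leaving unchanged values empty. The sketch below mimics that output shape for a single diff row in plain Python. It is an illustration of the idea, not the library's code: `sparsify` is a hypothetical name, and the `left_<col>` / `right_<col>` column naming is an assumption of this sketch.

```python
def sparsify(row, value_columns):
    """Return a copy of a diff row with unchanged left/right values set to None.

    row maps "left_<col>" and "right_<col>" to values (assumed naming scheme);
    id columns, the diff action column and anything not listed in value_columns
    are passed through unchanged. The input row is not modified.
    """
    out = dict(row)
    for col in value_columns:
        left_key, right_key = "left_" + col, "right_" + col
        if row[left_key] == row[right_key]:
            # Unchanged value: hide both sides so only changes stay visible.
            out[left_key] = None
            out[right_key] = None
    return out
```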
Cool. Thanks @EnricoMi
Btw this is a great package. Loved it♥️
Error:
FYI:
Spark Version: 3.1.1-amzn-0
Python Version: 3.6.10 | Anaconda, Inc.
Jar: uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5