Open · rish-shar opened this issue 3 months ago
The Python pip package only contains the stubs for code completion. Spark requires the Java package to be installed (the Python package is not necessary on Databricks). Add a Maven library with the coordinates `uk.co.gresearch.spark:spark-extension_2.13:2.12.0-3.5`, and the extension will load as expected.
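On Databricks, this is typically done through the cluster's Libraries tab (Install new → Maven → Coordinates). Outside of the UI, the same effect can be sketched with Spark's standard `spark.jars.packages` property, e.g. in `spark-defaults.conf` (adjust the coordinates to your Scala and Spark versions):

```
spark.jars.packages  uk.co.gresearch.spark:spark-extension_2.13:2.12.0-3.5
```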
> The Python pip package only contains the stubs for code completion. Spark requires the Java package to be installed (the Python package is not necessary on Databricks). Add a Maven library with the coordinates `uk.co.gresearch.spark:spark-extension_2.13:2.12.0-3.5`, and the extension will load as expected.
@liteart How do I achieve this on Databricks? Do I need to add the package at cluster level then?
> Add a Maven library and pass `uk.co.gresearch.spark:spark-extension_2.13:2.12.0-3.5` as the Maven package, ...

In your setup (Scala 2.12, Spark 3.4.1), this should be `uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.4`.
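Picking the right artifact is mechanical: the suffix after the underscore is the Scala binary version, and the version string ends with the Spark minor version. A small sketch of that mapping (the helper name and the `2.12.0` default extension version are taken from the coordinates quoted in this thread, not from the package itself):

```python
def spark_extension_coordinates(scala_version: str, spark_version: str,
                                ext_version: str = "2.12.0") -> str:
    """Build Maven coordinates for spark-extension.

    The artifact suffix is the Scala *binary* version (e.g. 2.12),
    and the version ends with the Spark *minor* version (e.g. 3.4).
    """
    scala_binary = ".".join(scala_version.split(".")[:2])
    spark_minor = ".".join(spark_version.split(".")[:2])
    return f"uk.co.gresearch.spark:spark-extension_{scala_binary}:{ext_version}-{spark_minor}"

# For the setup in this issue (Scala 2.12, Spark 3.4.1):
print(spark_extension_coordinates("2.12", "3.4.1"))
# uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.4
```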
Description

I have two PySpark DataFrames, source_df and target_df. I ran `pip install pyspark-extension` to install diff.

Spark version: 3.4.1
Scala version: 2.12

When I run `source_df.diff(target_df)`, I get the error below. Any help would be appreciated.