Open: dannymeijer opened this issue 5 months ago.
The whole idea was to introduce an internal Koheesio Spark session to make it easy to switch between remote and local modes.
Also, if I'm not mistaken, pydantic checks the SparkSession type based on its full import path, and the remote SparkSession is imported from a different path. At least that was the case some time ago.
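A minimal sketch of a type check that accepts both session flavours, assuming PySpark >= 3.4, where the Spark Connect session class lives in pyspark.sql.connect.session instead of pyspark.sql.session. The helper names spark_session_types and is_spark_session are hypothetical, not Koheesio API. A pydantic validator could delegate to a check like this instead of pinning a single import path.

```python
from typing import Any, Tuple, Type


def spark_session_types() -> Tuple[Type[Any], ...]:
    """Return every SparkSession class importable here (hypothetical helper)."""
    found = []
    try:
        # Classic, JVM-backed session
        from pyspark.sql.session import SparkSession as NativeSparkSession

        found.append(NativeSparkSession)
    except ImportError:
        pass
    try:
        # Spark Connect session: same class name, different module path
        from pyspark.sql.connect.session import SparkSession as RemoteSparkSession

        found.append(RemoteSparkSession)
    except ImportError:
        # PySpark < 3.4, or Spark Connect extras (grpcio, pandas, pyarrow) missing
        pass
    return tuple(found)


def is_spark_session(obj: Any) -> bool:
    """True if obj is a native or a remote SparkSession (hypothetical helper)."""
    types = spark_session_types()
    return bool(types) and isinstance(obj, types)
```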
Check the affected/referenced code within Koheesio.
Not available on Databricks Connect for Databricks Runtime 13.3 LTS and below:
Not available:
[ ] Databricks Utilities: credentials, library, notebook workflow, widgets -- not sure what this means (brickflow is affected) ==> @pariksheet with @asingamaneni
[ ] SparkContext -- no JVM operations ==> @mikita-sakalouski @dannymeijer
[ ] Changing the log4j log level through SparkContext -- need to check the code ==> Nathan
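For the SparkContext and log-level items, one possible guard is sketched below. It assumes that accessing spark.sparkContext on a Spark Connect session raises NotImplementedError (PySpark 3.4 raises it directly; 3.5 raises PySparkNotImplementedError, a subclass). The name set_log_level_if_supported is hypothetical. On remote sessions it logs a warning and reports that nothing changed instead of failing.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def set_log_level_if_supported(spark: Any, level: str = "WARN") -> bool:
    """Set the log4j level through SparkContext when one exists (hypothetical helper).

    Returns True if the level was applied, False for sessions without a SparkContext.
    """
    try:
        sc = spark.sparkContext
    except NotImplementedError:
        # Spark Connect sessions expose no SparkContext (no JVM on the client side)
        logger.warning("SparkContext unavailable; log level %s not applied", level)
        return False
    sc.setLogLevel(level)
    return True
```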
Run the unit tests locally against a Spark Connect remote instance.
Check how to manage SparkSession and DatabricksSession.
I have added details related to the foreachBatch function here: https://github.com/Nike-Inc/koheesio/issues/56
If you prefer to collect everything here, I will copy-paste the comment and close that issue.
One additional point that I do not see in the list is DataFrame.rdd, which is used in some tests.
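Because DataFrame.rdd is unavailable under Spark Connect, tests could switch to DataFrame-only equivalents. This is a minimal sketch assuming the typical .rdd uses are collecting one column and checking emptiness. The helper names are hypothetical. On PySpark >= 3.3, DataFrame.isEmpty() is a built-in alternative to the second helper.

```python
from typing import Any, List


def column_values(df: Any, column: str) -> List[Any]:
    """Replacement for df.rdd.map(lambda r: r[column]).collect() (hypothetical helper)."""
    # select + collect stays within the DataFrame API, which Spark Connect supports
    return [row[column] for row in df.select(column).collect()]


def is_empty(df: Any) -> bool:
    """Replacement for df.rdd.isEmpty() (hypothetical helper)."""
    # limit(1) avoids collecting more than a single row
    return len(df.limit(1).collect()) == 0
```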
There is a way to check whether a SparkSession is remote or native.
We should introduce an API/function that returns this flag, check it in the specific APIs that depend on it (e.g. Delta merge, Snowflake), and raise an exception when they are not supported.
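One way to express that flag and the check is sketched here, assuming remote sessions can be recognised by their class living under pyspark.sql.connect (as the Spark Connect SparkSession does). The names is_remote_session, requires_native_session and SparkConnectNotSupportedError are hypothetical, not existing Koheesio or PySpark API.

```python
import functools
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class SparkConnectNotSupportedError(RuntimeError):
    """Raised when a JVM-only feature is used with a Spark Connect session (hypothetical)."""


def is_remote_session(spark: Any) -> bool:
    """Return True for Spark Connect sessions, judged by the class's module path."""
    # pyspark.sql.connect.session.SparkSession vs pyspark.sql.session.SparkSession
    return type(spark).__module__.startswith("pyspark.sql.connect")


def requires_native_session(func: F) -> F:
    """Decorator for functions taking a SparkSession first that need the JVM."""

    @functools.wraps(func)
    def wrapper(spark: Any, *args: Any, **kwargs: Any) -> Any:
        if is_remote_session(spark):
            raise SparkConnectNotSupportedError(
                f"{func.__name__} needs a JVM-backed SparkSession and is not supported on Spark Connect"
            )
        return func(spark, *args, **kwargs)

    return wrapper  # type: ignore[return-value]
```

A JVM-dependent step such as a Delta merge would then be decorated with @requires_native_session, so the failure is explicit and early instead of surfacing as an obscure attribute error.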
For Snowflake, use snowflake-connector-python instead of spark._jvm.
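A minimal sketch of running a Snowflake query without touching the JVM, assuming snowflake-connector-python's snowflake.connector.connect and its DB-API cursor (execute, fetchall, close). run_snowflake_query is a hypothetical helper; the optional connect argument exists only so the connection factory can be swapped out in tests.

```python
from typing import Any, Callable, List, Optional, Sequence


def run_snowflake_query(
    query: str,
    connect: Optional[Callable[..., Any]] = None,
    **connection_params: Any,
) -> List[Sequence[Any]]:
    """Run a query through snowflake-connector-python instead of spark._jvm (hypothetical)."""
    if connect is None:
        # Imported lazily so the dependency is only needed when this path is used
        import snowflake.connector

        connect = snowflake.connector.connect
    # e.g. account=..., user=..., password=..., warehouse=..., database=..., schema=...
    conn = connect(**connection_params)
    try:
        cur = conn.cursor()
        try:
            cur.execute(query)
            return cur.fetchall()
        finally:
            cur.close()
    finally:
        conn.close()
```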
All of these should be addressed as part of release 0.9.0 (currently in pre-release). Please verify your use cases accordingly so we can proceed with the release.
Is your feature request related to a problem? Please describe.
N/A
Describe the solution you'd like
We should add support for Databricks Runtime (DBR) 14.3 LTS.
This means we need compatibility with:
Additionally, we need to look at how Spark Connect changes things for us: every direct reference to the JVM should be investigated. According to the docs, only Shared cluster mode is affected.
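To inventory the direct JVM references, a small scan of the source tree could help. This is a sketch only. The attribute list (_jvm, _jsc, _jsparkSession, sparkContext, rdd) is an assumption about what typically signals JVM or RDD access, and the root path passed in is whatever the Koheesio source directory is in your checkout.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed set of attribute accesses that usually mean JVM or RDD usage
JVM_ATTRS = re.compile(r"\.(_jvm|_jsc|_jsparkSession|sparkContext|rdd)\b")


def find_jvm_references(root: Path) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, stripped line) for every suspicious line under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if JVM_ATTRS.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

Printing each hit as f"{path}:{lineno}: {line}" gives a checklist that can be triaged against the "Not available" items above.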
Describe alternatives you've considered
N/A
Additional context
N/A