Open sebastiandaberdaku opened 5 months ago
Hey @sebastiandaberdaku, thanks for opening this! I just created a parent issue, https://github.com/delta-io/delta/issues/3240, which addresses this for both the Scala and Python DeltaTable APIs.
This is definitely on our roadmap for future releases (and the Delta 4.0.0 Preview release will include some partial support!)
Feature request
Which Delta project/connector is this regarding?
Spark Connect support for the Python API
Overview
Currently, the Python Delta Table API does not fully support Spark Connect. This is because when using Spark Connect, lower-level APIs such as the Spark Context are not available. The Spark Context object is, however, used in the `DeltaTable.forPath` and `DeltaTable.forName` methods, which means that these two methods cannot be used with Spark Connect.
Motivation
This feature will benefit all PySpark users who want to use the DeltaTable Python API in their code when using Spark Connect.
Further details
The Delta Table SQL API is supported by Spark Connect. For the moment I have been converting PySpark DataFrames to temporary views and then using Spark SQL to do the merges.
For others who might need this in the meantime, the following code provides a function to convert a PySpark DataFrame into a temporary view.
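A minimal sketch of such a helper, not necessarily identical to the one used here, might look like the following. The names `to_temp_view` and `build_merge_sql`, the `tmp_view` prefix, and the `id` key column are assumptions made for illustration; only `DataFrame.createOrReplaceTempView`, `spark.sql`, and Delta's `MERGE INTO` SQL statement are relied on.

```python
from __future__ import annotations

import uuid
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for type hints; nothing here touches the Spark Context.
    from pyspark.sql import DataFrame


def to_temp_view(df: DataFrame, prefix: str = "tmp_view") -> str:
    """Register `df` as a session-scoped temporary view and return its name.

    A random suffix keeps concurrent calls from overwriting each other's views.
    """
    view_name = f"{prefix}_{uuid.uuid4().hex}"
    df.createOrReplaceTempView(view_name)
    return view_name


def build_merge_sql(target_path: str, source_view: str, key_columns: list[str]) -> str:
    """Build an upsert statement for a path-based Delta table (hypothetical helper)."""
    condition = " AND ".join(
        f"target.`{col}` = source.`{col}`" for col in key_columns
    )
    return (
        f"MERGE INTO delta.`{target_path}` AS target "
        f"USING {source_view} AS source "
        f"ON {condition} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical usage with an active Spark Connect session `spark`,
# assuming the rows are matched on an `id` column:
#
#   view = to_temp_view(to_upsert)
#   spark.sql(build_merge_sql(path, view, ["id"]))
#   spark.catalog.dropTempView(view)
```

Building the statement as a plain string keeps the merge entirely on the SQL path, which is the part Spark Connect supports. The table path and column names are interpolated directly, so this sketch assumes they come from trusted code rather than user input.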
Now, say `to_upsert` is the PySpark DataFrame you want to upsert into the DeltaTable located at `path`. You can use the provided function like so:
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?