tatiana opened 9 months ago
I've reached out to @stikkireddy to help him get https://github.com/apache/airflow/pull/32221 merged. We're making significant progress, and I expect it to be merged soon.
Given that the Databricks SDK interfaces are still changing (it hasn't had a stable 1.0 release yet), we agreed not to make it a dependency of the Airflow Databricks provider itself until it becomes stable.
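Leaving the SDK out of the provider's declared dependencies doesn't rule out using it when it happens to be installed. Below is a minimal sketch of an optional-import guard, assuming that route were taken; the helper name sdk_available and the flag HAS_DATABRICKS_SDK are hypothetical and not part of the Airflow provider.

```python
import importlib.util


def sdk_available(module_name: str = "databricks.sdk") -> bool:
    """Return True if module_name can be found on the import path.

    For a dotted name, find_spec imports the parent package first; if that
    parent is missing it raises ModuleNotFoundError rather than returning None.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


# Hypothetical flag: decide once, at import time, whether SDK-based code paths may be used.
HAS_DATABRICKS_SDK = sdk_available()

if HAS_DATABRICKS_SDK:
    print("databricks-sdk found: SDK data classes can be used")
else:
    print("databricks-sdk not installed: falling back to plain dict payloads")
```

With this pattern, the package never lists databricks-sdk as a requirement, and users who install it themselves opt in to the SDK-based code paths.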
As part of our migration task, I've been testing the DatabricksCreateJobsOperator, and I'm making changes so that DatabricksWorkflowTaskGroup works with this operator. I'm basing these changes on @stikkireddy's branch.
While discussing contributing this work to the Apache Airflow repo with @alexott, he gave the following feedback:
We need to talk about integrating your work with the JobsCreate operator, which is now being developed by @Sri Tikkireddy (PR: https://github.com/apache/airflow/pull/32221).
From my analysis of your code, that PR overlaps substantially with your work, but it adds some valuable things, such as the use of data classes from the Databricks Python SDK.
As you mentioned, you're using the SDK from the Databricks CLI; it's already considered deprecated and has been replaced by the Databricks Python SDK. The new SDK has a big advantage over the old one because it evolves together with the REST APIs.
If your code doesn't provide asynchronous execution, then the best way forward could be either to use the SDK or to switch to DatabricksHook functions.
In your code, instead of JSON payloads for tasks and a dedicated operator for notebooks, we can switch to the data classes from the new SDK; they provide self-documentation and type safety.
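To illustrate the data-class point, here is a minimal sketch that uses Python's standard dataclasses as hypothetical stand-ins for the SDK's generated classes (the real ones live in databricks.sdk.service.jobs and carry many more fields). It contrasts a raw JSON-style dict, where a misspelled key goes unnoticed, with a typed object that rejects the same typo at construction time. Field names follow the Jobs API task payload; the job name, notebook path, and cluster ID are placeholders.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class NotebookTask:
    # Hypothetical stand-in mirroring the Jobs API "notebook_task" object.
    notebook_path: str
    base_parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    # Hypothetical stand-in mirroring one entry of the Jobs API "tasks" list.
    task_key: str
    notebook_task: Optional[NotebookTask] = None
    existing_cluster_id: Optional[str] = None

    def as_dict(self) -> dict:
        # asdict() converts nested dataclasses recursively; drop top-level
        # fields left as None so the payload only carries what was set.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Raw JSON-style payload: the misspelled "notebok_task" key is silently accepted.
raw_task = {"task_key": "ingest", "notebok_task": {"notebook_path": "/Shared/ingest"}}

# Typed payload: the same typo fails immediately with a TypeError.
typo_error = None
try:
    Task(task_key="ingest", notebok_task=NotebookTask("/Shared/ingest"))
except TypeError as err:
    typo_error = str(err)

task = Task(
    task_key="ingest",
    notebook_task=NotebookTask(notebook_path="/Shared/ingest"),
    existing_cluster_id="<cluster-id>",  # placeholder, not a real cluster
)
payload = {"name": "example-job", "tasks": [task.as_dict()]}
print(payload)
```

The typed version also documents itself: an editor or type checker can list the valid fields of Task, while the dict version relies on the reader knowing the REST payload by heart.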