Closed crong-k closed 11 months ago
Hey @crong-k, this looks great! Going to spend some time reviewing and testing it next week, but this is going to be awesome to get in.
Hello @jlaneve, the CI is failing with "ModuleNotFoundError: No module named 'airflow'". Is there something I need to fix? If so, please let me know.
Hi @crong-k, I was testing this PR, and to do that I installed astro-provider-databricks from this repo while building a Docker image for Airflow.
I know that astro-provider-databricks was installed successfully because imports such as `from astro_databricks.operators.notebook import DatabricksNotebookOperator` work, but I get a module-not-found exception when I run `from astro_databricks import DatabricksTaskOperator` or `from astro_databricks.operators.common import DatabricksTaskOperator`.
If you followed the same method to test this PR, what do you think I am doing wrong that prevents me from importing `DatabricksTaskOperator`? Thanks.
Hello @singhsatnam, please check the version with the following code:
import astro_databricks
print(astro_databricks.__version__)
Does it print out as 0.1.5?
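A slightly fuller diagnostic for the import problem described above might look like the sketch below. It only uses the standard library; `describe_install` is a hypothetical helper name, and the distribution name `astro-provider-databricks` is taken from the comments in this thread. The reported path can show whether Python is picking up an older copy of the package than the one built from this branch.

```python
import importlib
from importlib import metadata


def describe_install(dist_name: str, module_name: str, attr: str) -> dict:
    """Report the installed version of a distribution and whether a module exposes a name.

    Hypothetical helper for debugging; not part of astro-provider-databricks.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # the distribution is not installed in this environment

    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {"version": version, "module_found": False, "has_attr": False, "path": None}

    return {
        "version": version,
        "module_found": True,
        "has_attr": hasattr(module, attr),
        # The file path shows which copy of the package is actually imported.
        "path": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    print(
        describe_install(
            "astro-provider-databricks",
            "astro_databricks",
            "DatabricksTaskOperator",
        )
    )
```

If `has_attr` is false while `module_found` is true, the installed package probably predates this PR's changes.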
@crong-k do you mind if I push directly to your branch? I'm going to make a few minor modifications as I'm reviewing!
This PR adds a new operator to support all task types in Databricks.
Previously, the only operator available in this package was `DatabricksNotebookOperator`. The newly added operator enables the creation of various tasks within a Databricks workflow by accepting a free-form `task_config`.
Because the Databricks API specification is still changing, `task_config` is designed to permit flexible parameter input rather than strictly enforcing and specifying the parameters required for each task.
It has been confirmed that both a notebook task and a Spark JAR task are created successfully within a single Databricks workflow when the example code in the docstring is executed.
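To illustrate the free-form `task_config` idea, here is a minimal sketch that does not use the operator itself. `build_task_payload` is a hypothetical helper, not code from this PR; the field names inside each config follow the Databricks Jobs API task settings as I understand them, and the notebook path and class name are placeholders.

```python
from typing import Any, Dict


def build_task_payload(task_key: str, task_config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a task key with a free-form task_config (hypothetical helper).

    The config is passed through unchanged, so new task types or fields
    do not require changes to this function.
    """
    if "task_key" in task_config:
        raise ValueError("task_key is set by the helper, not by task_config")
    return {"task_key": task_key, **task_config}


# A notebook task; the path and parameters are placeholders.
notebook_config = {
    "notebook_task": {
        "notebook_path": "/Shared/example_notebook",
        "source": "WORKSPACE",
        "base_parameters": {"run_date": "2023-01-01"},
    }
}

# A Spark JAR task; the class name and parameters are placeholders.
jar_config = {
    "spark_jar_task": {
        "main_class_name": "com.example.Main",
        "parameters": ["--env", "dev"],
    }
}

# Both task types end up in the same workflow task list.
tasks = [
    build_task_payload("notebook_step", notebook_config),
    build_task_payload("jar_step", jar_config),
]
```

The trade-off of this design is that validation is left to the Databricks API: a misspelled field is only reported when the job is submitted.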
If this PR is merged, `DatabricksNotebookOperator` should be modified to inherit from `DatabricksTaskOperator` for code reusability.
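The follow-up refactor could look roughly like the sketch below. Both classes are stand-ins written for illustration: they do not subclass Airflow's operator base class, and their constructor parameters are assumptions rather than the signatures used in this package.

```python
from typing import Any, Dict, Optional


class TaskOperatorSketch:
    """Stand-in for DatabricksTaskOperator; not the real class."""

    def __init__(self, task_id: str, task_config: Dict[str, Any]) -> None:
        self.task_id = task_id
        self.task_config = task_config


class NotebookOperatorSketch(TaskOperatorSketch):
    """Stand-in for DatabricksNotebookOperator built on the generic operator."""

    def __init__(
        self,
        task_id: str,
        notebook_path: str,
        source: str = "WORKSPACE",
        notebook_params: Optional[Dict[str, str]] = None,
    ) -> None:
        # Translate the notebook-specific arguments into a generic task_config,
        # then let the parent class handle everything else.
        task_config = {
            "notebook_task": {
                "notebook_path": notebook_path,
                "source": source,
                "base_parameters": notebook_params or {},
            }
        }
        super().__init__(task_id=task_id, task_config=task_config)


op = NotebookOperatorSketch(
    task_id="notebook_step",
    notebook_path="/Shared/example_notebook",  # placeholder path
)
```

With this shape, the notebook operator keeps its existing, more specific interface while all job-creation logic lives in one place.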