databricks / databricks-sdk-py

Databricks SDK for Python (Beta)
https://databricks-sdk-py.readthedocs.io/
Apache License 2.0

[ISSUE] Databricks SDK jobs: how to create dependent tasks / lineage using Python #504

Open shivatharun opened 6 months ago

shivatharun commented 6 months ago

How do I create dependent jobs / lineage using the Databricks SDK? I only found documentation for creating a single job.

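import time
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs
w = WorkspaceClient()  # picks up credentials from the environment or ~/.databrickscfg
cluster_id = "<existing-cluster-id>"  # placeholder: ID of an existing cluster
created_job = w.jobs.create(name=f'sdk-{time.time_ns()}',
                            tasks=[
                                jobs.Task(description="test",
                                          existing_cluster_id=cluster_id,
                                          notebook_task=jobs.NotebookTask(notebook_path="test_run"),
                                          task_key="test",
                                          timeout_seconds=0)
                            ])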

Let's say I have a main notebook, and within it I create a job named test that triggers the "test_run" notebook. I want to run the test_run notebook with different parameters. How can I create this lineage using the Python SDK? Could you please share any references? I couldn't find any.

tanmay-db commented 6 months ago

Hi @shivatharun, lineage isn't currently supported in the SDK. However, you could update the job with different parameters; see for example https://github.com/databricks/databricks-sdk-py/blob/main/examples/jobs/update_jobs_api_full_integration.py, where you could use a different JobSettings. Does this work for your use case?

shivatharun commented 6 months ago

Hi @tanmay-db, how can tasks run in parallel within a job, without any dependencies between them? Is there a limit on the number of tasks?

created_job = w.jobs.create(name=f'sdk-{time.time_ns()}',
                            tasks=[task1, task2, task3, ...])