databrickslabs / dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.
https://dbx.readthedocs.io

Monitor Databricks Job in Azure CI/CD #852

Closed by sasi143 9 months ago

sasi143 commented 9 months ago

My Databricks job (deployed with dbx) takes more than 48 hours to run (over 1 TB of data), and I trigger it from a multi-stage Azure CI/CD pipeline (dev/qa/prod). How can I monitor the job status (running/completed) without tying up a DevOps agent for the whole run? The next stage should trigger once the training stage has completed. How do people handle this in large-scale pipelines?

Goal: reduce how long the DevOps agent machine is occupied, because other new pipelines are waiting in the queue.
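If some status checking from CI is unavoidable, one way to keep the agent's footprint small is to check the run once per invocation instead of blocking for 48 hours. The sketch below assumes the Databricks Jobs API 2.1 `runs/get` endpoint and a personal access token; the function names `classify_run` and `fetch_run` are hypothetical.

```python
import json
import urllib.request

# Terminal life-cycle states reported by the Jobs API 2.1 runs/get response.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def classify_run(run: dict) -> str:
    """Map a runs/get response body to "running", "succeeded" or "failed"."""
    state = run.get("state", {})
    life_cycle = state.get("life_cycle_state", "PENDING")
    if life_cycle not in TERMINAL_STATES:
        # PENDING, QUEUED, RUNNING, TERMINATING, etc. mean the run is not done yet.
        return "running"
    # Anything other than SUCCESS (FAILED, TIMEDOUT, CANCELED, a skipped run)
    # is treated as a failure in this sketch.
    return "succeeded" if state.get("result_state") == "SUCCESS" else "failed"


def fetch_run(host: str, token: str, run_id: int) -> dict:
    """Fetch the run state once (no polling loop), so the agent is released right away."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/runs/get?run_id={run_id}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A short pipeline step could call `classify_run(fetch_run(host, token, run_id))` and exit immediately, rather than keeping an agent busy for the full run.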

renardeinside commented 9 months ago

Hi @sasi143,

because the next stage should trigger if the training stage is completed. Want to know how people are using it on large-scale pipelines?

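My assumption is that you're orchestrating with CI/CD pipelines where you should probably consider Databricks Workflows instead. For instance, if you have a set of tasks, you can chain them into a single Databricks Workflow and trigger it once from CI; Databricks then handles the ordering between tasks, so the CI agent does not have to wait on each step.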
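As a rough illustration of chaining tasks, the sketch below builds a Jobs API 2.1 job definition in which each notebook task depends on the previous one, then triggers the whole workflow once with `run-now`. With dbx the equivalent task list would normally live in the deployment file; this sketch calls the REST API directly instead. The notebook paths, cluster ID and helper names are hypothetical.

```python
import json
import urllib.request


def build_workflow_spec(name: str, notebook_paths: list[str], cluster_id: str) -> dict:
    """Chain notebook tasks so each one depends on the previous task (Jobs API 2.1 shape)."""
    tasks = []
    for index, path in enumerate(notebook_paths):
        task = {
            "task_key": f"step_{index}",
            "notebook_task": {"notebook_path": path},
            "existing_cluster_id": cluster_id,
        }
        if index > 0:
            # depends_on makes Databricks, not the CI agent, wait for the previous step.
            task["depends_on"] = [{"task_key": f"step_{index - 1}"}]
        tasks.append(task)
    return {"name": name, "tasks": tasks}


def run_now(host: str, token: str, job_id: int) -> int:
    """Trigger the whole workflow once and return its run_id without waiting for completion."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps({"job_id": job_id}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["run_id"]
```

The CI step only needs to live as long as the `run-now` call; the dependencies between tasks are enforced inside Databricks.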

If there is a dependent component outside of Databricks (e.g. a CI pipeline), you can trigger that CI/CD pipeline via webhooks when the workflow finishes. This eliminates the need for a constantly running Azure CI/CD pipeline and saves DevOps agent resources.
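The answer does not specify which webhook mechanism to use. One option, sketched here under assumptions, is to let the final task of the Databricks workflow call the Azure DevOps "Runs - Run Pipeline" REST endpoint (api-version 7.0 assumed) for the pipeline that holds the next stage, authenticating with a PAT allowed to queue builds. The organization, project, pipeline ID and helper names are placeholders, not part of dbx.

```python
import base64
import json
import urllib.request


def build_pipeline_trigger(organization: str, project: str, pipeline_id: int,
                           pat: str, branch: str = "refs/heads/main") -> urllib.request.Request:
    """Build (but do not send) an Azure DevOps "Runs - Run Pipeline" REST request."""
    url = (f"https://dev.azure.com/{organization}/{project}"
           f"/_apis/pipelines/{pipeline_id}/runs?api-version=7.0")
    # Run the pipeline from the given branch of its own repository.
    body = {"resources": {"repositories": {"self": {"refName": branch}}}}
    # Azure DevOps accepts a PAT as the password of HTTP Basic auth with an empty user name.
    credentials = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Basic {credentials}", "Content-Type": "application/json"},
        method="POST",
    )


def trigger_next_stage(request: urllib.request.Request) -> dict:
    """Send the request, e.g. from the last task of the Databricks workflow."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With this layout, the stages that follow training would live in a separate pipeline that this call starts, so no agent sits idle while the 48-hour job runs. Databricks job webhook notifications (`webhook_notifications.on_success` pointing at a notification destination) are an alternative that avoids putting the PAT in a task.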