databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

DBT jobs keep running when the process is killed #741

Status: Closed (ajsquared closed this issue 4 months ago)

ajsquared commented 4 months ago

Describe the bug

We run DBT via Airflow. When a running Airflow task is manually marked as failed, Airflow will terminate all child processes.

However, with our DBT jobs, the local process is killed on the Airflow worker, but the job continues executing on our Databricks cluster.

Cancellation does work properly with Ctrl-C in an interactive shell.

Steps To Reproduce

Sending a SIGTERM with kill to a local DBT process appears to produce the same behavior (local process dies, but the job keeps executing on the cluster), so this can be reproduced outside of Airflow.
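The reproduction described above could be scripted roughly as follows. This is a minimal sketch, not something taken from the original report: the helper name `terminate_after` and the delay values are assumptions made for illustration, and the command to launch (for example a dbt invocation) is left to the caller.

```python
import signal
import subprocess
import sys
import time


def terminate_after(cmd, delay_seconds=1.0, timeout=10.0):
    """Start `cmd`, send it SIGTERM after `delay_seconds`, and return its exit code.

    Hypothetical helper for illustration only. On POSIX systems a process
    killed by a signal reports a negative return code (-15 for SIGTERM).
    """
    proc = subprocess.Popen(cmd)
    time.sleep(delay_seconds)          # give the child time to start its work
    proc.send_signal(signal.SIGTERM)   # same signal that `kill <pid>` sends by default
    return proc.wait(timeout=timeout)


# Example call (assumed command; substitute the dbt command that shows the problem):
#   terminate_after(["dbt", "run"], delay_seconds=30.0)
# Afterwards, check on the cluster whether the job is still executing.
```

The script only confirms that the local process exits. Whether the job is still running has to be checked on the Databricks side.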

Expected behavior

When the DBT process is terminated, the job should stop executing on the cluster.
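Since Ctrl-C is reported to cancel correctly while SIGTERM does not, one possible workaround is to route SIGTERM into the same `KeyboardInterrupt` path in a wrapper process that runs dbt in-process. The sketch below is an assumption-laden illustration, not a confirmed fix: the function names are hypothetical, and it has not been verified here that this actually causes the job on the cluster to be cancelled.

```python
import os
import signal


def _raise_keyboard_interrupt(signum, frame):
    # Turn the signal into the same exception that Ctrl-C produces in Python.
    raise KeyboardInterrupt(f"received signal {signum}")


def install_sigterm_as_interrupt():
    """Make SIGTERM raise KeyboardInterrupt in the main thread.

    Hypothetical helper; returns the previously installed handler so the
    caller can restore it later with signal.signal(signal.SIGTERM, previous).
    """
    return signal.signal(signal.SIGTERM, _raise_keyboard_interrupt)
```

A wrapper would call `install_sigterm_as_interrupt()` once, from the main thread, before starting dbt in the same Python process. It does not help when dbt runs as a separate child process, because the handler only affects the process that installed it.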

Screenshots and log output

This shows the termination process from Airflow:

[2024-07-23, 21:47:39 UTC] {local_task_job_runner.py:302} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2024-07-23, 21:47:39 UTC] {process_utils.py:131} INFO - Sending 15 to group 306687. PIDs of all processes in the group: [306688, 306749, 306785, 306687]
[2024-07-23, 21:47:39 UTC] {process_utils.py:86} INFO - Sending the signal 15 to group 306687
[2024-07-23, 21:47:39 UTC] {taskinstance.py:2483} ERROR - Received SIGTERM. Terminating subprocesses.
[2024-07-23, 21:47:39 UTC] {subprocess.py:111} INFO - Sending SIGINT signal to process group

System information

The output of dbt --version:

Core:
  - installed: 1.8.3
  - latest:    1.8.4 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.8.3 - Update available!
  - spark:      1.8.0 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using: Linux

The output of python --version: Python 3.11.7

benc-db commented 4 months ago

I think this might be better reported at dbt-core; I believe the handling of cancel signals happens there, and then they delegate to adapters to handle the cancellation. If this works for one cancel scenario, but not another, the fixes probably need to happen in core.
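As a toy model of the delegation described above (not dbt-core's or the adapter's actual code; every name here is hypothetical), the "core" side catches the interrupt and asks the adapter to cancel whatever is still open. It also hints at why the two scenarios can differ: Ctrl-C surfaces in Python as `KeyboardInterrupt`, whereas SIGTERM with its default action ends the process without raising any exception, so a cancel path like this one never runs.

```python
class ToyAdapter:
    """Hypothetical stand-in for an adapter; not the real dbt adapter API."""

    def __init__(self):
        self.cancelled = []

    def cancel_open_queries(self, names):
        # A real adapter would ask the warehouse to stop these queries.
        self.cancelled.extend(names)


def run_with_cancellation(adapter, work, open_queries):
    """Hypothetical 'core' wrapper: on interrupt, delegate cancellation to the adapter."""
    try:
        work()
        return "finished"
    except KeyboardInterrupt:
        adapter.cancel_open_queries(open_queries)
        return "cancelled"
```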

ajsquared commented 4 months ago

Ah, that makes sense. I actually found a similar issue there (https://github.com/dbt-labs/dbt-core/issues/8356), so I'll close this one.