deepaktripathi1997 closed this issue 1 year ago.
Thanks for opening your first issue here! Be sure to follow the issue template!
Airflow raises an AirflowTaskTimeout exception when a task times out, and your code can catch this exception and handle it if needed.
Since your log contains [2023-02-15, 20:58:20 IST] {timeout.py:68} ERROR - Process timed out, PID: 11392, the timeout exception was raised, but it looks like your code is stuck in one of the finally blocks: Python runs finally before propagating the exception. I cannot tell what's wrong with your operator, but I can recommend some steps to debug the problem.
First, here is a simple example:
import datetime
import time
from typing import Any

import pendulum

from airflow.models import BaseOperator
from airflow.models.dag import dag
from airflow.utils.context import Context


class MyOperator(BaseOperator):
    def execute(self, context: Context) -> Any:
        try:
            print("try")
            time.sleep(120)
        except Exception as e:
            print(e)
            raise e
        finally:
            print("finally")
            time.sleep(120)


@dag(
    schedule=None,
    start_date=pendulum.yesterday(),
)
def timeout_dag():
    MyOperator(task_id="test_task", execution_timeout=datetime.timedelta(seconds=30))


timeout_dag()
And here is the log:
[2023-02-19, 21:27:29 UTC] {logging_mixin.py:149} INFO - try
[2023-02-19, 21:27:59 UTC] {timeout.py:68} ERROR - Process timed out, PID: 98045
[2023-02-19, 21:27:59 UTC] {logging_mixin.py:149} INFO - Timeout, PID: 98045
[2023-02-19, 21:27:59 UTC] {logging_mixin.py:149} INFO - finally
[2023-02-19, 21:29:59 UTC] {taskinstance.py:1837} ERROR - Task failed with exception
Traceback (most recent call last):
File "/files/dags/dag18.py", line 19, in execute
raise e
File "/files/dags/dag18.py", line 16, in execute
time.sleep(120)
File "/opt/airflow/airflow/utils/timeout.py", line 69, in handle_timeout
raise AirflowTaskTimeout(self.error_message)
airflow.exceptions.AirflowTaskTimeout: Timeout, PID: 98045
To locate the problem, you can create a simple test that executes your operator, run it in your IDE in debug mode, and set breakpoints at each step to follow the call stack. If you find this complicated, you can instead add a print after each line and read the log from the UI to find where the task gets stuck.
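For example, here is a minimal test sketch you could step through in a debugger. It reuses the MyOperator example above (replace it with your own operator), and airflow.utils.timeout.timeout is the same helper that produced the error in your log:

import datetime

from airflow.utils.timeout import timeout


def test_my_operator_times_out():
    op = MyOperator(
        task_id="test_task",
        execution_timeout=datetime.timedelta(seconds=30),
    )
    # Calling execute() directly bypasses the task runner, so reproduce the
    # timeout behaviour with the same context manager Airflow uses: it raises
    # AirflowTaskTimeout after 30 seconds.
    with timeout(30):
        op.execute(context={})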
Also, if you have specific actions to perform when the task times out, you can add them in a dedicated except block:
        except AirflowTaskTimeout as timeout_exception:
            do_something()
            raise timeout_exception
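To make the flow concrete, here is a sketch of how such a block could fit into an operator's execute method; do_work, cleanup_after_timeout, and release_resources are placeholders, not Airflow APIs:

from typing import Any

from airflow.exceptions import AirflowTaskTimeout
from airflow.models import BaseOperator
from airflow.utils.context import Context


def do_work():
    ...  # placeholder for the real task logic


def cleanup_after_timeout():
    ...  # placeholder for timeout-specific handling


def release_resources():
    ...  # placeholder for cleanup that must always run


class MyOperatorWithTimeoutHandling(BaseOperator):
    def execute(self, context: Context) -> Any:
        try:
            do_work()
        except AirflowTaskTimeout as timeout_exception:
            cleanup_after_timeout()
            raise timeout_exception
        except Exception as e:
            raise e
        finally:
            # Keep this block fast: Python runs finally before the exception
            # propagates, so a blocking call here leaves the task "running".
            release_resources()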
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
We're experiencing occasional issues with tasks that have specified an 'execution_timeout'. Despite the process being timed out, the task remains stuck in a 'running' state for several hours.
The task looks like this, with the value for execution_timeout set to:
execution_timeout=pendulum.duration(minutes=3)
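The task definition itself is not reproduced above; purely as an illustration, a task with that timeout might be declared roughly like this (the callable and schedule are hypothetical; only the dag_id, task_id, and timeout come from the report):

import pendulum

from airflow import DAG
from airflow.operators.python import PythonOperator


def save_job_details_fn():
    ...  # hypothetical callable; the real logic is not shown in the issue


with DAG(
    dag_id="realtime_payout_payout_Beneficiary",
    schedule=None,
    start_date=pendulum.yesterday(),
):
    save_job_details = PythonOperator(
        task_id="save_job_details",
        python_callable=save_job_details_fn,
        execution_timeout=pendulum.duration(minutes=3),
    )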
Task Logs:
Falling back to local log
Reading local file: /home/deploy/ssot-airflow/logs/dag_id=realtime_payout_payout_Beneficiary/run_id=scheduled__2023-02-15T13:17:00+00:00/task_id=save_job_details/attempt=1.log
[2023-02-15, 20:55:21 IST] {logging_mixin.py:137} INFO - Job status of previous task -> 1
[2023-02-15, 20:55:21 IST] {logging_mixin.py:137} INFO - last run epoch received of previous task -> 1676474400
[2023-02-15, 20:55:21 IST] {logging_mixin.py:137} INFO - insert into ***
[2023-02-15, 20:55:21 IST] {base.py:73} INFO - Using connection ID 'nrt_rds' for task execution.
[2023-02-15, 20:55:21 IST] {base.py:73} INFO - Using connection ID 'aws_default' for task execution.
[2023-02-15, 20:55:21 IST] {credentials.py:1049} INFO - Found credentials from IAM Role: ssot-prod-role
[2023-02-15, 20:58:20 IST] {timeout.py:68} ERROR - Process timed out, PID: 11392
The task has been running for the past 1.5 hours and has not failed.
Operating System details:
Virtualization: kvm
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 6.0.10-1.el7.elrepo.x86_64
Architecture: x86-64
This happens only when the task hits a timeout.
What you think should happen instead
No response
How to reproduce
No response
Operating System
CentOS Linux 7 (Core)
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==6.2.0
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-common-sql==1.3.1
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-mysql==3.4.0
apache-airflow-providers-postgres==5.0.0
apache-airflow-providers-sftp==4.2.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.3.0
Deployment
Virtualenv installation
Deployment details
CPU: 64 cores
Mem: 256 GB
Worker autoscale: 1024, 256
Anything else
No response
Are you willing to submit PR?
Code of Conduct