[Open] Gollum999 opened this issue 1 year ago
Yes. I think this is one of the problems that could be addressed by #30669 if we decide to add a separate DB entity to keep try_nums. @uranusjr @hussein-awala: if we move hostname to the new entity, this exact problem would be fixed.
This error still happens in version 2.7.3.
Of course, it's not fixed. But if you would like to help fix it, it's up for grabs, @jose-l-goncalves-alb. Would you like to take a stab at it? The linked PR shows the direction, so the best way to help solve it is to fix it. Also, maybe you or your company would like to dedicate an engineer to work on it?
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
If a task gets scheduled on a different worker host from a previous run, logs from that previous run will be unavailable from the webserver.
From my observation, it's pretty clear that this is happening because Airflow is looking for logs at the most recent hostname, not at the hostname that hosted the historical run. In the example above, attempt 2 actually ran on nkcairflow06, but the webserver is trying to fetch logs from nkcairflow02 because that is where attempt 3 ran. You can watch this happen "live" with the example DAG below: each time the task retries on a different worker, the URL that it attempts to use for historical logs changes to use the most recent hostname.
What you think should happen instead
All previous logs should be available from the webserver (as long as the log file still exists on disk).
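The root cause described above can be illustrated with a small pure-Python sketch. This is not Airflow's actual code: the class, URL template, and worker names are hypothetical stand-ins for the `task_instance` row, which stores only a single `hostname` column that gets overwritten on every try.

```python
# Sketch (NOT Airflow's real implementation): the task_instance row keeps one
# hostname, overwritten on each try, so the webserver rebuilds the log URL for
# *every* attempt from the most recent worker.

LOG_URL = "http://{hostname}:8793/log/{dag_id}/{task_id}/{run_id}/{try_number}.log"


class TaskInstance:
    """Minimal stand-in for the task_instance row: one hostname, one try_number."""

    def __init__(self, dag_id, task_id, run_id):
        self.dag_id, self.task_id, self.run_id = dag_id, task_id, run_id
        self.hostname = None
        self.try_number = 0

    def run_attempt(self, worker):
        self.try_number += 1
        self.hostname = worker  # the previous try's hostname is lost here

    def log_url(self, try_number):
        # The webserver only has self.hostname to work with, so historical
        # tries are (wrongly) fetched from the most recent worker.
        return LOG_URL.format(
            hostname=self.hostname,
            dag_id=self.dag_id,
            task_id=self.task_id,
            run_id=self.run_id,
            try_number=try_number,
        )


ti = TaskInstance("example_dag", "example_task", "manual__2023-01-01")
ti.run_attempt("nkcairflow06")  # attempt 1 runs here; its log file lives on ...06
ti.run_attempt("nkcairflow02")  # attempt 2 runs here; hostname is overwritten

# Both URLs now point at nkcairflow02, but attempt 1's file is on nkcairflow06:
print(ti.log_url(1))
print(ti.log_url(2))
```

A fix along the lines of #30669 would keep a hostname per try (a separate per-attempt entity), so `log_url(1)` could be built from the host that actually ran attempt 1.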
How to reproduce
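The issue's original example DAG was not preserved in this capture. A minimal sketch of the kind of DAG described above might look like the following: a task that always fails and retries, assuming a CeleryExecutor deployment with at least two workers so successive tries can land on different hosts (the DAG and task ids are placeholders).

```python
# Hypothetical reproduction DAG (not the issue author's original example).
# Assumes CeleryExecutor with >= 2 workers so retries land on different hosts.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def always_fail():
    # Fail on purpose so the scheduler keeps retrying the task,
    # ideally scheduling each retry on a different worker.
    raise RuntimeError("intentional failure to force a retry")


with DAG(
    dag_id="log_hostname_repro",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="flaky_task",
        python_callable=always_fail,
        retries=5,
        retry_delay=timedelta(seconds=30),
    )
```

Trigger the DAG manually; after each retry that lands on a new worker, open an earlier attempt's log in the webserver and observe that the fetch URL uses the most recent worker's hostname, so the earlier log is unavailable.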
Operating System
CentOS Stream 8
Versions of Apache Airflow Providers
N/A
Deployment
Other
Deployment details
Airflow 2.5.1
Self-hosted
Postgres DB backend
CeleryExecutor
Anything else
Related issues:
#15767
#16472
#26069
#26255
Looks like #23178 attempted to fix the issue, but it never got merged.
Are you willing to submit PR?
Code of Conduct