Open: itsnotapt opened this issue 6 months ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
What is the apache-airflow-providers-cncf-kubernetes version in your environment?
When reading the logs in trigger_reentry: when the trigger returns a "success" status, the operator reads the log by calling self.write_logs.
https://github.com/apache/airflow/blob/191b5c30e68566a75f67aefc860f59573b79bed6/airflow/providers/cncf/kubernetes/operators/pod.py#L745
When reading the logs, the follow parameter is hardcoded to False.
https://github.com/apache/airflow/blob/191b5c30e68566a75f67aefc860f59573b79bed6/airflow/providers/cncf/kubernetes/operators/pod.py#L781
Maybe the bug is coming from here?
apache-airflow-providers-cncf-kubernetes==8.0.1
Ok, so I did some investigation.
The POST_TERMINATION_TIMEOUT is set to 120 seconds, which means that the pod's logs will be available for retrieval for up to 120 seconds after the pod termination. https://github.com/apache/airflow/blob/8fc984873aab3424df0d44351da136e5c65b81e2/airflow/providers/cncf/kubernetes/operators/pod.py#L235
But your task is hanging in the trigger for more than 120 seconds after pod termination, and because of this the logs are no longer available.
To test this, you can try reducing the poll_interval to maybe 60 seconds (your script will probably finish quickly), and your logs should then be available.
I'll see if it makes sense to parameterize it.
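The timing explanation above can be illustrated with a simplified model. This is a sketch under stated assumptions, not the provider's code: it assumes the trigger only notices that the pod has finished on its next poll, that pod termination lands at a uniformly spread point within a poll window, and that logs stay retrievable for POST_TERMINATION_TIMEOUT (120 seconds, as cited above) after termination. The helper names logs_retrievable and fraction_with_logs are hypothetical.

```python
# Simplified timing model of the explanation in this thread (assumption-based sketch,
# not the cncf-kubernetes provider's implementation).
POST_TERMINATION_TIMEOUT = 120  # seconds, value cited in the thread


def logs_retrievable(termination_offset: float, poll_interval: float,
                     timeout: float = POST_TERMINATION_TIMEOUT) -> bool:
    """Hypothetical helper: True if the next poll happens within `timeout`
    seconds of pod termination.

    termination_offset: seconds after the previous poll at which the pod
    terminated, assumed to lie in [0, poll_interval).
    """
    # Time between pod termination and the poll that observes it.
    delay = poll_interval - termination_offset
    return delay <= timeout


def fraction_with_logs(poll_interval: float, samples: int = 10_000) -> float:
    """Hypothetical helper: share of evenly spaced termination times (across one
    poll window) for which the logs are still retrievable when the trigger fires."""
    hits = sum(
        logs_retrievable(poll_interval * i / samples, poll_interval)
        for i in range(samples)
    )
    return hits / samples


if __name__ == "__main__":
    for interval in (60, 120, 240, 480):
        print(f"poll_interval={interval}s: logs retrievable "
              f"{fraction_with_logs(interval):.0%} of the time")
```

Under these assumptions, any poll_interval at or below 120 seconds (such as the suggested 60) always polls inside the retention window, while longer intervals lose the logs a growing fraction of the time; a hypothetical interval of 240 seconds would lose them about half the time, which is one possible reading of the intermittent behaviour reported in the issue.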
I found one more small bug while debugging and created a fix: https://github.com/apache/airflow/pull/38075
That appears to do the trick. Thanks for looking into this so quickly.
Hmm, since follow is False, it can either miss some logs or produce duplicate logs during the termination steps. I have created https://github.com/apache/airflow/pull/38081 to fix it.
@pankajastro will https://github.com/apache/airflow/pull/38081 resolve this issue?
No, it does not fix this issue. I'm not sure we even need to work on this one. https://github.com/apache/airflow/pull/38081 addresses the comment https://github.com/apache/airflow/issues/38003#issuecomment-1987238944.
The suggestion in https://github.com/apache/airflow/issues/38003#issuecomment-1991780598 worked for the user: https://github.com/apache/airflow/issues/38003#issuecomment-1992122986
Apache Airflow Provider(s)
cncf-kubernetes
Versions of Apache Airflow Providers
No response
Apache Airflow version
2.8.2
Operating System
apache/airflow:2.8.2-python3.10
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened
There seems to be a 50/50 chance that the correct logs will be returned by the pod.
I'm expecting the following:
Successful log:
Unsuccessful log:
What you think should happen instead
No response
How to reproduce
The example code that is being used:
Anything else
No response
Are you willing to submit a PR?
Code of Conduct