Status: Open. paramjeet01 opened this issue 6 months ago.
I also found this GitHub thread. Shall we implement this? https://github.com/kubernetes-client/python-base/issues/190#issuecomment-805073981
@potiuk, can you please guide us on this? I'm not able to find a solution. It occurs intermittently and works on the next retry. If you need more information, I can collect the logs.
I can confirm that the issue is solved with the code below, which we added as a custom extract_xcom. The same approach is described here: https://github.com/kubernetes-client/python-base/issues/190#issuecomment-805073981. We didn't have this issue in v2.3.3, so I believe this PR could have introduced the error: https://github.com/apache/airflow/pull/23490/files
```python
import json

from kubernetes.client.models import V1Pod
from kubernetes.stream import stream as kubernetes_stream

from airflow.exceptions import AirflowException
from airflow.providers.cncf.kubernetes.utils.xcom_sidecar import PodDefaults


# Method on our customized PodManager: `self` provides `self._client` (CoreV1Api) and `self.log`.
def extract_xcom_json(self, pod: V1Pod) -> str:
    try:
        self.log.info("Running command... cat %s/return.json", PodDefaults.XCOM_MOUNT_PATH)
        client = kubernetes_stream(
            self._client.connect_get_namespaced_pod_exec,
            pod.metadata.name,
            pod.metadata.namespace,
            container=PodDefaults.SIDECAR_CONTAINER_NAME,
            command=[
                "/bin/sh",
                "-c",
                f"cat {PodDefaults.XCOM_MOUNT_PATH}/return.json",
            ],
            stderr=True,
            stdin=False,
            stdout=True,
            tty=False,
            _preload_content=False,  # return a WSClient instead of reading eagerly
            _request_timeout=10,
        )
        # Wait until the exec session finishes (or 10 s elapse), then read everything buffered.
        client.run_forever(timeout=10)
        result = client.read_all()
        self.log.info(
            "Received %s (%d) (%s ... %s)", type(result), len(result), result[:64], result[-64:]
        )
        # Validate that the payload is complete JSON before returning it.
        json.loads(result)
        # Terminate the sidecar so the pod can complete.
        kubernetes_stream(
            self._client.connect_get_namespaced_pod_exec,
            pod.metadata.name,
            pod.metadata.namespace,
            container=PodDefaults.SIDECAR_CONTAINER_NAME,
            command=[
                "/bin/sh",
                "-c",
                "kill -s SIGINT 1",
            ],
            stderr=True,
            stdin=False,
            stdout=True,
            tty=False,
            _preload_content=True,
            _request_timeout=10,
        )
        return result
    except json.JSONDecodeError as e:
        message = f"Failed to decode json document from pod: {pod.metadata.name}"
        self.log.exception(message)
        raise AirflowException(message) from e
    except Exception as e:
        message = f"Failed to extract xcom from pod: {pod.metadata.name}"
        self.log.exception(message)
        raise AirflowException(message) from e
```
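The workaround waits for the exec session to finish (run_forever) before reading everything that was buffered (read_all), and only then parses the result. The stdlib-only simulation below illustrates why that ordering matters if the JSON arrives in several pieces: parsing after the first piece fails, while parsing after all pieces succeeds. This is one plausible failure mode, not a confirmed root cause (the thread says the root cause wasn't found). The 4096-character frame size and all function names are hypothetical, and the real Kubernetes client is not involved.

```python
import json
from typing import Iterable, Iterator, List


def chunked(text: str, size: int) -> Iterator[str]:
    """Split text into fixed-size pieces, standing in for frames from an exec stream."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


def read_first_frame_only(frames: Iterable[str]) -> str:
    """Hypothetical premature read: return only what arrived in the first frame."""
    return next(iter(frames), "")


def read_until_closed(frames: Iterable[str]) -> str:
    """Drain every frame before returning, analogous to run_forever() then read_all()."""
    return "".join(frames)


payload = json.dumps({"data": "x" * 20_000})
frames: List[str] = list(chunked(payload, 4096))

try:
    json.loads(read_first_frame_only(frames))
except json.JSONDecodeError as err:
    # A truncated document is not valid JSON, so this branch runs.
    print(f"partial read failed: {err.msg}")

complete = json.loads(read_until_closed(frames))
print(f"full read ok: {len(complete['data'])} characters")
```

Validating with json.loads before returning, as the workaround does, turns a silently truncated read into an explicit error that a retry can recover from.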
@paramjeet01 do you have a proposed fix in mind? Can you open a PR?
Any news here? Is there any known workaround while a fix is on the way?
@eladkal Yes, I'll create a PR with the code suggested above. I'm afraid I can't find the root cause of the issue in the current code.
@cleivson, you can customize your Airflow deployment to call the method above until we have a fix.
@paramjeet01, thanks. I'm not really sure where to put this. Based on your OS ("Amazon Linux 2"), am I right to assume you're using MWAA? I'm using it and was wondering whether the bug could be related to the environment.
No, I use the community edition, and I have customized the XCom code to solve this problem.
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.3
What happened?
We are facing an intermittent JSON decoding error when the XCom result is extracted, but the task succeeds on the next retry.
What you think should happen instead?
The task should not fail when extracting the XCom data.
How to reproduce
This can be reproduced by writing a JSON file of about 20k characters as the XCom return value; extraction then fails intermittently. I'll investigate the reason for the XCom JSON issue.
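To reproduce, the base container of a KubernetesPodOperator task with do_xcom_push=True needs to write a large JSON document to /airflow/xcom/return.json. Below is a minimal sketch of such a container-side script, assuming Python is available in the image. The function names and the single-key payload shape are hypothetical, chosen only to reach roughly 20k characters.

```python
import json
from pathlib import Path

# With do_xcom_push=True, the KubernetesPodOperator's sidecar reads the XCom value from this file.
XCOM_FILE = Path("/airflow/xcom/return.json")


def build_large_xcom(size: int = 20_000) -> str:
    """Return a JSON document whose single string value is `size` characters long."""
    return json.dumps({"payload": "a" * size})


def write_large_xcom(target: Path = XCOM_FILE, size: int = 20_000) -> int:
    """Write the document to the XCom file and return the number of characters written."""
    document = build_large_xcom(size)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.write_text(document, encoding="utf-8")
```

Calling write_large_xcom() as the container's last step produces a 20,015-character file, which should be enough to exercise the intermittent failure described above.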
Operating System
Amazon Linux 2
Versions of Apache Airflow Providers
pytest>=6.2.5 docker>=5.0.0 crypto>=1.4.1 cryptography>=3.4.7 pyOpenSSL>=20.0.1 ndg-httpsclient>=0.5.1 boto3>=1.34.0 sqlalchemy redis>=3.5.3 requests>=2.26.0 pysftp>=0.2.9 werkzeug>=1.0.1 apache-airflow-providers-cncf-kubernetes==8.0.0 apache-airflow-providers-amazon>=8.13.0 psycopg2>=2.8.5 grpcio>=1.37.1 grpcio-tools>=1.37.1 protobuf>=3.15.8,<=3.21 python-dateutil>=2.8.2 jira>=3.1.1 confluent_kafka>=1.7.0 pyarrow>=10.0.1,<10.1.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
Official Helm chart deployment.
Anything else?
I think we are facing a similar issue: https://github.com/apache/airflow/issues/32111. It was fixed here: https://github.com/apache/airflow/pull/32113/files, so we might need to increase the retry count.
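If the failure is transient, retrying the extraction is another mitigation. The thread doesn't show what the linked PR changed, so the following is not that code. It is a generic, stdlib-only retry wrapper, sketched under the assumption that the failure surfaces as json.JSONDecodeError. The name call_with_retries and its defaults are hypothetical.

```python
import json
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (json.JSONDecodeError,),
) -> T:
    """Call fn, retrying on the listed exceptions with a fixed pause between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


# Usage: a fake extraction that returns truncated JSON twice, then the full document.
responses = iter(['{"a": ', '{"a": ', '{"a": 1}'])
print(call_with_retries(lambda: json.loads(next(responses)), delay_seconds=0))
```

Retrying only on a narrow exception type keeps genuine errors (for example, a missing file) from being masked by repeated attempts.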