Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
When Nucleus tried to process a large amount of data (a PDS data directory terabytes in size), the Airflow ECS Operator of Nucleus failed with the following error:
airflow.providers.amazon.aws.exceptions.EcsOperatorError: {'tasks': [], 'failures': [{'reason': 'Capacity is unavailable at this time. Please try again later or in a different availability zone'}], 'ResponseMetadata': {'RequestId': 'bac90ba4-ca6d-4305-a04e-a08d95750de8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'bac90ba4-ca6d-4305-a04e-a08d95750de8', 'content-type': 'application/x-amz-json-1.1', 'content-length': '135', 'date': 'Thu, 22 Feb 2024 08:03:23 GMT'}, 'RetryAttempts': 0}}
[2024-02-22, 08:03:24 UTC] {{taskinstance.py:1345}} INFO - Marking task as FAILED. dag_id=PDS_Basic_Registry_Use_Case_Messenger-jplaws, task_id=Validate_Products, execution_date=20240222T080314, start_date=20240222T080323, end_date=20240222T080324
[2024-02-22, 08:03:24 UTC] {{standard_task_runner.py:104}} ERROR - Failed to execute job 76966 for task Validate_Products ({'tasks': [], 'failures': [{'reason': 'Capacity is unavailable at this time. Please try again later or in a different availability zone'}], 'ResponseMetadata': {'RequestId': 'bac90ba4-ca6d-4305-a04e-a08d95750de8', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'bac90ba4-ca6d-4305-a04e-a08d95750de8', 'content-type': 'application/x-amz-json-1.1', 'content-length': '135', 'date': 'Thu, 22 Feb 2024 08:03:23 GMT'}, 'RetryAttempts': 0}}; 15455)
[2024-02-22, 08:03:24 UTC] {{local_task_job_runner.py:225}} INFO - Task exited with return code 1
[2024-02-22, 08:03:24 UTC] {{taskinstance.py:2653}} INFO - 0 downstream tasks scheduled from follow-on schedule check
🕵️ Expected behavior
I expected Nucleus to process a large amount of data (a PDS data directory terabytes in size) without failing.
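As a rough illustration of what tolerating this failure could look like (a sketch only, not a confirmed fix for Nucleus), the snippet below detects the capacity failure from the response shape shown in the error above, where 'tasks' is empty and a 'failures' entry carries the reason, and retries with exponential backoff. The names is_capacity_failure, run_with_backoff, and the run_task callable are hypothetical; run_task stands in for whatever actually submits the ECS task and returns the response dict.

```python
import time
from typing import Any, Callable, Dict

# Substring taken from the 'reason' field in the reported error.
CAPACITY_MARKER = "Capacity is unavailable"


def is_capacity_failure(response: Dict[str, Any]) -> bool:
    """Return True when no task started and a failure reason mentions capacity."""
    if response.get("tasks"):
        return False
    return any(
        CAPACITY_MARKER in failure.get("reason", "")
        for failure in response.get("failures", [])
    )


def run_with_backoff(
    run_task: Callable[[], Dict[str, Any]],  # hypothetical: submits the ECS task
    max_attempts: int = 5,                    # assumed value, tune as needed
    base_delay: float = 30.0,                 # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call run_task until it stops reporting a capacity failure or attempts run out."""
    response: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        response = run_task()
        if not is_capacity_failure(response):
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    # Still failing after the last attempt; hand the response back to the caller.
    return response
```

Airflow operators also accept the task-level retries and retry_delay arguments, which may be a simpler way to get similar behavior without custom code.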
📜 To Reproduce
Trigger the Nucleus workflow PDS_Basic_Registry_Use_Case_Messenger-jplaws more than 1000 times in a short period.
🖥 Environment Info
📚 Version of Software Used
No response
🩺 Test Data / Additional context
No response
🦄 Related requirements
🦄 #xyz
⚙️ Engineering Details
No response