buda-base / ao-workflows

Use a DAG platform to define and orchestrate workflows

Download runs out of space #16

Closed jimk-bdrc closed 3 months ago

jimk-bdrc commented 3 months ago

Processing this message:

[
  {
    "eventVersion": "2.1",
    "eventSource": "aws:s3",
    "awsRegion": "ap-northeast-2",
    "eventTime": "2024-04-06T00:11:23.730Z",
    "eventName": "ObjectRestore:Completed",
    "userIdentity": {
      "principalId": "AmazonCustomer:A1JPP2WW1ZYN4F"
    },
    "requestParameters": {
      "sourceIPAddress": "s3.amazonaws.com"
    },
    "responseElements": {
      "x-amz-request-id": "439897F6741FD9BA",
      "x-amz-id-2": "MF0oW9le+g8K5/R/uUks1QuFbZxNuSmZDWQ5utu8ZTcHEKSGFHzdFBEtebICzrPtG3YL1YVmffxhRw4nDPTZ1w=="
    },
    "s3": {
      "s3SchemaVersion": "1.0",
      "configurationId": "BagCreatedNotification",
      "bucket": {
        "name": "glacier.staging.nlm.bdrc.org",
        "ownerIdentity": {
          "principalId": "A1JPP2WW1ZYN4F"
        },
        "arn": "arn:aws:s3:::glacier.staging.nlm.bdrc.org"
      },
      "object": {
        "key": "Archive0/00/W1NLM4700/W1NLM4700.bag.zip",
        "size": 17017201852,
        "eTag": "41654cbd2a8f2d3c0abc83444fde825b-2029",
        "sequencer": "00638792A45B638391"
      }
    },
    "glacierEventData": {
      "restoreEventData": {
        "lifecycleRestorationExpiryTime": "2024-04-12T00:00:00.000Z",
        "lifecycleRestoreStorageClass": "DEEP_ARCHIVE"
      }
    }
  }
]

size is "size": 17,017,201,852" 17GB

[2024-04-05, 20:30:09 EDT] {taskinstance.py:2513} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='***' AIRFLOW_CTX_DAG_ID='sqs_scheduled_dag' AIRFLOW_CTX_TASK_ID='download_from_messages' AIRFLOW_CTX_EXECUTION_DATE='2024-04-06T00:20:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-04-06T00:20:00+00:00'
[2024-04-05, 20:30:09 EDT] {logging_mixin.py:188} INFO - using secrets
[2024-04-05, 20:30:09 EDT] {logging_mixin.py:188} INFO - section='ap_northeast'   ['default', 'ap_northeast']
[2024-04-05, 20:34:22 EDT] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
...
                   ^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/s3transfer/download.py", line 643, in _main
    fileobj.write(data)
  File "/home/airflow/.local/lib/python3.11/site-packages/s3transfer/utils.py", line 379, in write
    self._fileobj.write(data)
OSError: [Errno 28] No space left on device
[2024-04-05, 20:34:22 EDT] {taskinstance.py:1149} INFO - Marking task as FAILED. dag_id=sqs_scheduled_dag, task_id=download_from_messages, execution_date=20240406T002000, start_date=20240406T003009, end_date=20240406T003422
[2024-04-05, 20:34:22 EDT] {standard_task_runner.py:107} ERROR - Failed to execute job 259 for task download_from_messages ([Errno 28] No space left on device; 13305)
[2024-04-05, 20:34:22 EDT] {local_task_job_runner.py:234} INFO - Task exited with return code 1
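
The task dies part-way through the transfer because the 17 GB object is larger than the free space left on the container filesystem. Whichever approach below is taken, a pre-flight check could make the task fail fast before any bytes are written. A minimal sketch, assuming the size has been read from the event record and that `dest_dir` is a hypothetical download area inside the container:

```python
import shutil

import boto3


def download_with_space_check(bucket: str, key: str, size: int, dest_dir: str) -> str:
    """Refuse the transfer up front if the destination cannot hold the object."""
    free = shutil.disk_usage(dest_dir).free
    if size > free:
        raise OSError(
            f"Not enough space for {key}: need {size:,} bytes, "
            f"only {free:,} free in {dest_dir}"
        )
    dest = f"{dest_dir}/{key.rsplit('/', 1)[-1]}"
    boto3.client("s3").download_file(bucket, key, dest)
    return dest
```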

Two possible approaches:

  1. Bind mount the output directory. This exposes the writing area to the host system; if we put it on /mnt/AO-staging-Incoming we also keep a record of the downloaded bag.zips that we can delete from outside the container.
  2. Use a shared volume, and have the Docker procedure erase the bag.zip when processing is complete (see the sketch below).
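
A minimal sketch of option 2, assuming the shared volume is mounted inside the container and that `process_bag` is a placeholder for whatever the DAG does with the downloaded bag:

```python
import os


def handle_bag(local_path: str) -> None:
    """Process a downloaded bag.zip and always reclaim its space afterwards."""
    try:
        process_bag(local_path)  # placeholder for the real downstream step
    finally:
        # Delete even on failure so repeated retries cannot fill the volume.
        if os.path.exists(local_path):
            os.remove(local_path)
```

The trade-off against option 1 is visibility: a bind mount leaves the bag.zips on the host where an operator or a cron job can remove them, while the shared-volume route keeps cleanup inside the container but leaves nothing behind to inspect.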