aws / aws-cli

Universal Command Line Interface for Amazon Web Services

'aws s3 sync' is broken in version 2.14 and above on P5.48xlarge instances AWS_ERROR_S3_INVALID_RESPONSE_STATUS #8435

Open kamal-rahimi opened 10 months ago

kamal-rahimi commented 10 months ago

Describe the bug

With AWS CLI version 2.14 and above installed, the aws s3 sync s3://bucket_name local_path command fails on Linux on P5.48xlarge instances with this error:

Download failed ... AWS_ERROR_S3_INVALID_RESPONSE_STATUS

Switching to version 2.13.39 or lower resolved the issue.

Expected Behavior

No failure in s3 sync

Current Behavior

The aws s3 sync s3://bucket_name local_path command fails on Linux on P5.48xlarge instances with this error:

Download failed ...

Reproduction Steps

  1. Create a p5.48xlarge instance
  2. Run a Linux Docker container: docker run -it rayproject/ray:2.9.0-py310-cu118 /bin/bash
  3. Install the latest version of the AWS CLI as described here: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
  4. Run: aws s3 sync s3://bucket /tmp/data

Possible Solution

No response

Additional Information/Context

No response

CLI version used

2.14 and above

Environment details (OS name and version, etc.)

Linux

RyanFitzSimmonsAK commented 9 months ago

Hi @kamal-rahimi, thanks for reaching out. Could you provide debug logs of this behavior? You can get debug logs by adding --debug to your command, and redacting any sensitive information. Thanks!
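
A minimal sketch of capturing those logs, assuming the same placeholder bucket and path as in the report. The CLI writes debug output to stderr, so it can be redirected to a file (the name debug.log is only an example) and redacted before sharing:

```shell
# Placeholder bucket and local path; replace with your own.
# --debug is a global AWS CLI option; its output goes to stderr.
aws s3 sync s3://bucket_name local_path --debug 2> debug.log
```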

kamal-rahimi commented 9 months ago

Hi @RyanFitzSimmonsAK, here is part of the output when using --debug:

[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [task-scheduler] - id=0x7f96ac000d80: Running epoll_event_loop_unsubscribe_cleanup task with <Canceled> status
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0d230: Destroying event_loop
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0d230: Stopping event-loop thread.
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c2ffd700] [task-scheduler] - id=0x2b0e2b8: Scheduling epoll_event_loop_stop task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c2ffd700] [task-scheduler] - id=0x2b0e2b8: Running epoll_event_loop_stop task with <Running> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c2ffd700] [event-loop] - id=0x2b0d230: exiting main loop
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c2ffd700] [task-scheduler] - id=0x7f96b8000d80: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [task-scheduler] - id=0x7f96b8000d80: Running epoll_event_loop_unsubscribe_cleanup task with <Canceled> status
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0ca80: Destroying event_loop
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0ca80: Stopping event-loop thread.
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c37fe700] [task-scheduler] - id=0x2b0dda8: Scheduling epoll_event_loop_stop task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c37fe700] [task-scheduler] - id=0x2b0dda8: Running epoll_event_loop_stop task with <Running> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c37fe700] [event-loop] - id=0x2b0ca80: exiting main loop
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c37fe700] [task-scheduler] - id=0x7f96b4000f50: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [task-scheduler] - id=0x7f96b4000f50: Running epoll_event_loop_unsubscribe_cleanup task with <Canceled> status
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0c8f0: Destroying event_loop
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0c8f0: Stopping event-loop thread.
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c3fff700] [task-scheduler] - id=0x2b0d898: Scheduling epoll_event_loop_stop task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c3fff700] [task-scheduler] - id=0x2b0d898: Running epoll_event_loop_stop task with <Running> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c3fff700] [event-loop] - id=0x2b0c8f0: exiting main loop
[DEBUG] [2023-12-26T22:11:56Z] [00007f96c3fff700] [task-scheduler] - id=0x7f96bc000f50: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [task-scheduler] - id=0x7f96bc000f50: Running epoll_event_loop_unsubscribe_cleanup task with <Canceled> status
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0c1c0: Destroying event_loop
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0c1c0: Stopping event-loop thread.
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e0ff9700] [task-scheduler] - id=0x2b0cff8: Scheduling epoll_event_loop_stop task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e0ff9700] [task-scheduler] - id=0x2b0cff8: Running epoll_event_loop_stop task with <Running> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e0ff9700] [event-loop] - id=0x2b0c1c0: exiting main loop
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e0ff9700] [task-scheduler] - id=0x7f96c8000e60: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [task-scheduler] - id=0x7f96c8000e60: Running epoll_event_loop_unsubscribe_cleanup task with <Canceled> status
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0c030: Destroying event_loop
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0c030: Stopping event-loop thread.
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e17fa700] [task-scheduler] - id=0x2b0c758: Scheduling epoll_event_loop_stop task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e17fa700] [task-scheduler] - id=0x2b0c758: Running epoll_event_loop_stop task with <Running> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e17fa700] [event-loop] - id=0x2b0c030: exiting main loop
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e17fa700] [task-scheduler] - id=0x7f96c4000f50: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [task-scheduler] - id=0x7f96c4000f50: Running epoll_event_loop_unsubscribe_cleanup task with <Canceled> status
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0b8a0: Destroying event_loop
[INFO] [2023-12-26T22:11:56Z] [00007f96067fc700] [event-loop] - id=0x2b0b8a0: Stopping event-loop thread.
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e1ffb700] [task-scheduler] - id=0x2b0be98: Scheduling epoll_event_loop_stop task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e1ffb700] [task-scheduler] - id=0x2b0be98: Running epoll_event_loop_stop task with <Running> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e1ffb700] [event-loop] - id=0x2b0b8a0: exiting main loop
[DEBUG] [2023-12-26T22:11:56Z] [00007f96e1ffb700] [task-scheduler] - id=0x7f96d0000f50: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [task-scheduler] - id=0x7f96d0000f50: Running epoll_event_loop_unsubscribe_cleanup task with <Canceled> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f96067fc700] [S3Client] - id=0x2b08f80 Client body streaming ELG shutdown.
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [task-scheduler] - id=0x2b09160: Scheduling s3_client_process_work_task task for immediate execution
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [task-scheduler] - id=0x2b09160: Running s3_client_process_work_task task with <Running> status
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [S3Client] - id=0x2b08f80 s_s3_client_process_work_default - Moving relevant synced_data into threaded_data.
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [S3Client] - id=0x2b08f80 s_s3_client_process_work_default - Processing any new meta requests.
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [S3Client] - id=0x2b08f80 Updating meta requests.
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [S3Client] - id=0x2b08f80 Updating connections, assigning requests where possible.
[INFO] [2023-12-26T22:11:56Z] [00007f9781ffb700] [S3ClientStats] - id=0x2b08f80 Requests-in-flight(approx/exact):0/0  Requests-preparing:0  Requests-queued:0  Requests-network(get/put/default/total):0/0/0/0  Requests-streaming-waiting:0  Requests-streaming-response:0  Endpoints(in-table/allocated):0/0
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [S3Client] - id=0x2b08f80 Client shutdown progress: starting_destroy_executing=0  body_streaming_elg_allocated=0  process_work_task_scheduled=0  process_work_task_in_progress=0  num_endpoints_allocated=0 s3express_provider_active=0 finish_destroy=1
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [S3Client] - id=0x2b08f80 Client finishing destruction.
[DEBUG] [2023-12-26T22:11:56Z] [00007f9781ffb700] [channel-bootstrap] - id=0x2956810: releasing bootstrap reference

The full log includes tokens and other sensitive information, so I cannot share it.

RyanFitzSimmonsAK commented 9 months ago

Is the S3 bucket you're using a directory bucket?

kamal-rahimi commented 9 months ago

Yes, the S3 path is s3://bucket_name/dir_name

RyanFitzSimmonsAK commented 9 months ago

The issue is that you're using the wrong region. Due to a regression in the CRT that the team is aware of, certain instance types require that you use the correct region for the bucket you're attempting to access. If you change the region you're making the request from to match the bucket, it should work. Please let me know how that goes for you.
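
As a rough illustration of matching the region, assuming a placeholder bucket name and us-west-2 as a stand-in for whatever region the bucket actually lives in:

```shell
# Placeholder bucket name; prints the bucket's LocationConstraint (its region).
aws s3api get-bucket-location --bucket bucket_name

# Pass that region explicitly on the transfer command.
# us-west-2 is only an example value, not taken from the report.
aws s3 sync s3://bucket_name/dir_name local_path --region us-west-2
```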

kamal-rahimi commented 9 months ago

@RyanFitzSimmonsAK: Yes, I understand that I am downloading from a bucket in a different region, but this is quite a reasonable use case, and all previous versions of the aws-cli work fine on the instances where we see the issue with the latest version. Our current workaround is to use an old version of aws-cli.

rvencu commented 5 months ago

While we circumvented this by moving the S3 bucket to the correct region, when using container credentials we still encountered issues where, for instance, ls commands work while cp commands throw errors:

richard@ip-10-0-133-32:~$ aws --version
aws-cli/2.15.55 Python/3.11.8 Linux/5.15.0-1058-aws exe/x86_64.ubuntu.20
richard@ip-10-0-133-32:~$ aws s3 cp s3://s-harmonai-west/datasets/songs_raw/songs_md_2/train/songs-md-005281.tar .

30 (AWS_ERROR_PRIORITY_QUEUE_EMPTY): Attempt to pop an item from an empty queue.
richard@ip-10-0-133-32:~$ aws s3 cp s3://s-harmonai-west/datasets/songs_raw/songs_md_2/train/songs-md-005281.tar . --region us-west-2

30 (AWS_ERROR_PRIORITY_QUEUE_EMPTY): Attempt to pop an item from an empty queue.
richard@ip-10-0-133-32:~$ aws s3 ls s3://s-harmonai-west
                           PRE /
                           PRE checkpoints/
                           PRE checpoints/
                           PRE datasets/
                           PRE flavio/
                           PRE million_song_dataset/
                           PRE shawley/
                           PRE unprocessed/
                           PRE zqevans/

tsdev commented 4 months ago

I am having the same issue here on P5.48xlarge. I am able to run aws s3 ls on the bucket, but I get the error AWS_ERROR_S3_INVALID_RESPONSE_STATUS: Invalid response status from request when I try to copy a file into the bucket. The EC2 instance is in eu-north-1, while the bucket is in eu-central-1.

RyanFitzSimmonsAK commented 4 months ago

Unfortunately, this isn't something the CLI team is able to address. The recommended workaround at this time is to disable the use of the CRT transfer client. To do this, set the preferred_transfer_client to classic. Please let me know if this workaround does / does not work for you.
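
A minimal sketch of applying that setting, assuming AWS CLI v2 and that the default profile is the one in use:

```shell
# Assumption: the "default" profile is used; adjust the prefix for a named profile.
# Persists the setting in the CLI config file (~/.aws/config by default).
aws configure set default.s3.preferred_transfer_client classic
```

This should leave a preferred_transfer_client = classic entry under the s3 block of that profile in the config file, after which aws s3 commands use the classic transfer client rather than the CRT one.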