boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Bucket name sometimes missing in the S3 URL, causing failures for operations like GetObject and PutObject with the same bucket and code that previously succeeded. #4187

Open mjoeydba opened 1 week ago

mjoeydba commented 1 week ago

Describe the bug

S3 access is failing for the same bucket and code that previously succeeded. The debug trace shows that the URL used during the failure does not include the bucket name, either as the host or in the path.

Success

```
2024-06-27 21:21:12,116 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Bucket': 'xxxx', 'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'Key': 'xxxx/xxxx.xlsx', 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-06-27 21:21:12,116 botocore.regions [DEBUG] Endpoint provider result: https://xxxx.s3.amazonaws.com
```

Failure

```
2024-07-02 18:22:11,094 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-07-02 18:22:11,095 botocore.regions [DEBUG] Endpoint provider result: https://s3.amazonaws.com
```

The URL in the GetObject call shows the same behavior, which appears to cause the Access Denied error.

Expected Behavior

Successfully download object.

Current Behavior

Intermittent Access Denied failures for the same code that previously succeeded.

Reproduction Steps

Note: The issue occurs unpredictably.

```python
import logging
from io import BytesIO

import boto3
import pandas as pd
from botocore.config import Config
from pyspark.sql.functions import upper

boto3.set_stream_logger('', logging.DEBUG)

# Initialize S3 client
s3 = boto3.client('s3')

INBOUND_S3_BUCKET = "xxxx"
INBOUND_FILE_PATH = 'xxx/xxxx.xlsx'
obj = s3.get_object(Bucket=INBOUND_S3_BUCKET, Key=INBOUND_FILE_PATH)
```

Possible Solution

Unknown

Additional Information/Context

No response

SDK version used

1.34.137

Environment details (OS name and version, etc.)

Linux, databricks

tim-finnigan commented 1 week ago

Hi @mjoeydba thanks for reaching out. Here is a guide on troubleshooting Access Denied errors in S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/troubleshoot-403-errors.html

That error is likely occurring due to your settings, policies, permissions, or profile configuration. But if you'd like us to investigate this further on the SDK side, please share a complete code snippet to reproduce the issue, as well as debug logs (with any sensitive info redacted) by adding boto3.set_stream_logger('') to your script.

mjoeydba commented 2 days ago

Hi @tim-finnigan. Thanks for the response. Please find the logs and program attached. The issue is non-deterministic. The code runs against the same bucket and EC2 instance profile.

The only difference I see is that the bucket name is missing from the S3 URL when there is a failure. AWS support also confirmed that, when the error occurs, the bucket name being sent is actually the first-level folder under the bucket.

program.txt failure_log.txt success_log.txt