Open mjoeydba opened 1 week ago
Hi @mjoeydba thanks for reaching out. Here is a guide on troubleshooting Access Denied errors in S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/troubleshoot-403-errors.html
That error is likely caused by your settings, policies, permissions, or profile configuration. But if you'd like us to investigate this further on the SDK side, please share a complete code snippet that reproduces the issue, as well as debug logs (with any sensitive info redacted) by adding boto3.set_stream_logger('') to your script.
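For anyone following along: `boto3.set_stream_logger('')` attaches a DEBUG-level `StreamHandler` to the SDK's loggers. Here is a stdlib-only sketch of capturing that kind of output into a buffer so sensitive values can be redacted before posting; the simulated log entry and the placeholder bucket name are illustrative, not real SDK output:

```python
import io
import logging

# Capture 'botocore' debug output in memory instead of on stderr.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setLevel(logging.DEBUG)

logger = logging.getLogger('botocore')  # the logger botocore writes to
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Simulated entry, shaped like the endpoint-provider trace in this issue.
logger.debug("Calling endpoint provider with parameters: %s",
             {'Bucket': 'my-secret-bucket', 'Region': 'us-east-1'})

# Redact sensitive names before sharing the log.
redacted = buf.getvalue().replace('my-secret-bucket', 'xxxx')
print(redacted)
```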
Hi @tim-finnigan. Thanks for the response. Please find the logs and program attached. The issue is non-deterministic. The code runs against the same bucket and EC2 instance profile.
The only difference I see is that the bucket name is missing from the S3 URL when there is a failure. AWS support also confirmed that when the error occurs, the bucket name being sent is the first-level folder under the bucket.
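To illustrate what AWS support described (with placeholder names, not the real bucket/key): in a virtual-hosted-style request the bucket is in the host, but when the bucket is dropped entirely, S3 interprets the first path segment of the key as the bucket name:

```python
bucket = 'my-bucket'        # placeholder
key = 'level1/report.xlsx'  # placeholder

# Normal virtual-hosted-style URL: bucket appears in the host.
ok_url = f'https://{bucket}.s3.amazonaws.com/{key}'

# Failing case from the debug trace: bucket missing from both host and path,
# so S3 parses the first folder of the key as the bucket name.
bad_url = f'https://s3.amazonaws.com/{key}'
misread_bucket = key.split('/')[0]
print(ok_url, bad_url, misread_bucket)
```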
Describe the bug
S3 access fails for the same bucket and code that previously succeeded. The debug trace shows that the URL used during the failure does not include the bucket name, either in the host or in the path.
Success
2024-06-27 21:21:12,116 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Bucket': 'xxxx', 'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'Key': 'xxxx/xxxx.xlsx', 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-06-27 21:21:12,116 botocore.regions [DEBUG] Endpoint provider result: https://xxxx.s3.amazonaws.com
Failure
2024-07-02 18:22:11,094 botocore.regions [DEBUG] Calling endpoint provider with parameters: {'Region': 'us-east-1', 'UseFIPS': False, 'UseDualStack': False, 'ForcePathStyle': False, 'Accelerate': False, 'UseGlobalEndpoint': True, 'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
2024-07-02 18:22:11,095 botocore.regions [DEBUG] Endpoint provider result: https://s3.amazonaws.com
The URL in the GetObject call shows the same behavior, which appears to cause the Access Denied error.
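Diffing the two endpoint-provider parameter dicts from the traces above makes the difference concrete: the failing call never receives Bucket or Key, so the provider can only resolve the bare regional endpoint.

```python
# Parameter dicts copied from the success and failure debug traces.
success_params = {'Bucket': 'xxxx', 'Region': 'us-east-1', 'UseFIPS': False,
                  'UseDualStack': False, 'ForcePathStyle': False,
                  'Accelerate': False, 'UseGlobalEndpoint': True,
                  'Key': 'xxxx/xxxx.xlsx',
                  'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}
failure_params = {'Region': 'us-east-1', 'UseFIPS': False,
                  'UseDualStack': False, 'ForcePathStyle': False,
                  'Accelerate': False, 'UseGlobalEndpoint': True,
                  'DisableMultiRegionAccessPoints': False, 'UseArnRegion': True}

# Keys present on success but absent on failure.
missing = sorted(set(success_params) - set(failure_params))
print(missing)  # ['Bucket', 'Key']
```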
Expected Behavior
Successfully download object.
Current Behavior
The same code fails with Access Denied after previously succeeding.
Reproduction Steps
Note: the issue occurs unpredictably.
import pandas as pd
import boto3
from io import BytesIO
from pyspark.sql.functions import upper
import logging
from botocore.config import Config

boto3.set_stream_logger('', logging.DEBUG)

# Initialize S3 client
s3 = boto3.client('s3')

INBOUND_S3_BUCKET = "xxxx"
INBOUND_FILE_PATH = 'xxx/xxxx.xlsx'
obj = s3.get_object(Bucket=INBOUND_S3_BUCKET, Key=INBOUND_FILE_PATH)
Possible Solution
Unknown
Additional Information/Context
No response
SDK version used
1.34.137
Environment details (OS name and version, etc.)
Linux, databricks