boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

s3 client global endpoint_url results in PermanentRedirect when region provided #4360

Open bentohset opened 2 days ago

bentohset commented 2 days ago

Describe the bug

When the boto3 S3 client calls ListObjects with an endpoint_url that does not contain a region (e.g. https://s3.amazonaws.com rather than https://s3.ap-southeast-1.amazonaws.com) while a region is also configured, the call fails with a ClientError: PermanentRedirect.

Regression Issue

Expected Behavior

S3 objects should be listed without issue while using the current environment variables (without editing them if possible)

Current Behavior

Using the below configuration of the SDK results in a PermanentRedirect error

import os
from urllib.parse import urlparse

import boto3

os.environ["AWS_ACCESS_KEY_ID"] = "XXXXXXXX"
os.environ["AWS_DEFAULT_REGION"] = "ap-southeast-1"
os.environ["AWS_ENDPOINT_URL"] = "https://s3.amazonaws.com"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXXX"
os.environ["S3_USE_HTTPS"] = "1"
os.environ["S3_VERIFY_SSL"] = "1"
uri = "s3://bucket-name/path-to-file"

aws_access_key = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret = os.environ["AWS_SECRET_ACCESS_KEY"]
region = os.environ["AWS_DEFAULT_REGION"]

s3 = boto3.resource(
    "s3",
    region_name=region,
    verify=True,
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret,
)
parsed = urlparse(uri, scheme="s3")
bucket_name = parsed.netloc
bucket_path = parsed.path.lstrip("/")

bucket = s3.Bucket(bucket_name)
for obj in bucket.objects.filter(Prefix=bucket_path):
    print(obj.key)

Error:

Traceback (most recent call last):
  File "/REDACTED/testing/s3.py", line 89, in <module>
    for obj in bucket.objects.filter(Prefix=bucket_path):
  File "/REDACTED/.pyenv/versions/3.10.15/lib/python3.10/site-packages/boto3/resources/collection.py", line 79, in __iter__
    for page in self.pages():
  File "/REDACTED/.pyenv/versions/3.10.15/lib/python3.10/site-packages/boto3/resources/collection.py", line 169, in pages
    for page in pages:
  File "/REDACTED/.pyenv/versions/3.10.15/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
    response = self._make_request(current_kwargs)
  File "/REDACTED/pyenv/versions/3.10.15/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
    return self._method(**current_kwargs)
  File "/REDACTED/pyenv/versions/3.10.15/lib/python3.10/site-packages/botocore/client.py", line 569, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/REDACTED/.pyenv/versions/3.10.15/lib/python3.10/site-packages/botocore/client.py", line 1023, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (PermanentRedirect) when calling the ListObjects operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

The issue occurs only in boto3 1.25.0 and later. We previously used boto3==1.18.18, which had no issue: in that version a region was not required to list objects in the bucket.

We need to use the endpoint_url parameter because we support customers who use non-AWS S3 storage. The error goes away when the region is included in the endpoint_url (e.g. https://s3.ap-southeast-1.amazonaws.com) or when the AWS_DEFAULT_REGION environment variable is not set, but some of our current users have endpoint URLs without the region. Is there a way to resolve this error?
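One possible client-side workaround, sketched here under assumptions rather than taken from boto3 itself, is to rewrite the endpoint only when it is exactly the global AWS host and a region is configured, leaving non-AWS endpoints untouched. The helper name `regionalize_s3_endpoint` and the constant `GLOBAL_S3_HOST` are hypothetical and exist only for this illustration.

```python
from typing import Optional
from urllib.parse import urlparse, urlunparse

# Assumption for this sketch: only this exact host counts as the
# "global" AWS S3 endpoint; anything else is treated as third-party storage.
GLOBAL_S3_HOST = "s3.amazonaws.com"


def regionalize_s3_endpoint(
    endpoint_url: Optional[str], region: Optional[str]
) -> Optional[str]:
    """Return a regional AWS S3 endpoint if the given one is the global host.

    Endpoints that are missing, that point at another host, or that carry an
    explicit port are returned unchanged.
    """
    if not endpoint_url or not region:
        return endpoint_url
    parsed = urlparse(endpoint_url)
    if parsed.hostname != GLOBAL_S3_HOST or parsed.port is not None:
        return endpoint_url
    # Swap only the host; scheme, path, and query are preserved.
    return urlunparse(parsed._replace(netloc=f"s3.{region}.amazonaws.com"))
```

The result could then be passed explicitly, for example `boto3.resource("s3", region_name=region, endpoint_url=regionalize_s3_endpoint(os.environ.get("AWS_ENDPOINT_URL"), region))`, so that the environment variables themselves stay as they are. Whether this is acceptable depends on how the endpoint is consumed elsewhere in the application.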

Reproduction Steps

Run the following code using the latest version of boto3:

import os
from urllib.parse import urlparse

import boto3

os.environ["AWS_ACCESS_KEY_ID"] = "XXXXXXXX"
os.environ["AWS_DEFAULT_REGION"] = "ap-southeast-1"
os.environ["AWS_ENDPOINT_URL"] = "https://s3.amazonaws.com"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXXX"
os.environ["S3_USE_HTTPS"] = "1"
os.environ["S3_VERIFY_SSL"] = "1"
uri = "s3://bucket-name/path-to-file"

aws_access_key = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret = os.environ["AWS_SECRET_ACCESS_KEY"]
region = os.environ["AWS_DEFAULT_REGION"]

s3 = boto3.resource(
    "s3",
    region_name=region,
    verify=True,
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret,
)
parsed = urlparse(uri, scheme="s3")
bucket_name = parsed.netloc
bucket_path = parsed.path.lstrip("/")

bucket = s3.Bucket(bucket_name)
for obj in bucket.objects.filter(Prefix=bucket_path):
    print(obj.key)

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.35.23

Environment details (OS name and version, etc.)

MacOS Sonoma 14.6.1 Apple M2 Max

adev-code commented 2 days ago

Hello @bentohset, thanks for reaching out. As of the S3 service update of September 23, 2020, S3 has transitioned to virtual-hosted–style URLs. This means that for any bucket outside of us-east-1, you need to include the region in the endpoint URL. See https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html and https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/ for details. Please let me know if you have further questions. Thanks.
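For illustration, the two addressing styles described in the linked user guide can be sketched as plain string construction. The function names below are hypothetical, and the sketch assumes the standard `amazonaws.com` domain with a region code always present; it does not reproduce how botocore resolves endpoints.

```python
def virtual_hosted_style_url(bucket: str, region: str, key: str) -> str:
    """Bucket name is part of the host: bucket.s3.region.amazonaws.com/key."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


def path_style_url(bucket: str, region: str, key: str) -> str:
    """Bucket name is the first path segment: s3.region.amazonaws.com/bucket/key."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"


if __name__ == "__main__":
    # Placeholder values taken from the issue's example configuration.
    print(virtual_hosted_style_url("bucket-name", "ap-southeast-1", "path-to-file"))
    print(path_style_url("bucket-name", "ap-southeast-1", "path-to-file"))
```

In both forms the region code appears in the host name, which is the part missing from an endpoint such as https://s3.amazonaws.com.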