Closed by gmaze 1 month ago
@tcarval is there any reasons for not having the gz index files on s3 ? https://argo-gdac-sandbox.s3.eu-west-3.amazonaws.com/pub/index.html#pub/idx/
I am adding the gz indexes (the GDAC - AWS synchronization is underway)
A new IndexStore is ready to work with the AWS S3 core index file:
from argopy import ArgoIndex
idx = ArgoIndex(host='s3://argo-gdac-sandbox/pub/idx').load()
idx.search_wmo_cyc(6903091, 1)
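For readers unfamiliar with `search_wmo_cyc`, here is a minimal, hypothetical sketch of what that search does conceptually, assuming the standard Argo index format where the first column of each row is a profile path like `aoml/6903091/profiles/R6903091_001.nc` (the function name and regex below are illustrative, not argopy's internals):

```python
import re

def match_wmo_cyc(file_path, wmo, cyc):
    """Return True if an index 'file' entry matches a float WMO and cycle number.

    Profile file names follow <prefix><WMO>_<CYC>.nc, with an optional 'D'
    suffix on the cycle number for descending profiles.
    """
    m = re.search(r"/[A-Z]*(\d+)_(\d+)D?\.nc$", file_path)
    if m is None:
        return False
    return int(m.group(1)) == wmo and int(m.group(2)) == cyc

# Illustrative index rows (columns after 'file' elided):
rows = [
    "aoml/6903091/profiles/R6903091_001.nc,...",
    "aoml/6903091/profiles/R6903091_002.nc,...",
]
hits = [r for r in rows if match_wmo_cyc(r.split(",")[0], 6903091, 1)]
```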
poke @tcarval
On GitHub Actions, when unit testing the new S3 store, we fall back on anonymous requests with the following client:
import boto3
from botocore import UNSIGNED
from botocore.client import Config
fs = boto3.client('s3', config=Config(signature_version=UNSIGNED))
but tests fail with the error:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the SelectObjectContent operation: Access Denied
I don't get why we can't run select_object_content anonymously on the bucket
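For context, the failing call looked roughly like the following. The SQL expression and serialization settings here are illustrative assumptions, not argopy's actual request:

```python
# Parameters for an S3 Select query over the gzipped core index file.
# Expression and serialization settings are illustrative.
select_params = dict(
    Bucket="argo-gdac-sandbox",
    Key="pub/idx/argo_synthetic-profile_index.txt.gz",
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s LIMIT 5",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"},
                        "CompressionType": "GZIP"},
    OutputSerialization={"CSV": {}},
)
# With the unsigned client above, this call raises
# botocore.exceptions.ClientError (AccessDenied):
# fs.select_object_content(**select_params)
```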
poke: @quai20 @tcarval
MORE:
fs = boto3.client('s3', config=Config(signature_version=UNSIGNED))
fs._request_signer._credentials is None
returns True, as it should
object_list = fs.list_objects_v2(Bucket='argo-gdac-sandbox', Prefix="pub/idx/argo_synthetic-profile_index.txt.gz")
object_list
returns:
{'ResponseMetadata': {'RequestId': 'PMVAY0JH3KRP8J3Y',
'HostId': 'xqACBxsLPkqHm1VEPccv0zsceMm7s3cn5i5mey6Wd0yIHdTED8UbGA+ZGe0pLxiPnJLWaT3goIo=',
'HTTPStatusCode': 200,
'HTTPHeaders': {'x-amz-id-2': 'xqACBxsLPkqHm1VEPccv0zsceMm7s3cn5i5mey6Wd0yIHdTED8UbGA+ZGe0pLxiPnJLWaT3goIo=',
'x-amz-request-id': 'PMVAY0JH3KRP8J3Y',
'date': 'Tue, 09 Jul 2024 20:20:54 GMT',
'x-amz-bucket-region': 'eu-west-3',
'content-type': 'application/xml',
'transfer-encoding': 'chunked',
'server': 'AmazonS3'},
'RetryAttempts': 0},
'IsTruncated': False,
'Contents': [{'Key': 'pub/idx/argo_synthetic-profile_index.txt.gz',
'LastModified': datetime.datetime(2024, 7, 9, 3, 0, 5, tzinfo=tzutc()),
'ETag': '"cc0d89c9dbda566cb9a29085b55d3a5a"',
'Size': 6232628,
'StorageClass': 'STANDARD'}],
'Name': 'argo-gdac-sandbox',
'Prefix': 'pub/idx/argo_synthetic-profile_index.txt.gz',
'MaxKeys': 1000,
'EncodingType': 'url',
'KeyCount': 1}
Because describing your problem is already halfway to solving it!
The bucket can be read anonymously, but it is the SelectObjectContent
method that requires credentials!
From the Amazon S3 documentation (Permissions): "You must have the s3:GetObject permission for this operation. Amazon S3 Select does not support anonymous access. For more information about permissions, see Specifying Permissions in a Policy in the Amazon S3 User Guide."
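Since GetObject does work anonymously on this public bucket while S3 Select does not, one workaround is to fetch the whole gzipped index and filter it locally. A minimal sketch, where the `filter_index` helper and the commented-out download are assumptions for illustration:

```python
import gzip
import io

def filter_index(gz_bytes, predicate):
    """Decompress a gzipped index file and keep data rows matching a predicate.

    Comment lines starting with '#' are skipped; everything else is passed
    to `predicate` as a text line without its trailing newline.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt") as f:
        return [line.rstrip("\n") for line in f
                if not line.startswith("#") and predicate(line)]

# In practice the bytes would come from an anonymous GetObject, e.g.:
# body = fs.get_object(
#     Bucket="argo-gdac-sandbox",
#     Key="pub/idx/argo_synthetic-profile_index.txt.gz",
# )["Body"].read()
# rows = filter_index(body, lambda line: "6903091" in line)
```

This trades the server-side filtering of S3 Select for a ~6 MB download, which may be acceptable for CI runs.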
Looking for a solution on the test repo here: https://github.com/gmaze/ga_aws_access
The Argo ADMT is experimenting with Amazon S3 in order to move the GDAC infrastructure into the cloud. To prepare argopy for this and to be able to access and test the AWS prototype server, we need to develop support for S3. This would require:
A new data fetcher (to be developed in another PR)