Closed amcnicho closed 2 months ago
As long as AWS credentials are configured (documented here and here), after the refactoring of #149 the following already works:
```python
import s3fs
from xradio._utils.zarr.common import _open_dataset

s3 = s3fs.S3FileSystem(anon=False, requester_pays=False)
for processing_set in s3.listdir("s3://viper-test-data/Antennae_North.cal.lsrk.split.vis.zarr/"):
    for msv4 in s3.listdir(processing_set["name"]):
        # note that prepending "s3://" to the store arg is necessary to avoid `FileNotFoundError`;
        # otherwise the paths are assumed to refer to local disk
        test_output = _open_dataset("s3://" + msv4["name"])
```
This works because zarr can interpret S3 bucket URLs using its `FSStore` wrapper around the general `MutableMapping` interface exposed by fsspec, which should be accepted as a dependency.
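Since zarr stores are, at bottom, mappings from string keys to bytes, the relationship can be sketched with a minimal in-memory store. The `DictStore` class below is purely illustrative (it is not part of zarr or fsspec); in practice fsspec supplies the S3-backed mapping via `fsspec.get_mapper`, and zarr's `FSStore` wraps that same interface:

```python
from collections.abc import MutableMapping

# Illustrative sketch: a zarr store is just a MutableMapping of keys to bytes.
# fsspec provides an equivalent mapping for "s3://..." URLs, which is what
# lets zarr read directly from a bucket.
class DictStore(MutableMapping):
    def __init__(self):
        self._d = {}

    def __getitem__(self, key):
        return self._d[key]

    def __setitem__(self, key, value):
        self._d[key] = value

    def __delitem__(self, key):
        del self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

store = DictStore()
store[".zgroup"] = b'{"zarr_format": 2}'
print(len(store))  # 1
```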
There is a new test added for the case of S3 reads, and that's what failed in the checks on this branch. I thought I'd set up the test-dataset bucket as completely public, but clearly that configuration (or the initialization of s3fs) is insufficient to yield a passing test. We'll need to consider how to securely provide AWS credentials to the service account running the tests, and then generalize to a coherent policy of support for S3 bucket access.
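One conventional way to hand credentials to a CI service account is through the standard environment variables that botocore reads. A minimal sketch (the values below are placeholders; in a real workflow they would be injected from the CI system's secret store, never committed to the repository):

```python
import os

# Standard variable names read by botocore/s3fs when anon=False.
# Values here are placeholders for illustration only.
os.environ["AWS_ACCESS_KEY_ID"] = "EXAMPLE_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

# s3fs/boto pick these up automatically during client configuration.
print(os.environ["AWS_DEFAULT_REGION"])  # us-east-1
```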
1.) On my jwsteeb AWS account, I can see public buckets, but when I try https://us-east-1.console.aws.amazon.com/s3/buckets/viper-test-data?region=us-east-1&bucketType=general&tab=objects it says "Insufficient permissions to list objects."
2.) In `read_processing_set` I removed the `try`/`except PermissionError`, and `s3 = s3fs.S3FileSystem(anon=False, requester_pays=False)` works. This also seems to indicate that the bucket is not public.
3.) In `test_s3_read_processing_set`, why is "s3://viper-test-data/Antennae_North.cal.lsrk.split.ms" using `.ms` instead of `.vis.zarr`?
I have also merged the latest version of main into this branch. This has no impact on the code of this pull request.
Although the objects in the bucket containing the test dataset had been set to fully public, it was necessary to add a policy granting list access to the "Everyone" category (this can also be done via ACL if those are enabled). We've done that, so that addresses your points 1 and 2. Point 3 was a typo that snuck in from local testing, which I have corrected. At this point the new test still doesn't pass when run by the pytest service account in the test action container.
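For reference, the kind of bucket policy that grants the "Everyone" category list and read access looks roughly like the following. This is a sketch assuming the `viper-test-data` bucket name from this thread; the policy actually applied to the bucket may differ:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicList",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::viper-test-data"
    },
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::viper-test-data/*"
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN itself while `s3:GetObject` applies to the objects under it, which is why making the objects public was not enough on its own.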
The main issue to resolve is that `botocore.exceptions.NoCredentialsError: Unable to locate credentials` is raised when the `S3FileSystem` is created with `anon=True`. I wouldn't expect boto to require credentials at all in this case, but even when I provide them (via keyword argument, environment variable, or by passing an existing session to bypass client configuration from the s3fs library) it still complains.
Adding a `GetObject` policy to all objects in the zarr dataset bucket was also necessary in order to avoid `PermissionError` and `NoCredentialsError` from s3fs/boto. After additional refactoring to force the use of `signature_version=UNSIGNED` (for the `anon=True` case), the new test of anonymous public reads passes CI, and local testing with AWS credentials available still works as expected. Unless there are other tests desired, this seems ready :shipit:
Changes to the visibility read and load functions for #140