bioio-devs / bioio

Image reading, metadata management, and image writing for Microscopy images in Python
https://bioio-devs.github.io/bioio/OVERVIEW.html
BSD 3-Clause "New" or "Revised" License

Read from public `s3://` paths without authentication, without requiring any code change from users #61

Closed pgarrison closed 3 months ago

pgarrison commented 4 months ago

Feature Description

If a file is hosted publicly on S3, a user who has no AWS credentials set up must currently pass fs_kwargs: BioImage("s3://bucketname/path/to/file", fs_kwargs=dict(anon=True)). (I'm thinking specifically about OME ZARRs, but this is likely relevant to all readers.)

Instead, bioio should be able to handle this internally and let the user write BioImage("s3://bucketname/path/to/file").

Solution

As far as I can tell, the proper way to check whether a user is authenticated to read a file is to attempt the read and see if it raises an error, so I think the solution is to try reading the file twice, with logic similar to the following.

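try:
    # __init__ with user's fs_kwargs
    ...
except SomethingSpecific:  # placeholder: the exact exception type is still to be determined
    if protocol == "s3":
        # __init__ again with user's fs_kwargs plus {"anon": True}
        ...
    else:
        raise  # bare raise keeps the original traceback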
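To make the retry idea concrete, here is a minimal sketch, not bioio's actual implementation. The names open_with_anon_fallback, opener, and retry_on are hypothetical, and catching PermissionError is an assumption: which exception actually surfaces when credentials are missing would need to be confirmed against s3fs. The sketch also assumes we should not override a user who set anon explicitly.

```python
from typing import Any, Callable, Dict, Optional, Tuple, Type


def open_with_anon_fallback(
    path: str,
    opener: Callable[..., Any],  # hypothetical: whatever actually opens the file
    fs_kwargs: Optional[Dict[str, Any]] = None,
    # Assumption: the real exception type(s) must be checked against s3fs.
    retry_on: Tuple[Type[BaseException], ...] = (PermissionError,),
) -> Any:
    """Open `path` with the user's fs_kwargs; on an auth-style failure for
    an s3:// path, retry once with anon=True."""
    kwargs = dict(fs_kwargs or {})
    try:
        # First attempt: exactly what the user asked for.
        return opener(path, **kwargs)
    except retry_on:
        # Only fall back for S3, and never override an explicit `anon`.
        if not path.startswith("s3://") or "anon" in kwargs:
            raise
        # Second attempt: same kwargs plus anonymous access.
        return opener(path, **{**kwargs, "anon": True})
```

Passing the opener in as a parameter keeps the fallback logic independent of any particular reader, so it can be tested without network access.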

Alternatives

It looks like this was tried in s3fs itself, but it was reverted: “unfortunately, it led to far more problems than it solved. I’d be happy to see a more solid implementation, if some wants to try.”

toloudis commented 3 months ago

This came up again with our internal scientists. We need a good fix for this so our users don't waste time figuring out why their s3 URLs don't work.