Closed pgarrison closed 7 months ago
So here's my take: pathlike_to_fs is being called with enforce_exists=True. The filesystem is correctly deduced to be HTTPFileSystem.
However, the path to the zarr is a directory, which fails the fs.exists call. If we could somehow append .zgroup or .zattrs, then the existence check would pass and we'd be in good shape.
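A minimal sketch of the "append .zgroup or .zattrs" idea, assuming the only thing we need from the filesystem is an exists-style callable. The helper name `zarr_exists` and the example URLs are hypothetical, and the fake store below only imitates a backend whose exists() sees files but not directories.

```python
from typing import Callable, Iterable

# Marker files mentioned in this thread; a Zarr v2 group keeps them at its root.
ZARR_MARKERS = (".zgroup", ".zattrs")


def zarr_exists(
    exists: Callable[[str], bool],
    path: str,
    markers: Iterable[str] = ZARR_MARKERS,
) -> bool:
    """Return True if `path` exists, or if any marker file exists under it."""
    if exists(path):
        return True
    base = path.rstrip("/")
    return any(exists(f"{base}/{marker}") for marker in markers)


# Fake store for illustration: only file URLs are known, directories are not.
files = {
    "https://example.com/data/image.zarr/.zgroup",
    "https://example.com/data/image.zarr/.zattrs",
}


def fake_exists(path: str) -> bool:
    return path in files


print(fake_exists("https://example.com/data/image.zarr"))
print(zarr_exists(fake_exists, "https://example.com/data/image.zarr"))
```

With a real fsspec filesystem, the bound method fs.exists could presumably be passed as the `exists` argument; that part is untested here.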
Currently this function call is not reader-specific: the readers don't get a chance to modify the URL or add extra kwargs before pathlike_to_fs is called. This seems to be a remnant of the assumption that the URL always points to a file and not a directory, in which case we could check for existence globally, independent of any Reader implementation.
So, one question is: should we loop over each Reader, and let the ones that pass our test of potentially supporting the file type, each do their own existence check? (With maybe a default fallback in bioio-base.Reader?)
> So here's my take: pathlike_to_fs is being called with enforce_exists True. The filesystem is correctly deduced to be HTTPFileSystem. However, the path to the zarr is a directory, which fails the fs.exists function call. If somehow we could append .zgroup or .zattrs then the existence check would pass and we'd be in good shape.
Is this because fs.exists() only returns True if the path is a file and not a directory? (I sort of assume so because blob storage is all fake directories anyway.)
In which case, I completely agree that we should check for a subfile. I completely forgot that fsspec behaves this way.
> So, one question is: should we loop over each Reader, and let the ones that pass our test of potentially supporting the file type, each do their own existence check? (With maybe a default fallback in bioio-base.Reader?)
I think this is a good use case for having a base Reader implementation and allowing custom Reader implementations that we just call into.
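To make the base-plus-custom idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: `path_exists` is a hypothetical hook (not an existing bioio-base method), and `FakeFS` stands in for a filesystem whose exists() only sees files.

```python
class FakeFS:
    """Stand-in filesystem for the demo: exists() is True for known files only."""

    def __init__(self, files):
        self.files = set(files)

    def exists(self, path: str) -> bool:
        return path in self.files


class Reader:
    """Hypothetical base reader with a default existence check."""

    @classmethod
    def path_exists(cls, fs, path: str) -> bool:
        # Default fallback: trust the filesystem's own check.
        return fs.exists(path)


class ZarrReader(Reader):
    """Hypothetical zarr reader that overrides the check for directory stores."""

    @classmethod
    def path_exists(cls, fs, path: str) -> bool:
        base = path.rstrip("/")
        # Fall back to the marker files when the directory itself is not visible.
        return fs.exists(path) or any(
            fs.exists(f"{base}/{name}") for name in (".zgroup", ".zattrs")
        )


fs = FakeFS({"bucket/img.zarr/.zgroup", "bucket/img.tiff"})
print(Reader.path_exists(fs, "bucket/img.tiff"))
print(Reader.path_exists(fs, "bucket/img.zarr"))
print(ZarrReader.path_exists(fs, "bucket/img.zarr"))
```

Readers that never override the hook keep today's behavior, so only formats stored as directories would need custom logic.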
> > So here's my take: pathlike_to_fs is being called with enforce_exists True. The filesystem is correctly deduced to be HTTPFileSystem. However, the path to the zarr is a directory, which fails the fs.exists function call. If somehow we could append .zgroup or .zattrs then the existence check would pass and we'd be in good shape.
>
> Is this because fs.exists() only returns True if the path is a file and not a directory? (I sort of assume so because blob storage is all fake directories anyway.)
Yes, at least when accessing S3 via the HTTPFileSystem. I think s3fs does not have this issue.
> So, one question is: should we loop over each Reader, and let the ones that pass our test of potentially supporting the file type, each do their own existence check? (With maybe a default fallback in bioio-base.Reader?)
I notice that bioio already loops over each Reader to call is_supported_image, so it seems redundant to have two loops, one to check if the file exists and another to check if it's supported.
> > So, one question is: should we loop over each Reader, and let the ones that pass our test of potentially supporting the file type, each do their own existence check? (With maybe a default fallback in bioio-base.Reader?)
>
> I notice that bioio already loops over each Reader to call is_supported_image, so it seems redundant to have two loops, one to check if the file exists and another to check if it's supported.
It did occur to me to set enforce_exists=False in the first check, and then in the per-reader check we can do special things like appending .zattrs or .zgroup to the path.
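Roughly what that could look like as a single loop, with no global existence check up front and each reader deciding for itself. This is only a sketch: the `is_supported_image(fs, path)` signature is simplified and does not match bioio's real one, and `pick_reader`, `FakeFS`, and both reader classes are made up for the example.

```python
from typing import Optional, Sequence


class FakeFS:
    """Stand-in filesystem for the demo: exists() is True for known files only."""

    def __init__(self, files):
        self.files = set(files)

    def exists(self, path: str) -> bool:
        return path in self.files


class TiffReader:
    @staticmethod
    def is_supported_image(fs, path: str) -> bool:
        # Plain files can rely on the ordinary existence check.
        return path.endswith((".tif", ".tiff")) and fs.exists(path)


class ZarrReader:
    @staticmethod
    def is_supported_image(fs, path: str) -> bool:
        base = path.rstrip("/")
        if not base.endswith(".zarr"):
            return False
        # The store is a directory, so probe for the marker files instead.
        return any(fs.exists(f"{base}/{name}") for name in (".zgroup", ".zattrs"))


def pick_reader(fs, path: str, readers: Sequence[type]) -> Optional[type]:
    """One loop: the first reader that both supports and can find the path wins."""
    for reader in readers:
        if reader.is_supported_image(fs, path):
            return reader
    return None


fs = FakeFS({"s/img.zarr/.zattrs", "s/scan.tiff"})
chosen = pick_reader(fs, "s/img.zarr", [TiffReader, ZarrReader])
print(chosen.__name__ if chosen else None)
```

One trade-off of this shape is that a missing file and an unsupported file both end up as "no reader found", so the error message would need to distinguish the two cases some other way.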
Description: see title
Code to reproduce
Workaround
If the OME-Zarr reader is specified explicitly, determine_plugin is bypassed and the image can be loaded:
bioio.BioImage(path, reader=bioio_ome_zarr.Reader)
Environment
I'm testing against bioio-ome-zarr PR 17 which improves support for reading from S3.