Closed observingClouds closed 1 week ago
This is related to https://github.com/ipfs/go-ipfs/issues/8528: we need a way of telling whether a CID or IPFS path resolves to a directory or to a file (that's needed for fsspec's info() method as well as isdir(), isfile(), etc.).
According to the issue mentioned above, checking the ETag is an awkward but recommended way of doing this. Apparently it does not work in all cases. We'll probably have to exclude some gateways from our default list if they dropped support for this, or otherwise find ways of telling files and directories apart from what we get.
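A minimal sketch of the ETag heuristic discussed above. It assumes that gateways mark generated directory listings with an ETag starting with DirIndex and use a plain content identifier for files. The function name classify_etag and the example ETag values are hypothetical and not part of ipfsspec.

```python
from typing import Optional


def classify_etag(etag: Optional[str]) -> Optional[str]:
    """Guess whether a gateway response refers to a directory or a file.

    Returns "directory", "file", or None when no ETag was sent at all.
    Assumption: directory listings carry an ETag that starts with "DirIndex".
    """
    if etag is None:
        # Gateways that omit the header give us nothing to decide on.
        return None
    value = etag.strip()
    if value.startswith("W/"):
        # Drop the weak-validator prefix before looking at the quoted value.
        value = value[2:]
    value = value.strip('"')
    return "directory" if value.startswith("DirIndex") else "file"


# Made-up header values, for illustration only.
print(classify_etag('"DirIndex-example_CID-bafyexample"'))
print(classify_etag('"bafyexample"'))
print(classify_etag(None))
```

Returning None instead of guessing keeps the "no ETag" case visible to the caller, which is the case that some gateways apparently trigger.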
So apparently https://gateway.pinata.cloud doesn't return ETags, but is able to deliver the dataset. That's unfortunate, but I don't see a good way of getting what we need for info() from their response. Thus we might have to drop that gateway from the default list...
Thanks for looking into this! This is a pity, maybe we should approach them and inform them about this issue with their service.
So, a quick solution would be to define the environment variable IPFSSPEC_GATEWAYS and just exclude the Pinata gateway or any other gateway that does not provide ETags. I can work with that for now, but I agree that the gateway should be dropped from the default list so the UX is better.
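A rough sketch of that workaround. The helper gateways_without and the candidate list are hypothetical, and joining the URLs with whitespace is an assumption about how ipfsspec parses IPFSSPEC_GATEWAYS; please check the ipfsspec documentation for the actual format.

```python
import os


def gateways_without(candidates, excluded):
    """Return the candidate gateway URLs that are not in the excluded list."""
    # Compare without trailing slashes so both spellings of a URL match.
    excluded_set = {url.rstrip("/") for url in excluded}
    return [url for url in candidates if url.rstrip("/") not in excluded_set]


# Illustrative candidate list, not ipfsspec's actual default list.
candidates = [
    "https://ipfs.io",
    "https://dweb.link",
    "https://gateway.pinata.cloud",
]
kept = gateways_without(candidates, ["https://gateway.pinata.cloud"])

# Assumption: the variable holds whitespace-separated gateway URLs.
os.environ["IPFSSPEC_GATEWAYS"] = " ".join(kept)
print(os.environ["IPFSSPEC_GATEWAYS"])
```

The variable would presumably need to be set before the filesystem object is created, so that the gateway list is picked up.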
I'm currently not able to retrieve the referenced dataset anymore. However, since version 0.5.0, ipfsspec shouldn't depend on ETags anymore, thus I'd assume that this error doesn't exist anymore and I'll close the issue.
What happened: While trying to open the zarr dataset bafybeidqwf7lcs4mo343ntgxiid7n6psvryicuqkppm3wmzad2wdamnpsu, a KeyError is sometimes raised.
Expected behaviour: The dataset is returned without any error.
Potential causes: Debugging the above call by inserting a few print statements into async_ipfs.py reveals that the "ETag" is not always returned by the server. The response headers of a successful request include the "ETag", but it is missing when the call fails.
Without the "ETag", the "type" key is not set: https://github.com/fsspec/ipfsspec/blob/8eb96dfb0ffdebb099a47a77d2b4653988b4a0b8/ipfsspec/async_ipfs.py#L45-L53
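The failure mode can be reproduced without any network access. The following is a simplified sketch, not the actual ipfsspec code: info_from_headers is a hypothetical stand-in that only sets "type" when an "ETag" header is present, and the header values are made up.

```python
def info_from_headers(path, headers):
    """Build an fsspec-style info dict from HTTP response headers."""
    info = {"name": path}
    if "Content-Length" in headers:
        info["size"] = int(headers["Content-Length"])
    if "ETag" in headers:
        etag = headers["ETag"].strip('"')
        # Assumption: directory listings use an ETag starting with "DirIndex".
        info["type"] = "directory" if etag.startswith("DirIndex") else "file"
    return info


with_etag = info_from_headers(
    "bafy-example", {"Content-Length": "42", "ETag": '"bafy-example"'}
)
without_etag = info_from_headers("bafy-example", {"Content-Length": "42"})

print(with_etag["type"])
try:
    without_etag["type"]
except KeyError as err:
    # Any caller that indexes info["type"] directly fails like this.
    print("KeyError:", err)
```

Whether the key is present depends only on the headers of the response that happened to arrive, which would explain why the error shows up only sometimes.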
Does this mean that the success of the function call depends on which IPFS peer responds quickest?