Open · MatinF opened this issue 4 years ago
s3fs certainly is known to work for some non-AWS S3 implementations. Can you also turn on s3fs logging, so we know which call this is happening for? I see that the first block is a HEAD call and the second a GET call with a bytes range. It may be a botocore error, or perhaps some setting is needed (perhaps following a redirect?).
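(For reference, one generic way to turn that logging on, using Python's standard logging module, which both s3fs and botocore log through:)

```python
import logging

# Send all log output to stderr at DEBUG level
logging.basicConfig(level=logging.DEBUG)

# s3fs and the underlying botocore use the standard logging module,
# so raising these loggers to DEBUG shows the individual S3 calls
logging.getLogger("s3fs").setLevel(logging.DEBUG)
logging.getLogger("botocore").setLevel(logging.DEBUG)
```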
@MatinF, any further details here?
Hi Martin, we did not find further details or insight on this, I'm afraid. We ended up with a solution that effectively removes the endpoint details when AWS is used.
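(A minimal sketch of that workaround; the `make_filesystem` helper and the `amazonaws.com` check are hypothetical illustrations, not the exact code used:)

```python
import s3fs

def make_filesystem(endpoint_url=None):
    # Hypothetical helper: only pass an explicit endpoint_url for
    # non-AWS servers (e.g. MinIO) and let botocore resolve the
    # AWS endpoint by itself.
    client_kwargs = {}
    if endpoint_url and "amazonaws.com" not in endpoint_url:
        client_kwargs["endpoint_url"] = endpoint_url
    return s3fs.S3FileSystem(client_kwargs=client_kwargs)
```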
Outline
We're using S3FS to connect to S3 servers and e.g. download data. In some cases the S3 server would be e.g. a MinIO S3 server and in other cases an AWS S3 server. To facilitate this distinction, we typically pass the endpoint URL explicitly, as below:
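(A minimal sketch of this pattern; the endpoint URL, credentials, and bucket/key names are placeholders, not the originals:)

```python
import s3fs

# Placeholder endpoint: either a MinIO server or an AWS regional endpoint
endpoint_url = "https://s3.eu-central-1.amazonaws.com"

fs = s3fs.S3FileSystem(
    key="<access_key>",
    secret="<secret_key>",
    client_kwargs={"endpoint_url": endpoint_url},
)

# Open the object and check its length, as described below
with fs.open("my-bucket/path/to/object.bin", "rb") as f:
    print(len(f.read()))
```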
Expected behavior
When running the above, we would expect to consistently see the length of the object we're trying to open.
Actual behavior
When we use the above, we do indeed get the expected result on most systems, but on some systems the GET request fails. Specifically, below are partial debug outputs from a working vs. a non-working system running the above code:
Working system
Non-working system
As evident, the non-working system produces a different debug output, including a 'Content-Length': '0' header, resulting in an error so that the file cannot be downloaded. Both systems run Windows 10 and Python 3.7.9, with the pip freeze below, in virtual environments.

If we remove the explicit passing of the S3 endpoint, however, both systems are able to correctly download the file, as sketched below. We struggle to understand the logic of this, and we're hence concerned the error may reoccur. Any help is appreciated!
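(The variant without the explicit endpoint, which worked on both systems; credentials and paths are placeholders as before:)

```python
import s3fs

# No client_kwargs/endpoint_url: botocore resolves the AWS endpoint itself
fs = s3fs.S3FileSystem(key="<access_key>", secret="<secret_key>")

with fs.open("my-bucket/path/to/object.bin", "rb") as f:
    print(len(f.read()))
```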
pip freeze