fsspec / gcsfs

Pythonic file-system interface for Google Cloud Storage
http://gcsfs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Fix md5 hash logic. Make it order agnostic. #640

Closed. damjad closed this 2 months ago

damjad commented 2 months ago

The order of the x-goog-hash headers is not consistent across API calls. Sometimes md5 comes first, and sometimes crc32c. The current code works when md5 comes first and fails when crc32c comes first.

Failure example:

curl -I https://storage.googleapis.com/gcp-public-data-arco-era5/ar/1959-2022-1h-240x121_equiangular_with_poles_conservative.zarr/.zattrs
HTTP/2 200
<redacted>
x-goog-hash: crc32c=KXvQqg==
x-goog-hash: md5=mZFLkyvTelC5g8XnyQrpOw==
<redacted>

In the above example, crc32c comes first and md5 comes later. Once the two headers are combined, x-goog-hash has the value crc32c=KXvQqg==, md5=mZFLkyvTelC5g8XnyQrpOw==.

Note the space after the comma. It is there because of how the requests lib handles duplicate headers, and it was what made the md5 lookup fail when md5 was not the first entry.
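For illustration, here is a minimal sketch of order-agnostic parsing. It is not the actual gcsfs code: the helper name parse_goog_hash is hypothetical, and it assumes the combined header is a comma-separated list of name=value pairs, as in the example above. Each part is stripped of whitespace before it is split, so neither the order of the entries nor the space after the comma matters.

```python
def parse_goog_hash(header_value):
    """Parse a combined x-goog-hash value into a {name: digest} dict.

    Hypothetical helper for illustration; not part of gcsfs.
    """
    hashes = {}
    for part in header_value.split(","):
        part = part.strip()  # drop the space left after the comma
        if not part:
            continue
        # Split on the first "=" only: base64 digests end in "=" padding.
        name, sep, digest = part.partition("=")
        if sep:
            hashes[name.strip()] = digest
    return hashes


combined = "crc32c=KXvQqg==, md5=mZFLkyvTelC5g8XnyQrpOw=="
md5 = parse_goog_hash(combined).get("md5")
```

Looking the digest up by name with .get("md5") returns None when the header carries no md5 entry, so the caller can decide whether to skip the check or fall back to crc32c.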

martindurant commented 2 months ago

Just a space :)