ResearchObject / ro-crate-py

Python library for RO-Crate
https://pypi.org/project/rocrate/
Apache License 2.0

Better handling of HTTP header #182

Closed (simleo closed 5 months ago)

simleo commented 5 months ago

The issue addressed by this PR was reported by @rsirvent and concerns the following URL:

https://zenodo.org/records/10782431/files/lysozyme_datasets.zip

Apparently the content-type duplication is not really an error: the server sends the Content-Type header twice and urllib merges the two values with a comma, which is consistent with RFC 7230 (see the Requests docs for response headers). However, Requests (and the underlying urllib3) somehow detects that this is some kind of server error and reports a content type of application/octet-stream. With this PR, we use Requests to get the header, which avoids the above problems. It is worth noting that the new behavior is consistent with curl -I.
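To make the comma-merging described above concrete, here is a minimal sketch rather than the code in this PR. The helper names (`merged_header`, `unique_values`, `head_content_type`) are hypothetical, and the duplicated header value in the demo is made up for illustration. The first helper reproduces the ", " join that the standard library applies to repeated header fields, the second is a simple heuristic for collapsing exact duplicates, and the third shows one way a header could be read with Requests.

```python
import http.client


def merged_header(headers: http.client.HTTPMessage, name: str):
    """Join repeated header fields with ', ' (the standard library's
    HTTPResponse.getheader documents the same joining behavior)."""
    values = headers.get_all(name)
    return ", ".join(values) if values else None


def unique_values(merged: str) -> list:
    """Heuristic (an assumption, not part of the PR): split a comma-merged
    value and drop exact duplicates, preserving order."""
    seen = []
    for part in merged.split(","):
        part = part.strip()
        if part and part not in seen:
            seen.append(part)
    return seen


def head_content_type(url: str, timeout: float = 10.0):
    """Read Content-Type from a HEAD request made with Requests."""
    # Imported lazily so the helpers above work without Requests installed.
    import requests

    resp = requests.head(url, allow_redirects=True, timeout=timeout)
    resp.raise_for_status()
    return resp.headers.get("Content-Type")


if __name__ == "__main__":
    # Simulate a server that sends the same header twice (illustrative value).
    msg = http.client.HTTPMessage()
    msg.add_header("Content-Type", "application/octet-stream")
    msg.add_header("Content-Type", "application/octet-stream")
    print(merged_header(msg, "Content-Type"))
    print(unique_values(merged_header(msg, "Content-Type")))
```

Splitting on commas is only safe for headers whose values cannot themselves contain commas, so it should be treated as a workaround for this specific duplication rather than a general header parser.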

stain commented 5 months ago

Should we report this upstream to Zenodo?

rsirvent commented 5 months ago

Done. Wrote a support request at the Zenodo website, linking to this PR.

Edit: they answered really quickly (ticket number at the bottom):

Dear Raul,

We have logged this issue internally and will investigate/fix it in due course.

Thank you for the bug report! Please let me know if there's anything else I can help you with.

Best regards, Carlin

Zenodo Support
https://zenodo.org/

Support Zenodo: Could we suggest you support Zenodo's features' development by donating to the CERN & Society Foundation? For more information see https://zenodo.org/donate or reach out directly to our Partnerships & Fundraising colleagues (donorcare@csf.cern.ch) if you have any doubts about the donation process.

Ticket#33602