edgi-govdata-archiving / wayback

A Python API to the Internet Archive Wayback Machine
https://wayback.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Handle inconsistent header case for Mementos #101

Mr0grog closed 1 year ago

Mr0grog commented 1 year ago

This fixes #98, which was caused by two changes:

  1. The Internet Archive now returns most archived header names (i.e. those prefixed with 'x-archive-orig-') in lower-case.
  2. In HTTP/2 (now possible since we are using HTTPS as of #97), header names are always sent in lower-case (and header names are case-insensitive in every version of HTTP).

Since that means the same code can wind up with different header name casing on different systems, I went ahead and made the Memento.headers attribute case-insensitive (we probably should have done this long ago for user-friendliness anyway). I've implemented that using code largely taken from Requests, since their implementation is not public so we can't just use it directly (plus we plan to switch away from Requests at some point anyway). If we wind up switching to HTTPX, they do have a public implementation we can just use instead of this.
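For anyone unfamiliar with the approach, here is a minimal sketch of what a case-insensitive header mapping can look like. It is not the code in this PR: the class name CaseInsensitiveHeaders is hypothetical, and the design (storing each entry under its lower-cased name while remembering the casing it was last set with) is only assumed to be similar in spirit to what Requests does.

```python
from collections.abc import MutableMapping


class CaseInsensitiveHeaders(MutableMapping):
    """Hypothetical sketch: a mapping whose string keys match regardless of case."""

    def __init__(self, data=None, **kwargs):
        # Maps lower-cased name -> (name as it was last set, value).
        self._store = {}
        self.update(data or {}, **kwargs)

    def __setitem__(self, key, value):
        # Look-ups use the lower-cased key; the original casing is kept for display.
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Iterate over names in the casing they were last set with.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return repr(dict(self.items()))


# The same header is reachable no matter how the server cased its name.
headers = CaseInsensitiveHeaders({'content-type': 'text/html'})
content_type = headers['Content-Type']
has_header = 'CONTENT-TYPE' in headers
```

Subclassing MutableMapping means get(), items(), update(), and the `in` operator all go through the lower-casing logic above, so callers do not need to know which casing a given server or HTTP version produced.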