edgi-govdata-archiving / wayback

A Python API to the Internet Archive Wayback Machine
https://wayback.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Remove `ref-names` field from `git_archival.txt` #162

Closed Mr0grog closed 2 months ago

Mr0grog commented 2 months ago

It turns out that the way we populated the `ref-names` field causes its value to change depending on whether the commit a Git archive was built from was the head of a branch at build time (which is often true when we first cut a release, but ceases to be true soon afterward). If someone downloads an archive later and tries to compare its signature with the one we released, it won’t match because this field has changed, and that’s a significant issue.

Unfortunately, there’s no good alternative that resolves this issue, so the best solution is to just remove the field. On the upside, `describe-name` carries the more critical info about the current tag or release version, so this isn’t a huge loss.
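
To illustrate the problem, here is a minimal sketch in Python. The file contents, commit hash, and version number are hypothetical stand-ins rather than the project's real `git_archival.txt`, and the helper names (`parse_archival`, `without_field`, `digest`) are made up for this example. It hashes only the text of the file, whereas a real signature covers the whole archive, but the effect is the same: any byte that differs changes the result.

```python
import hashlib


def parse_archival(text):
    """Parse `key: value` lines (as in a git_archival.txt-style file) into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only; values such as "tag: v0.0.0" contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def without_field(text, name):
    """Return the text with any line for the named field removed."""
    return "".join(
        line
        for line in text.splitlines(keepends=True)
        if not line.startswith(name + ":")
    )


def digest(text):
    """SHA-256 of the text, standing in for a checksum or signature of the archive."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical contents when the tagged commit is still the head of a branch.
AT_RELEASE = (
    "node: 0123abc\n"
    "describe-name: v0.0.0\n"
    "ref-names: HEAD -> main, tag: v0.0.0\n"
)

# Hypothetical contents for the same commit after the branch has moved on.
LATER = (
    "node: 0123abc\n"
    "describe-name: v0.0.0\n"
    "ref-names: tag: v0.0.0\n"
)

# Same commit, but the differing ref-names line changes the digest.
print(digest(AT_RELEASE) == digest(LATER))

# With the ref-names line dropped, both archives yield identical content.
print(
    digest(without_field(AT_RELEASE, "ref-names"))
    == digest(without_field(LATER, "ref-names"))
)

# The version information is still available from describe-name.
print(parse_archival(without_field(LATER, "ref-names"))["describe-name"])
```

Under these assumptions, the first comparison fails and the second succeeds, which is why dropping the field makes archives of the same commit reproducible over time.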

Thanks to @webknjaz for pointing this out: https://github.com/edgi-govdata-archiving/wayback/pull/144#discussion_r1651311346