DocNow / diffengine

track changes to the news, where news is anything with an RSS feed
MIT License

use web.archive.org directly #1

Closed edsu closed 7 years ago

edsu commented 7 years ago

It's probably a good idea to use web.archive.org directly, instead of going through pragma as a middleman, when adding a URL to the Internet Archive. The relevant code can be found here.

ruebot commented 7 years ago

Are you thinking of just reading the value of the Content-Location header with requests after hitting http://web.archive.org/save/http://foo?
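A rough sketch of that idea, for illustration only. It assumes the save endpoint answers with a `Content-Location` header holding a path such as `/web/<timestamp>/<url>`, as described in this thread; the function names `snapshot_url` and `save` are made up here and are not diffengine's code.

```python
WAYBACK = "https://web.archive.org"


def snapshot_url(headers):
    """Turn a Content-Location header value into a full snapshot URL.

    Returns None when the header is missing, so callers can decide
    how to handle a save request that gave no usable location.
    """
    location = headers.get("Content-Location")
    if not location:
        return None
    # Assumption: the header is usually a path relative to web.archive.org.
    if location.startswith("http"):
        return location
    return WAYBACK + location


def save(url, timeout=60):
    """Ask the Wayback Machine to archive `url` and return the snapshot URL."""
    # requests is third-party; importing it here keeps snapshot_url()
    # usable (and testable) without it installed.
    import requests

    resp = requests.get(WAYBACK + "/save/" + url, timeout=timeout)
    resp.raise_for_status()
    return snapshot_url(resp.headers)
```

Keeping the header handling separate from the network call means it can be checked against a plain dict without hitting web.archive.org.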

edsu commented 7 years ago

Yes, exactly! I know there is some logic that returns a recent archived copy rather than a brand new one, and that affects what comes back. If you happen to know what that window is, or know someone who might, that would be useful information. If diffengine is going to grow, I suppose it could archive a copy locally at some point.
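Since the window is unknown, one way to notice when an older copy was handed back is to compare the timestamp embedded in the snapshot URL with the time of the request. This is only a sketch: it assumes snapshot URLs contain a 14-digit `YYYYMMDDhhmmss` timestamp in UTC, and the 60-second tolerance and the names `snapshot_time` and `is_fresh` are arbitrary placeholders, not anything documented by the Internet Archive.

```python
import re
from datetime import datetime, timezone

# Matches the 14-digit timestamp in URLs like /web/20170101120000/http://foo
TIMESTAMP = re.compile(r"/web/(\d{14})/")


def snapshot_time(snapshot_url):
    """Return the capture time encoded in a snapshot URL, or None."""
    match = TIMESTAMP.search(snapshot_url)
    if match is None:
        return None
    taken = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    # Assumption: Wayback timestamps are expressed in UTC.
    return taken.replace(tzinfo=timezone.utc)


def is_fresh(snapshot_url, requested_at, tolerance_seconds=60):
    """Guess whether the snapshot was made for this request.

    `tolerance_seconds` is a hypothetical threshold; a snapshot older
    than that is treated as a reused earlier copy.
    """
    taken = snapshot_time(snapshot_url)
    if taken is None:
        return False
    return abs((requested_at - taken).total_seconds()) <= tolerance_seconds
```

A caller would pass `datetime.now(timezone.utc)`, recorded just before the save request, as `requested_at`.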

edsu commented 7 years ago

Just released this fix to PyPI as v0.0.21