akamhy / waybackpy

Wayback Machine API interface & a command-line tool
https://pypi.org/project/waybackpy/
MIT License

Local store of archived URLs, with a local check for content changes to decide whether to re-archive or return the archived link #188

Open ghost opened 1 year ago

ghost commented 1 year ago

waybackpy has the CDX API, which can check whether content is already archived, but the check could also be done locally.

The demo on asciinema archives every time you run it, which is unnecessary.

During development I ran into throttling after archiving ~13 links on every test run. I considered a store in waybackpy with checks such as a hash or last-modified time, and settled on last-modified time. It has saved me from the throttling notice and would conserve archive.org resources.

I'd suggest adding this as an optional feature, with a configurable store size. Compression and other careful design decisions could ensure that users would not mind the local store.
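To make the idea concrete, here is a minimal sketch of such a store, not a proposed implementation. Everything in it is hypothetical: the `LocalArchiveStore` class, its `max_entries` setting, and the JSON file layout are illustrative names and are not part of waybackpy. The `archive_fn` argument stands in for whatever call actually performs the archive, and the caller is assumed to supply the last-modified value (for example from an HTTP `Last-Modified` header or a file's mtime).

```python
import json
from collections import OrderedDict
from pathlib import Path
from typing import Callable, Optional


class LocalArchiveStore:
    """Hypothetical local cache: URL -> last known archive link.

    Not part of waybackpy; names and file format are assumptions.
    """

    def __init__(self, path: str, max_entries: int = 1000) -> None:
        self.path = Path(path)
        self.max_entries = max_entries  # configurable store size
        self.entries: "OrderedDict[str, dict]" = OrderedDict()
        if self.path.exists():
            # JSON objects keep insertion order, so eviction order survives reloads.
            self.entries = OrderedDict(json.loads(self.path.read_text()))

    def lookup(self, url: str, last_modified: str) -> Optional[str]:
        """Return the stored archive link if the content has not changed."""
        entry = self.entries.get(url)
        if entry is not None and entry["last_modified"] == last_modified:
            return entry["archive_url"]
        return None

    def get_or_archive(
        self,
        url: str,
        last_modified: str,
        archive_fn: Callable[[str], str],
    ) -> str:
        """Archive only when the URL is new or its last-modified value changed."""
        cached = self.lookup(url, last_modified)
        if cached is not None:
            return cached

        archive_url = archive_fn(url)  # the only step that contacts the archive
        self.entries[url] = {
            "last_modified": last_modified,
            "archive_url": archive_url,
        }
        self.entries.move_to_end(url)

        # Drop the oldest entries once the store exceeds its configured size.
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)

        self.path.write_text(json.dumps(self.entries))
        return archive_url
```

With something like this, repeated test runs over the same ~13 links would call `archive_fn` once per link and then reuse the stored links until a last-modified value changes.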

A hash or last-modified check can also be considered when archiving local content, which some use cases may involve. Other ways of detecting content changes could be devised or added as extensions.
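For the hash option with local content, a rough sketch could look like the following. The helper names `content_digest` and `has_changed` are hypothetical, and SHA-256 is just one reasonable choice of hash, not something waybackpy prescribes.

```python
import hashlib
from pathlib import Path
from typing import Optional


def content_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path: str, previous_digest: Optional[str]) -> bool:
    """True if there is no stored digest or the file's content differs from it."""
    return previous_digest is None or content_digest(path) != previous_digest
```

A hash catches changes that leave the modification time untouched, at the cost of reading the whole file on every check.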

Consider?