Improvement suggestions for WaybackArchiver

Hi! I'm a member of Team Wayback at the Internet Archive. I have some improvement suggestions for https://github.com/bellingcat/auto-archiver/blob/0bdd06f6415e3ed4ec0582c991352b29d38cb891/archivers/wayback_archiver.py#L11

You could use the Wayback Machine Availability API (https://archive.org/help/wayback_api.php) to get capture info about a URL. Requesting https://web.archive.org/web/<URL> is not recommended because its purpose is to play back the latest capture. You don't need to load the full content of the latest capture; you only need to know whether it's available.
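
To make the first suggestion concrete, here is a minimal sketch of an availability check. It assumes the endpoint and JSON shape described on the help page linked above (`archived_snapshots` containing a `closest` entry); the function names are hypothetical and not part of auto-archiver.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as documented at https://archive.org/help/wayback_api.php
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(url, timestamp=None):
    """Build the Availability API query URL for `url` (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        # Optional: ask for the capture closest to this YYYYMMDDhhmmss value.
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_capture(payload):
    """Return the closest capture dict from a decoded response, or None.

    Assumes the documented shape:
    {"archived_snapshots": {"closest": {"available": true, "url": ..., ...}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def check_capture(url, timeout=30):
    """Query the API over the network and return capture info or None."""
    with urlopen(availability_url(url), timeout=timeout) as response:
        return closest_capture(json.load(response))
```

Calling `check_capture("example.com")` would return a small dict with the capture URL and timestamp when a capture exists, without downloading the archived page itself.
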
The Save Page Now API has a lot of useful options: https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit

if_not_archived_within=<timedelta> should be useful in your case.

Capture web page only if the latest existing capture at the Archive is older than the <timedelta> limit. Its format could be any datetime expression like “3d 5h 20m” or just a number of seconds, e.g. “120”. If there is a capture within the defined timedelta, SPN2 returns that as a recent capture. The default system <timedelta> is 30 min.
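
As a rough sketch of how that option could be passed, the snippet below builds (but does not send) an SPN2 save request. The endpoint `https://web.archive.org/save`, the `Accept: application/json` header, and the `Authorization: LOW <accesskey>:<secret>` scheme are my reading of the linked SPN2 document and should be checked against it; `build_save_request` is a hypothetical helper, not auto-archiver code.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed SPN2 endpoint; verify against the linked API document.
SPN2_ENDPOINT = "https://web.archive.org/save"


def build_save_request(url, access_key, secret_key, if_not_archived_within="3d"):
    """Build a POST request asking SPN2 to capture `url` only if the
    latest capture is older than `if_not_archived_within`."""
    data = urlencode(
        {
            "url": url,
            # Accepts expressions like "3d 5h 20m" or a number of seconds.
            "if_not_archived_within": if_not_archived_within,
        }
    ).encode("ascii")
    headers = {
        "Accept": "application/json",
        # Assumed auth scheme using archive.org S3-style keys.
        "Authorization": f"LOW {access_key}:{secret_key}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(SPN2_ENDPOINT, data=data, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would submit the capture request; with a recent capture inside the window, SPN2 should report that capture instead of creating a new one.
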

Cheers!

Improvement suggestions for WaybackArchiver #59