Hi! I'm a member of Team Wayback at the Internet Archive. I have some improvement suggestions for https://github.com/bellingcat/auto-archiver/blob/0bdd06f6415e3ed4ec0582c991352b29d38cb891/archivers/wayback_archiver.py#L11

You could use the Wayback Machine Availability API (https://archive.org/help/wayback_api.php) to get capture info about an archived URL. Requesting `https://web.archive.org/web/<URL>` is not recommended, because its purpose is to play back the latest capture. You don't need to load all the data of the latest capture of a URL; you only need to know whether a capture is available.
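Below is a minimal Python sketch of that availability check, using only the standard library. It assumes the response shape described in the Availability API docs (`archived_snapshots` → `closest` with `available`, `url` and `timestamp` fields). The function names `availability_url`, `closest_snapshot` and `check_availability` are hypothetical and are not part of auto-archiver. Parsing is kept separate from the network call so it can be tested offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Availability API endpoint, as documented at https://archive.org/help/wayback_api.php
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(url: str, timestamp: Optional[str] = None) -> str:
    """Build a query URL; `timestamp` is an optional YYYYMMDDhhmmss (or prefix) target date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urllib.parse.urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[dict]:
    """Return the 'closest' snapshot from a parsed response, or None if there is no capture."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def check_availability(url: str, timeout: float = 10.0) -> Optional[dict]:
    """Hypothetical helper: ask the Availability API about `url` (performs a network request)."""
    with urllib.request.urlopen(availability_url(url), timeout=timeout) as resp:
        # Only a small JSON document is downloaded, never the capture itself.
        return closest_snapshot(json.load(resp))
```

For example, `check_availability("example.com")` returns a dict whose `url` points at `web.archive.org/web/<timestamp>/...`, or `None` when the URL has no capture. Unlike fetching `https://web.archive.org/web/<URL>`, this never downloads the capture itself.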
`if_not_archived_within=<timedelta>` should be useful in your case:

> Capture web page only if the latest existing capture at the Archive is older than the limit. Its format could be any datetime expression like “3d 5h 20m” or just a number of seconds, e.g. “120”. If there is a capture within the defined timedelta, SPN2 returns that as a recent capture. The default system `<timedelta>` is 30 min.
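Here is a sketch of passing `if_not_archived_within` in a Save Page Now 2 (SPN2) request, again using only the standard library. The endpoint (`https://web.archive.org/save`), the `url` form field, and the `Authorization: LOW <access key>:<secret>` header are assumptions based on the SPN2 API documentation and should be checked against it. `format_timedelta`, `build_save_request` and `submit` are hypothetical helpers.

```python
import datetime
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed SPN2 endpoint; verify against the SPN2 API documentation.
SPN2_ENDPOINT = "https://web.archive.org/save"


def format_timedelta(delta: datetime.timedelta) -> str:
    """Render a timedelta as an SPN2-style expression such as '3d 5h 20m'."""
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("timedelta must be positive")
    if total % 60:
        # Not a whole number of minutes: use the plain-seconds form, e.g. "90".
        return str(total)
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    units = ((days, "d"), (hours, "h"), (minutes, "m"))
    return " ".join(f"{value}{unit}" for value, unit in units if value)


def build_save_request(
    url: str,
    access_key: str,
    secret_key: str,
    not_archived_within: Optional[datetime.timedelta] = None,
) -> urllib.request.Request:
    """Hypothetical helper: build an SPN2 capture request (nothing is sent yet)."""
    fields = {"url": url}
    if not_archived_within is not None:
        # SPN2 captures only if the latest capture is older than this limit;
        # otherwise it returns the existing recent capture.
        fields["if_not_archived_within"] = format_timedelta(not_archived_within)
    headers = {
        "Accept": "application/json",
        # Assumed SPN2 auth scheme using archive.org S3-style keys.
        "Authorization": f"LOW {access_key}:{secret_key}",
    }
    return urllib.request.Request(
        SPN2_ENDPOINT,
        data=urllib.parse.urlencode(fields).encode(),
        headers=headers,
        method="POST",
    )


def submit(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return SPN2's JSON reply (performs a network request)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `submit(build_save_request(url, access_key, secret_key, datetime.timedelta(hours=6)))` would ask SPN2 to capture `url` only if it has no capture from the last six hours. The formatter emits the “3d 5h 20m” style when the limit is a whole number of minutes. Otherwise it falls back to the plain-seconds form, which the documentation also accepts.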
Cheers!