Open MrOrz opened 4 years ago
If archiving requires headless browswer, we can implement the archiving function in url-resolver instead. https://help.archive.org/hc/en-us/articles/360001513491-Save-Pages-in-the-Wayback-Machine
Seems that just sending HEAD can work https://indieweb.org/Internet_Archive https://gist.github.com/atomotic/721aefe8c72ac095cb6e
https://archive.readme.io/docs/creating-a-snapshot A wrapper for snapshotting
but it's actually using GET request to /save
: https://github.com/ArchiveLabs/pragma.archivelab.org/blob/master/pragma/api/pragmas.py#L53
Also, here is a tool that can send to multiple archivers: https://github.com/oduwsdl/archivenow
There is a server mode available, thus it seems that we can directly dockerize the server so that rumors-api can invoke it whenever it got a url to archive.
Another promising archiver is https://github.com/ArchiveBox/ArchiveBox It will:
We can also consider not directly plugging these tools into APIs. We can instead do batch archive using Cofacts API instead.
When user submits an article and reply, we can assume that the containing URLs can can be published to Wayback machine.
We should send these docs to Internet Archive so that in the future anyone wants its backup, they can have a trustful thirdparty's archive page to go to.
Send an archive: http://web.archive.org/save/${URL} Get snapshot API: https://archive.org/help/wayback_api.php