
cofacts / rumors-api

GraphQL API server for clients like rumors-site and rumors-line-bot

https://api.cofacts.tw

MIT License


Send new article & reply URLs to Wayback machine #136

Open · MrOrz opened this issue 4 years ago

MrOrz commented 4 years ago

When a user submits an article or a reply, we can assume that the URLs it contains can be published to the Wayback Machine.

We should send these URLs to the Internet Archive so that, in the future, anyone who wants a backup of them has a trustworthy third-party archive page to turn to.
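Before anything can be sent, the URLs have to be pulled out of the submitted text. Here is a minimal sketch of that step; `extractUrls` is a hypothetical helper (not something that exists in rumors-api today), and the regex is a deliberate simplification that only matches ASCII URL characters, so a URL ends at the first CJK character or full-width punctuation mark.

```typescript
// Hypothetical helper (not part of rumors-api): collect the http(s) URLs in an
// article or reply body so they can be queued for archiving.
// Simplification: only ASCII URL characters are matched, so a URL stops at the
// first CJK character or full-width punctuation mark (common in Cofacts messages).
const URL_PATTERN = /https?:\/\/[\w\-.~:\/?#[\]@!$&'()*+,;=%]+/g;

// Sentence punctuation that often trails a URL in chat text.
const TRAILING_PUNCTUATION = /[.,;:!?)\]'"]+$/;

// Requires at least one host character after the scheme.
const HAS_HOST = /^https?:\/\/[^\/?#]+/;

function extractUrls(text: string): string[] {
  const unique = new Set<string>();
  for (const match of text.match(URL_PATTERN) ?? []) {
    const url = match.replace(TRAILING_PUNCTUATION, "");
    if (HAS_HOST.test(url)) unique.add(url);
  }
  return Array.from(unique);
}

// Duplicates and trailing punctuation are dropped.
console.log(extractUrls("看這個 https://example.com/a?b=1，還有 https://example.com/a?b=1. 以及 http://foo.tw/x)"));
// → ["https://example.com/a?b=1", "http://foo.tw/x"]
```

Trimming a trailing `)` will break URLs that legitimately end in one (e.g. some Wikipedia links), so a real implementation may want a smarter rule or may reuse whatever URL detection url-resolver already does.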

Send an archive request: http://web.archive.org/save/${URL}
Get snapshot API: https://archive.org/help/wayback_api.php
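A sketch of how the two endpoints above could be wrapped, assuming the "get snapshot" side is the Wayback Availability API described on that help page (`https://archive.org/wayback/available?url=...`, optionally with `&timestamp=YYYYMMDDhhmmss`). The function names are illustrative only, and the save prefix is switched to https.

```typescript
// Endpoints from the comment above (the save prefix is switched to https).
const SAVE_PREFIX = "https://web.archive.org/save/";
// Wayback Availability API, documented at https://archive.org/help/wayback_api.php
const AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available";

interface Snapshot {
  url: string;       // e.g. http://web.archive.org/web/<timestamp>/<original URL>
  timestamp: string; // YYYYMMDDhhmmss
  status: string;    // HTTP status of the archived capture, as a string
}

// Asking the Wayback Machine to capture a URL: the target is appended to /save/.
function buildSaveUrl(target: string): string {
  return SAVE_PREFIX + target;
}

// Looking up the capture closest to an optional YYYYMMDDhhmmss timestamp.
function buildAvailabilityUrl(target: string, timestamp?: string): string {
  let query = "?url=" + encodeURIComponent(target);
  if (timestamp) query += "&timestamp=" + encodeURIComponent(timestamp);
  return AVAILABILITY_ENDPOINT + query;
}

// The availability API answers with
// { "archived_snapshots": { "closest": { "available": true, "url": ..., "timestamp": ..., "status": ... } } }
// and with an empty "archived_snapshots" object when nothing is archived.
function parseClosestSnapshot(body: unknown): Snapshot | null {
  const closest = (body as any)?.archived_snapshots?.closest;
  if (!closest || closest.available !== true) return null;
  return {
    url: String(closest.url),
    timestamp: String(closest.timestamp),
    status: String(closest.status),
  };
}
```

In Node 18+ this could be used as `fetch(buildAvailabilityUrl(u)).then((r) => r.json()).then(parseClosestSnapshot)`, which would also let us skip URLs that were already captured recently.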

MrOrz commented 4 years ago

If archiving requires a headless browser, we can implement the archiving function in url-resolver instead: https://help.archive.org/hc/en-us/articles/360001513491-Save-Pages-in-the-Wayback-Machine

MrOrz commented 4 years ago

It seems that simply sending a HEAD request works: https://indieweb.org/Internet_Archive https://gist.github.com/atomotic/721aefe8c72ac095cb6e

MrOrz commented 4 years ago

https://archive.readme.io/docs/creating-a-snapshot is a wrapper for snapshotting, but under the hood it just sends a GET request to /save: https://github.com/ArchiveLabs/pragma.archivelab.org/blob/master/pragma/api/pragmas.py#L53
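Putting the HEAD and GET observations together, here is a sketch of the request itself. The HEAD-then-GET fallback is a hypothetical policy, not something either link prescribes, and reading the capture path from a `Content-Location` header is an assumption based on the HEAD approach linked above; the code treats a missing header as "saved, location unknown". The fetch function is injected so the sketch stays testable offline.

```typescript
// Minimal shapes so the sketch needs neither DOM nor Node typings.
// Node 18+'s global fetch satisfies this signature and can be passed in directly.
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}
type Fetcher = (url: string, init: { method: string }) => Promise<MinimalResponse>;

type SaveResult =
  | { ok: true; snapshotPath: string | null }
  | { ok: false; status: number };

// Ask the Wayback Machine to capture `target` by requesting /save/<target>.
async function requestSave(target: string, fetcher: Fetcher, method: "HEAD" | "GET"): Promise<SaveResult> {
  const response = await fetcher("https://web.archive.org/save/" + target, { method });
  if (response.status >= 200 && response.status < 400) {
    // Assumption: the capture path may be reported in Content-Location;
    // null means the save looked successful but the location is unknown.
    return { ok: true, snapshotPath: response.headers.get("content-location") };
  }
  return { ok: false, status: response.status };
}

// Hypothetical policy: try the cheap HEAD first, then fall back to the GET
// that the archive.org wrapper uses if HEAD is rejected.
async function saveWithFallback(target: string, fetcher: Fetcher): Promise<SaveResult> {
  const head = await requestSave(target, fetcher, "HEAD");
  return head.ok ? head : requestSave(target, fetcher, "GET");
}
```

Since fetch follows redirects by default, the status seen here is usually that of the final page; if it turns out that /save really needs a full browser render, the same `Fetcher` shape could be backed by url-resolver instead.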

MrOrz commented 4 years ago

Also, here is a tool that can send to multiple archivers: https://github.com/oduwsdl/archivenow

It has a server mode, so it seems we can dockerize the server directly and have rumors-api invoke it whenever it gets a URL to archive.
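A sketch of what the rumors-api side of that could look like. It assumes, based on the archivenow README, that server mode answers `GET {base}/{archive id}/{url}` (with `ia` meaning the Internet Archive) with JSON like `{ "results": ["<archived copy URL>"] }`; that contract, the docker service name `archivenow`, and port 12345 are assumptions to verify against the version we actually deploy.

```typescript
// Assumption, based on the archivenow README: in server mode it answers
// GET {base}/{archive id}/{url} (e.g. "ia" for the Internet Archive) with JSON like
// { "results": ["<URL of the archived copy>"] }. Verify against the deployed version.
type FetchJson = (url: string) => Promise<unknown>;

// Hypothetical default: a docker-compose service named "archivenow" on an assumed
// port 12345; adjust to however the container is actually exposed.
const DEFAULT_ARCHIVENOW_BASE = "http://archivenow:12345";

function buildArchivenowUrl(baseUrl: string, archiveId: string, target: string): string {
  // Strip trailing slashes so "http://host:12345/" and "http://host:12345" behave the same.
  return baseUrl.replace(/\/+$/, "") + "/" + archiveId + "/" + target;
}

// Keep only string entries from the (assumed) "results" array.
function parseArchivenowResults(body: unknown): string[] {
  const results = (body as any)?.results;
  if (!Array.isArray(results)) return [];
  return results.filter((r: unknown): r is string => typeof r === "string");
}

// What rumors-api could call whenever it gets a URL to archive.
async function archiveViaArchivenow(
  target: string,
  fetchJson: FetchJson,
  archiveId = "ia",
  baseUrl = DEFAULT_ARCHIVENOW_BASE,
): Promise<string[]> {
  return parseArchivenowResults(await fetchJson(buildArchivenowUrl(baseUrl, archiveId, target)));
}
```

Keeping archivenow behind a plain HTTP call means rumors-api never needs Python or a headless browser in its own container; the archiver can be swapped or scaled independently.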

MrOrz commented 7 months ago

Another promising archiver is https://github.com/ArchiveBox/ArchiveBox. It can:

save the page as a single HTML file
generate a screenshot
extract text using Readability and Mercury
push to the Internet Archive

We can also consider not plugging these tools directly into the API. Instead, we can archive in batches using the Cofacts API.
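A sketch of what such a batch job might look like. Everything here is hypothetical: the URL list could come from a Cofacts API query or a data dump, `isArchived` could be an availability lookup, `save` could hit /save/ or an archivenow server, and the 5-second delay is a placeholder rather than a documented Wayback rate limit. Requests run one at a time so the job stays polite to the archive.

```typescript
// Hypothetical batch archiver: every dependency is injected so the same loop can be
// fed by a Cofacts API query, a data dump, or a test stub.
interface BatchDeps {
  isArchived(url: string): Promise<boolean>; // e.g. a Wayback availability lookup
  save(url: string): Promise<boolean>;       // e.g. a /save/ request or an archivenow call
  sleep(ms: number): Promise<void>;          // injected so tests do not actually wait
}

interface BatchSummary {
  saved: string[];
  skipped: string[]; // already archived, so no save request was sent
  failed: string[];
}

async function archiveBatch(urls: string[], deps: BatchDeps, delayMs = 5000): Promise<BatchSummary> {
  const summary: BatchSummary = { saved: [], skipped: [], failed: [] };
  const unique = Array.from(new Set(urls));
  for (const url of unique) {
    try {
      if (await deps.isArchived(url)) {
        summary.skipped.push(url);
        continue;
      }
      if (await deps.save(url)) {
        summary.saved.push(url);
      } else {
        summary.failed.push(url);
      }
    } catch {
      // A network error on either call counts as a failure; the job keeps going.
      summary.failed.push(url);
    }
    // Pause only after URLs that reached the archive (or errored), one request at a time.
    await deps.sleep(delayMs);
  }
  return summary;
}
```

A usage would be something like `archiveBatch(urls, { isArchived, save, sleep: (ms) => new Promise((r) => setTimeout(r, ms)) })`, run on a schedule. The `failed` list can simply be fed into the next run.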