crimethinc / website

Ruby on Rails app that powers crimethinc.com
https://crimethinc.com
Creative Commons Zero v1.0 Universal

Confirm that Internet Archive push is working on every update #1826

Open veganstraightedge opened 3 years ago

veganstraightedge commented 3 years ago

Does this still work?

https://github.com/crimethinc/website/issues/451

Let's find out, and then either close this with no additional effort needed, or fix the thing so that archive.org always has all of the site's articles.
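One low-effort way to "find out" is to ask the Wayback Machine Availability API (https://archive.org/wayback/available?url=...) for the latest snapshot of an article and compare its timestamp with the article's updated_at. The sketch below only builds the lookup URL and parses the JSON reply; the helper names are hypothetical, and it assumes the documented behavior that, without a timestamp parameter, the API returns the most recent capture.

```ruby
require "json"
require "uri"

# Public Wayback Machine Availability API (no auth required).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

# Hypothetical helper: builds the lookup URL for one page.
def availability_url(page_url)
  uri = URI(AVAILABILITY_ENDPOINT)
  uri.query = URI.encode_www_form(url: page_url)
  uri
end

# Hypothetical helper: returns the closest snapshot's timestamp
# (YYYYMMDDhhmmss, UTC) or nil when nothing is archived.
def latest_snapshot_timestamp(json_body)
  closest = JSON.parse(json_body).dig("archived_snapshots", "closest")
  return nil unless closest && closest["available"]

  closest["timestamp"]
end

# True if the snapshot was captured at or after `time`.
# Wayback timestamps are fixed-width UTC digits, so string comparison works.
def archived_since?(timestamp, time)
  !timestamp.nil? && timestamp >= time.utc.strftime("%Y%m%d%H%M%S")
end
```

Fetching is then a matter of `Net::HTTP.get(availability_url(article_url))` for each article; any article where `archived_since?` is false for its updated_at would mean the push is not reaching archive.org.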

astronaut-wannabe commented 3 years ago

So, looking into this, I think the API has just been going down. Now, rather than 500s, we are getting timeouts.
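Whatever replaces the current call, it probably helps to make timeouts explicit so a hung archive.org request fails fast and gets retried a bounded number of times instead of blocking a worker. A minimal sketch, assuming plain Net::HTTP; the helper names and the timeout/backoff values are hypothetical choices, not anything the app currently uses.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: runs the block, retrying on connect/read timeouts
# with exponential backoff (base_delay, 2*base_delay, ...). Re-raises once
# `attempts` tries have failed. `sleeper` is injectable for testing.
def with_retries(attempts: 3, base_delay: 2, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield
  rescue Net::OpenTimeout, Net::ReadTimeout
    raise if tries >= attempts

    sleeper.call(base_delay * 2**(tries - 1))
    retry
  end
end

# Hypothetical helper: sends a prepared request with explicit timeouts (seconds).
def request_with_timeouts(uri, request, open_timeout: 10, read_timeout: 30)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: open_timeout,
                  read_timeout: read_timeout) do |http|
    http.request(request)
  end
end
```

Usage would look like `with_retries { request_with_timeouts(uri, req) }`, with the final re-raised timeout logged so it is visible when archive.org is down.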

I am having a hard time figuring out whether this API is even supported anymore.

All that said, I did find out that the Internet Archive has a way to just send them an email full of links; they will archive all of the URLs and email you back the results: https://blog.archive.org/2019/10/23/the-wayback-machines-save-page-now-is-new-and-improved/

I wonder if it would be more stable to write an ActionMailer job that runs nightly and batch-processes articles where updated_at >= 1.day.ago.
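The nightly batch idea boils down to two steps: select articles updated in the last 24 hours, then put their URLs one per line in an email body. The plain-Ruby sketch below uses a hypothetical ArticleStub struct in place of the real model and leaves the recipient address and mail delivery out; check the blog post above for the exact address and body format the Internet Archive expects.

```ruby
# Hypothetical stand-in for the app's article model.
ArticleStub = Struct.new(:url, :updated_at, keyword_init: true)

ONE_DAY = 24 * 60 * 60 # seconds

# Articles touched in the last 24 hours (plain-Ruby equivalent of
# `updated_at >= 1.day.ago`). The cutoff itself is inclusive.
def recently_updated(articles, now: Time.now)
  cutoff = now - ONE_DAY
  articles.select { |article| article.updated_at >= cutoff }
end

# Plain-text body: one unique URL per line.
def archive_email_body(articles)
  articles.map(&:url).uniq.join("\n")
end
```

In the app itself, the selection would presumably be an ActiveRecord query such as `Article.where("updated_at >= ?", 1.day.ago)` (assuming the model is named Article), with the body handed to a mailer from a nightly scheduled job.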

astronaut-wannabe commented 3 years ago

I also found this Internet Archive browser extension that has a working "Save Page Now" feature, so maybe we can port that code to Ruby?

https://github.com/internetarchive/wayback-machine-webextension/blob/2b46d356f625e28ef98b376541edbe5f7203bbb4/webextension/scripts/background.js#L59-L116

bensheldon commented 11 months ago

FYI, these are the docs for the Save Page Now v2 API: https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit#heading=h.1gmodju1d6p0
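As I understand the v2 docs linked above, a capture is an authenticated POST to https://web.archive.org/save with a form-encoded `url` field, an `Accept: application/json` header, and an `Authorization: LOW <access_key>:<secret_key>` header using archive.org S3-style keys; the JSON reply carries a `job_id`. The sketch below only builds the request and parses the reply, so please double-check the field names against the doc; the helper names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

SPN2_ENDPOINT = URI("https://web.archive.org/save")

# Hypothetical helper: builds (but does not send) a SPN2 capture request.
def spn2_capture_request(page_url, access_key:, secret_key:)
  req = Net::HTTP::Post.new(SPN2_ENDPOINT)
  req["Accept"] = "application/json"
  req["Authorization"] = "LOW #{access_key}:#{secret_key}"
  req.set_form_data("url" => page_url) # form-encoded body
  req
end

# Hypothetical helper: pulls the job id out of the capture reply.
def spn2_job_id(json_body)
  JSON.parse(json_body)["job_id"]
end
```

Sending it would be `Net::HTTP.start(SPN2_ENDPOINT.host, SPN2_ENDPOINT.port, use_ssl: true) { |http| http.request(req) }`, ideally with explicit timeouts given the ones reported earlier in this thread.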

My buddy did an example implementation in Python here: https://github.com/palewire/savepagenow/pull/31
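The v2 flow is asynchronous: after the POST returns a `job_id`, you poll `https://web.archive.org/save/status/<job_id>` until the status is no longer pending. A minimal sketch of interpreting that reply, assuming the `status` values `pending`/`success`/`error` and the `timestamp`, `original_url`, and `status_ext` fields as I recall them from the SPN2 doc; verify against the doc before relying on it. Helper names are hypothetical.

```ruby
require "json"
require "uri"

# Hypothetical helper: status endpoint for one capture job.
def spn2_status_uri(job_id)
  URI("https://web.archive.org/save/status/#{URI.encode_www_form_component(job_id)}")
end

# Hypothetical helper: always returns a pair.
#   [:success, archived_url] once the capture is done,
#   [:pending, nil] while it is still running,
#   [:error, reason] otherwise.
def spn2_interpret_status(json_body)
  data = JSON.parse(json_body)
  case data["status"]
  when "success"
    [:success, "https://web.archive.org/web/#{data['timestamp']}/#{data['original_url']}"]
  when "pending"
    [:pending, nil]
  else
    [:error, data["status_ext"] || data["message"]]
  end
end
```

A background job could poll every few seconds, give up after a fixed number of tries, and log the `:error` reasons, which would also answer this issue's question of whether the push is working.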

just1602 commented 11 months ago

Thanks for the reference @bensheldon we'll give that a look and try to update our current implementation. 😀