edgi-govdata-archiving / edgi-website

💻 Public repo for issues and project management for EDGI's website
https://envirodatagov.org/

Submit all main webpages from site to archive.org #39

Closed patcon closed 7 years ago

patcon commented 7 years ago

Slack context: https://edgi.slack.com/archives/C3EF5T7T6/p1504222869000223

Many of our non-top-level pages are not in the Wayback Machine. Supposedly crawlers don't normally go deeper than the front page, but we can submit pages manually.

cc: @mhucka

patcon commented 7 years ago

Used a Chrome plugin to submit the pages from https://envirodatagov.org/sitemap_index.xml for "post", "page", and "publications".

patcon commented 7 years ago

A quick cruise around our wayback website seems to indicate we're good for now: https://web.archive.org/web/20170823022648/https://envirodatagov.org/

If we wanted, someone could probably write a script to auto-submit our full website from the sitemap.xml once a month or so, but I'm not sure that's worth it.
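
For reference, a rough sketch of what such a script might look like, in Python with only the standard library. It assumes the sitemap follows the standard sitemaps.org format (an index file pointing at child sitemaps) and that pages are submitted by requesting the Wayback Machine's `https://web.archive.org/save/<url>` endpoint. The function names and the 10-second delay between requests are made up for illustration, and this has not been run against the live site.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap and sitemap index files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Wayback Machine "Save Page Now" URL prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def extract_locs(xml_text):
    """Return every <loc> value (in the sitemap namespace) from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


def is_sitemap_index(xml_text):
    """True if the document is a sitemap index rather than a plain URL set."""
    return ET.fromstring(xml_text).tag == SITEMAP_NS + "sitemapindex"


def fetch(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def collect_page_urls(sitemap_url, fetcher=fetch):
    """Walk a sitemap (or sitemap index) and return all page URLs it lists."""
    xml_text = fetcher(sitemap_url)
    locs = extract_locs(xml_text)
    if is_sitemap_index(xml_text):
        pages = []
        for child_sitemap in locs:
            pages.extend(collect_page_urls(child_sitemap, fetcher))
        return pages
    return locs


def save_url(page_url):
    """Build the Save Page Now request URL for one page."""
    return SAVE_PREFIX + page_url


def submit_all(sitemap_url, delay_seconds=10):
    """Ask the Wayback Machine to capture every page in the sitemap.

    delay_seconds is an assumed, conservative pause between requests.
    """
    for page in collect_page_urls(sitemap_url):
        urllib.request.urlopen(save_url(page), timeout=120).close()
        time.sleep(delay_seconds)
```

Something like `submit_all("https://envirodatagov.org/sitemap_index.xml")` from a monthly cron job would cover the "once a month or so" idea. Note this walks every child sitemap in the index, not just "post", "page", and "publications".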