buren / wayback_archiver

Ruby gem to send URLs to Wayback Machine
https://rubygems.org/gems/wayback_archiver
MIT License

Figure out what to do when a domain redirects to a subdomain when starting to crawl #15

Closed: buren closed this 2 years ago

bartman081523 commented 5 years ago

I think you closed this with #27. To archive the origin pages first, you could crawl all resources referenced from a page, even those on a subdomain, so that the archived page is coherent and complete. At the same time, stay with the origin source of those referenced resources and continue crawling from there. In any case, what you accomplish with the --crawl strategy is sometimes a complete archive of the whole domain anyway.
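One way to make the "stay with the origin" idea concrete is a scope check that treats both the requested host and the host it redirected to as in scope, plus any subdomain of the requested host. The sketch below is a hypothetical helper, not part of wayback_archiver's API; the method name `in_crawl_scope?` and its keyword arguments are assumptions for illustration, and it uses only Ruby's standard `uri` library.

```ruby
require "uri"

# Hypothetical helper (not part of wayback_archiver): decide whether a
# discovered URL should be followed when a crawl started at start_url but
# was redirected to redirected_url (e.g. example.com -> www.example.com).
# Assumption: a URL is in scope if its host equals the requested host or the
# redirect target's host, or is a subdomain of the requested host.
def in_crawl_scope?(url, start_url:, redirected_url:)
  host = URI.parse(url).host&.downcase
  return false if host.nil?

  start_host = URI.parse(start_url).host.downcase
  redirect_host = URI.parse(redirected_url).host.downcase

  host == start_host ||
    host == redirect_host ||
    host.end_with?(".#{start_host}") # matches blog.example.com, not notexample.com
rescue URI::InvalidURIError
  false # unparseable links are never followed
end
```

For example, with `start_url: "https://example.com"` and `redirected_url: "https://www.example.com/"`, both `https://www.example.com/about` and `https://blog.example.com/` are in scope, while `https://cdn.other.com/app.js` is not. Under this sketch, a crawler could still archive off-scope assets referenced by an in-scope page (so the page renders completely) while only following links from in-scope pages.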