ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Enhancement idea: delta mode #54

Open ethus3h opened 8 years ago

ethus3h commented 8 years ago

From ArchiveBot issue https://github.com/ArchiveTeam/ArchiveBot/issues/169: this would be helpful to have in grab-site too, e.g. for sites with many large files, a few of which change regularly. It would be nice to be able to supply a CDX of a previous crawl and skip things that were already fetched....
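
To make the idea concrete, here is a rough sketch of the "skip what a previous CDX already covers" check. It is not an existing grab-site feature or option. The function names `load_seen_urls` and `should_skip` are hypothetical, and the sketch assumes a space-delimited CDX file whose header line names the columns with the usual field letters, where `a` is the original URL.

```python
def load_seen_urls(cdx_lines):
    """Return the set of original URLs recorded in a CDX index.

    Assumes a space-delimited CDX whose first non-empty line is a header
    such as " CDX N b a m s k r M S V g"; the 'a' column holds the
    original URL. Both function names here are hypothetical.
    """
    seen = set()
    url_col = None
    for raw in cdx_lines:
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if url_col is None:
            # The first non-empty line must be the header that names the columns.
            if fields[0] != "CDX" or "a" not in fields:
                raise ValueError("expected a CDX header with an 'a' column")
            url_col = fields.index("a") - 1  # drop the leading "CDX" token
            continue
        if len(fields) > url_col:
            seen.add(fields[url_col])
    return seen


def should_skip(url, seen):
    """True if the URL already appears in the previous crawl's CDX."""
    return url in seen


# Example with an in-memory CDX; a real file could be passed as open(path).
sample_cdx = [
    " CDX N b a m s k r M S V g",
    "com,example)/big.iso 20160101000000 http://example.com/big.iso "
    "application/octet-stream 200 AAAA - - 1234 0 old.warc.gz",
]
seen_urls = load_seen_urls(sample_cdx)
```

An exact URL match like this would skip files that have changed since the previous crawl, so a real delta mode would probably also need a freshness check (for example comparing dates or digests), which this sketch leaves out.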