edgi-govdata-archiving / guides

Technical guides for how to preserve and hold data
https://edgi-govdata-archiving.github.io/guides/
Understanding the Internet Archive Webcrawler (MOVE) #15

Closed by dcwalk 7 years ago

dcwalk commented 7 years ago

Goal for Guide

MOVE the existing "Understanding What the Internet Archive Webcrawler Does" guide from the Event Toolkit

Description of content

"In this document we explain a little bit about what Heritrix can do, why it needs our help, and also how to identify documents and datasets that Heritrix can’t reach."

Resources

dcwalk commented 7 years ago

Draft revision here! https://hackmd.io/EYRgZiBM6QtAnADgMxwCwBMwDZbAKbwCssyRRaRAhohtvAMbxA==?both

@sharlychan -- it would be wonderful to get some feedback on the above! We are moving some of the guides into more of a 'website' format. If you are interested, assign yourself to this issue!

dcwalk commented 7 years ago

@titaniumbones a review would be nice!

dcwalk commented 7 years ago

Merged in, closing!