ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Other

Add queue and pending #103

Closed: raspher closed this issue 6 years ago

raspher commented 6 years ago

Pending is not as important as a queue.

Maybe we could set a maximum number of grab-site jobs, and add a list of websites to crawl once those jobs have finished?

Sorry for my bad English.

ivan commented 6 years ago

I don't think this should be in grab-site, but it's possible to do with external queuing tools.

I have sometimes used a for loop, e.g.:

for i in site1 site2 site3; do grab-site "$i"; done
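A minimal sketch of the same sequential approach, reading the sites from a file instead of listing them inline. The function name crawl_queue and the file format (one URL per line, blank lines and lines starting with # ignored) are hypothetical choices for illustration, not part of grab-site.

```shell
#!/usr/bin/env bash
# Hypothetical helper: crawl every URL listed in a queue file, one at a time.
# Assumed file format: one URL per line; blank lines and '#' comments are skipped.
crawl_queue() {
  local queue_file="$1"
  local url
  # The '|| [ -n "$url" ]' also handles a last line with no trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;  # skip blank and comment lines
    esac
    # Runs in the foreground, so the next crawl starts only after this one exits.
    # </dev/null stops grab-site from reading the rest of the queue file via stdin.
    grab-site "$url" </dev/null
  done < "$queue_file"
}
```

Usage: `crawl_queue sites.txt`. Because each crawl runs to completion before the next starts, this gives a queue with exactly one active job.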

Or, if you want to dynamically queue things, I have used Task Spooler before with some success (in Debian/Ubuntu, package task-spooler, command tsp).
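A sketch of how Task Spooler can cover both parts of the original request (a cap on simultaneous jobs and a list of sites waiting to run). It relies on standard tsp usage: `tsp -S N` sets the maximum number of simultaneous jobs for the tsp server, and `tsp <command>` adds a command to the queue and returns immediately. The wrapper name queue_crawls is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: queue several grab-site crawls with Task Spooler,
# letting at most $1 of them run at the same time.
queue_crawls() {
  local slots="$1"
  local url
  shift
  # Set the tsp server's maximum number of simultaneous jobs
  tsp -S "$slots"
  for url in "$@"; do
    # Enqueue the crawl; tsp starts it once a slot is free
    tsp grab-site "$url"
  done
}
```

Usage: `queue_crawls 2 https://site1.example/ https://site2.example/ https://site3.example/`. Run `tsp -l` to list queued, running, and finished jobs; more sites can be added at any time with `tsp grab-site URL`. Note that the slot count applies to the whole tsp server, not only to jobs added by this function.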