commoncrawl / news-crawl

News crawling with StormCrawler - stores content as WARC
Apache License 2.0

Have as many WARCBolt instances as there are workers #64

jnioche closed 8 months ago

jnioche commented 8 months ago

See #63

All the bolts are set to have as many instances as there are workers, apart from the WARC Bolt. There is no risk of collision in the outputs, because the instance number is used in the filename. So there is no reason not to use more than one instance, especially since doing so could avoid serializing heavy tuples and sending them across the cluster.
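To illustrate why the outputs cannot collide, here is a minimal sketch in Python rather than the project's own code. The naming scheme, the `warc_file_name` helper and its parameters are all hypothetical; the only property taken from the description above is that the instance number is part of the filename.

```python
from datetime import datetime


def warc_file_name(prefix: str, created: datetime, instance: int, sequence: int) -> str:
    """Build an output name that embeds the bolt instance number.

    Hypothetical format: <prefix>-<timestamp>-<instance>-<sequence>.warc.gz
    The real naming scheme may differ; only the presence of the
    instance number matters for this illustration.
    """
    stamp = created.strftime("%Y%m%d%H%M%S")
    return f"{prefix}-{stamp}-{instance:05d}-{sequence:05d}.warc.gz"


def names_for_instances(prefix: str, created: datetime, num_instances: int) -> list:
    """Return the first file name each instance would write at the same moment."""
    return [warc_file_name(prefix, created, i, 0) for i in range(num_instances)]


if __name__ == "__main__":
    # Four instances (one per worker, say) rotating files at the same second.
    for name in names_for_instances("CC-NEWS", datetime(2020, 1, 2, 3, 4, 5), 4):
        print(name)
```

Even when every instance opens a file in the same second, the names differ in the instance field, so running one instance per worker should not cause one writer to overwrite another's output.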