commoncrawl / news-crawl

News crawling with StormCrawler - stores content as WARC
Apache License 2.0

Have as many WARCBolt instances as there are workers #64

Closed by jnioche 11 months ago

jnioche commented 11 months ago

See #63

Every bolt is set to have as many instances as there are workers, except the WARC bolt. The instance number is used in the output filename, so there is no risk of collisions between instances and no reason to limit the bolt to a single one. Running one instance per worker could also save on serializing heavy tuples and sending them across the cluster.
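
To illustrate why including the instance number in the filename rules out collisions, here is a minimal, self-contained sketch. The naming pattern, the `CC-NEWS` prefix, and the `WarcFileNames` class are assumptions made up for this example; they are not the actual format produced by the WARC bolt.

```java
import java.util.HashSet;
import java.util.Set;

class WarcFileNames {
    // Hypothetical naming scheme: prefix, bolt instance index, per-instance file counter.
    static String fileName(String prefix, int instanceIndex, int fileCounter) {
        return String.format("%s-%05d-%05d.warc.gz", prefix, instanceIndex, fileCounter);
    }

    // Collects every name written by `instances` bolt instances,
    // each producing `filesPerInstance` files with its own counter.
    static Set<String> allNames(String prefix, int instances, int filesPerInstance) {
        Set<String> names = new HashSet<>();
        for (int i = 0; i < instances; i++) {
            for (int f = 0; f < filesPerInstance; f++) {
                names.add(fileName(prefix, i, f));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        int workers = 4; // assumed: one WARC bolt instance per worker
        Set<String> names = allNames("CC-NEWS", workers, 3);
        // Each instance restarts its counter at 0, yet all names stay distinct.
        System.out.println("distinct names: " + names.size());
    }
}
```

Because the instance index differs for every bolt instance, two instances can reuse the same file counter without ever producing the same name.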