AusDTO / disco_layer

Code, outputs and Information relevant to the discovery layer.

spool up a swarm of workers and process the information extraction backlog #67

Open monkeypants opened 9 years ago

monkeypants commented 9 years ago

per #30 and following #66

monkeypants commented 9 years ago

[Chart: throughput from 20 workers]

Hitting DB limits above ~40 workers, but that bottleneck could be widened by connection pooling or by simply raising the number of connections allowed on the DB. It's not a big deal; this only has to run faster than the crawler.
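For reference, a rough sketch of what the connection pooling option could look like. This is not code from the repo: `ConnectionPool` and `run_workers` are hypothetical names, `sqlite3` stands in for the real database, and the pool size of 10 is an arbitrary illustration. The idea is only that 40 workers share a fixed number of connections and block until one is free, so the DB never sees more than `pool_size` connections.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager
from concurrent.futures import ThreadPoolExecutor


class ConnectionPool:
    """Hypothetical fixed-size pool: workers block until a connection is free."""

    def __init__(self, connect, size):
        self._free = queue.Queue(maxsize=size)
        for _ in range(size):
            self._free.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._free.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._free.put(conn)  # always hand the connection back


def run_workers(n_workers=40, pool_size=10):
    # Assumption: sqlite3 is only a stand-in for the real database.
    pool = ConnectionPool(
        lambda: sqlite3.connect(":memory:", check_same_thread=False),
        pool_size,
    )
    lock = threading.Lock()
    in_use = 0
    peak = 0

    def work(i):
        nonlocal in_use, peak
        with pool.connection() as conn:
            with lock:
                in_use += 1
                peak = max(peak, in_use)  # track concurrent connections
            value = conn.execute("SELECT ?", (i,)).fetchone()[0]
            with lock:
                in_use -= 1
        return value

    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        results = list(executor.map(work, range(n_workers)))
    return results, peak
```

Calling `run_workers(40, 10)` returns each worker's result plus the peak number of connections in use at once, which stays at or below the pool size no matter how many workers are started. In practice an external pooler or the DB driver's own pool would probably be a better fit than a hand-rolled one.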

Need to look into tuning the index (shards, clusters, etc.), since that's the current limiting factor. It needs to run faster than WebDocuments -> model_resource!