Closed: cgueret closed this issue 7 years ago
Given that neither has a cluster registry URI specified, they're both running with the static cluster mechanism, which means there's no persistent storage of cluster information and instances have no mechanism for interacting with each other; so the fact that you're running two separate processes which happen to link with libcluster is a red herring.
Oops, I sent the version with the cluster commented out. That's what we just did to work around the issue until we figure out where it comes from (https://github.com/bbcarchdev/acropolis/commit/d16f498b36340d76a0eedf9a791b3233cab58cb9). Crawl has the correct config though, so it is indeed not an issue between the two as I initially thought. I did some more investigation yesterday and it could be an issue with the other Twine instance we use to bridge with Anansi, but putting those two on two different clusters still yields the same result of writerd seeing two instances. I will try some more today...
The problem appears to be that the value of the "cluster-name" parameter is not correctly picked up. Instead, "twine" is always used, so running one instance of Twine to pick items up from Anansi and another to process them with generate ends up balancing the load between the two. This despite setting the former to a cluster "twine-anansi" and the latter to another cluster "twine-spindle" in their respective config files...
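For illustration, the intended setup was presumably along these lines (a hedged sketch: the section name and file layout are assumptions, only the "cluster-name" key comes from the comment above):

```ini
; config for the Twine instance bridging with Anansi -- illustrative sketch
[twine]
cluster-name=twine-anansi

; config for the Twine instance processing items via generate -- illustrative sketch
[twine]
cluster-name=twine-spindle
```

If "cluster-name" is silently ignored, both instances fall back to the default cluster "twine" and start balancing work with each other, which would explain the observed behaviour.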
Rather than self-selecting load-balancing clustering, it'd be simpler to just let processors pick up items from a queue. Pros: much simpler code, and items can't be "missed".
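As a rough illustration of that suggestion (this is not Twine's actual code, just a minimal Python sketch of the idea): each processor pulls items off a shared queue, so every item is handled exactly once and nothing depends on cluster membership.

```python
# Minimal sketch of queue-based work pickup: N workers drain a shared
# queue; each item is taken exactly once, so none can be "missed" and
# no cluster-membership bookkeeping is needed.
import queue
import threading

def run_workers(items, n_workers=2):
    q = queue.Queue()
    for item in items:
        q.put(item)

    processed = []           # stands in for the real per-item processing
    lock = threading.Lock()  # protects the shared result list

    def worker():
        while True:
            try:
                item = q.get_nowait()  # take the next item, or stop when empty
            except queue.Empty:
                return
            with lock:
                processed.append(item)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

The trade-off versus clustering is that a single queue becomes the coordination point, but there is no way for two processors to disagree about who owns an item.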
When using the two following configuration files (twine.txt, crawl.txt), a single writerd instance thinks there are two instances of itself on the cluster and balances the load, whereas the second "instance" is in fact Anansi.