Members are not correctly differentiated

bbcarchdev / libcluster

Clustering support library (originally part of anansi)

Apache License 2.0


Members are not correctly differentiated #20

Closed by cgueret 7 years ago

cgueret commented 7 years ago

When using the following two configuration files (twine.txt and crawl.txt), a single writerd instance thinks there are two of it on the cluster and balances the load, whereas the second instance is in fact Anansi.

nevali commented 7 years ago

Given that neither has a cluster registry URI specified, they're both running with the static cluster mechanism. That means there's no persistent storage of cluster information, and instances have no mechanism to interact with each other. So the fact that you're running two separate processes which happen to link with libcluster is a red herring.
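To make the point about the static mechanism concrete, here is a minimal Python sketch of static partitioning. It is not libcluster code: the `owns_item` helper, the `index`/`total` settings, and the CRC32-modulo scheme are all assumptions chosen for illustration. The only thing it is meant to show is that each process decides from its own local settings, with no shared state to consult.

```python
import zlib


def owns_item(item_key: str, index: int, total: int) -> bool:
    """Return True if the instance at position `index` of `total` handles the item.

    Hypothetical helper: the CRC32-modulo scheme is an assumption for this sketch.
    """
    # crc32 is stable across processes, unlike Python's built-in hash() for str
    return zlib.crc32(item_key.encode("utf-8")) % total == index


# Each process takes index/total from its own local settings; nothing is shared.
standalone = {"index": 0, "total": 1}
keys = [f"item-{n}" for n in range(10)]
handled = [k for k in keys if owns_item(k, **standalone)]
```

With `total` set to 1 the single instance handles every key. If two processes were each configured with `total` set to 2, each would skip roughly half the keys without ever learning whether the other process exists.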

cgueret commented 7 years ago

Oops, I sent the version with the cluster commented out. That's what we just did to work around the issue until we figure out where it comes from (https://github.com/bbcarchdev/acropolis/commit/d16f498b36340d76a0eedf9a791b3233cab58cb9). Crawl has the correct config though, so it is almost certainly not an issue between the two as I initially thought. I did some more investigation yesterday, and it could be an issue with the other twine instance we use to bridge with Anansi, but putting those two on two different clusters still yields the same result: writerd sees two instances. I will try some more today...

cgueret commented 7 years ago

The problem appears to be that the value of the "cluster-name" parameter is not correctly picked up. Instead, "twine" is always used, so running one instance of twine to pick items up from Anansi and another to process them using generate ends up with the two balancing the load between each other. This happens despite setting the former to a cluster "twine-anansi" and the latter to another cluster "twine-spindle" in their respective config files...
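The symptom can be reproduced in a small Python model. This is an illustration only and does not reflect the actual twine or libcluster code: the `Registry` class, the `effective_cluster` function, and the instance names `anansi-bridge` and `spindle-writer` are hypothetical. Only the cluster names "twine", "twine-anansi" and "twine-spindle" come from this thread.

```python
from collections import defaultdict

DEFAULT_CLUSTER = "twine"  # the fallback name reported in this thread


class Registry:
    """Toy stand-in for a shared cluster registry (hypothetical)."""

    def __init__(self):
        self._members = defaultdict(set)

    def join(self, cluster, instance_id):
        self._members[cluster].add(instance_id)

    def size(self, cluster):
        return len(self._members[cluster])


def effective_cluster(config, honour_setting):
    # honour_setting=False models the reported behaviour: the setting is ignored
    if honour_setting:
        return config.get("cluster-name", DEFAULT_CLUSTER)
    return DEFAULT_CLUSTER


def join_all(honour_setting):
    """Return, per instance, how many members it sees in its own cluster."""
    registry = Registry()
    configs = {
        "anansi-bridge": {"cluster-name": "twine-anansi"},
        "spindle-writer": {"cluster-name": "twine-spindle"},
    }
    names = {}
    for instance, config in configs.items():
        names[instance] = effective_cluster(config, honour_setting)
        registry.join(names[instance], instance)
    return {instance: registry.size(name) for instance, name in names.items()}
```

Calling `join_all(False)` reports two members for each instance, which matches what writerd was seeing, while `join_all(True)` reports one member each because the two instances land in separate clusters.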

rjpwork commented 7 years ago

Rather than self-selecting load-balancing clustering, it'd be simpler to just let processors pick up items from a queue. Pros: much simpler code, can't "miss" items.
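A rough Python sketch of the queue-based alternative, using only the standard library. The `run_workers` function and the worker names are made up for illustration, and an in-process `queue.Queue` stands in for whatever shared queue a real deployment would use.

```python
import queue
import threading


def run_workers(items, worker_count=2):
    """Drain a shared queue with several workers; return (worker, item) pairs."""
    work = queue.Queue()
    for item in items:
        work.put(item)

    processed = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                # Each item is handed to exactly one worker
                item = work.get_nowait()
            except queue.Empty:
                return  # nothing left, so this worker stops
            with lock:
                processed.append((name, item))
            work.task_done()

    threads = [
        threading.Thread(target=worker, args=(f"worker-{n}",))
        for n in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed
```

No worker needs to know how many peers exist, so a wrong member count cannot cause items to be skipped: whichever worker is free takes the next item.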

cgueret commented 7 years ago

Problem fixed with https://github.com/bbcarchdev/twine/commit/8e33544df074e68d90059fbb706ac918cda9c723