RFC: Assign roles to nodes to control distribution

gofoss commented 6 years ago

Not sure if this belongs here or fits better with libcluster's functionality, but it would be great if one could assign roles to nodes and then start worker processes only on nodes with a particular role.

Thanks!

bitwalker commented 6 years ago

This is a great idea, though it would require some significant changes. I'll definitely consider it if there is enough feedback in favor of it.

kelostrada commented 6 years ago

I agree, I had a case where I wanted to have two types of nodes - web and engine. I wanted to start tasks on engine nodes but start them using web nodes (as they were usually started by some calls from controllers). To do that I had to add some custom routing calls to the nodes and then call the Swarm functions directly on some of the engine nodes. It would be great to be able to call swarm from those "blacklisted" nodes too.

bitwalker commented 6 years ago

Swarm will never support calling functions from blacklisted nodes (since that feature is used to ensure Swarm doesn't attempt to work with those nodes at all), but if by blacklisted you mean nodes of the web role being able to start processes on nodes of engine role (using your example), that should certainly be doable if roles are supported - starting a process on a node of a particular role would likely just be a case of specifying the role that process belongs to.

My intent would be to use Kubernetes terminology for this, so rather than "roles", nodes and processes would support "labels", and the combination of labels applied would yield the subset of nodes that are allowed to host a given process.
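As a rough illustration of the label idea, here is a minimal sketch in Python. It is not Swarm's API: `eligible_nodes`, the node names, and the label keys are all hypothetical, and it assumes the simplest possible rule, namely that a node may host a process when it carries every label the process asks for.

```python
from typing import Dict, List

Labels = Dict[str, str]


def eligible_nodes(nodes: Dict[str, Labels], selector: Labels) -> List[str]:
    """Return the names of nodes whose labels contain every key/value in selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical cluster: one web node and two engine nodes.
nodes = {
    "web1@host": {"role": "web"},
    "engine1@host": {"role": "engine", "zone": "a"},
    "engine2@host": {"role": "engine", "zone": "b"},
}

# A process labelled role=engine may only be hosted on the engine nodes.
print(eligible_nodes(nodes, {"role": "engine"}))
# → ['engine1@host', 'engine2@host']
```

An empty selector matches every node, which would correspond to today's behaviour of distributing across the whole cluster.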

ghost commented 6 years ago

I have a possibly related need: to create groups of processes and ensure that each process lives on a separate node in a different availability zone. Would this work make such a thing possible, or should I raise it elsewhere?

bitwalker commented 6 years ago

Labels would certainly be a solution to that problem - as it stands I haven't done testing with Swarm involving nodes which are geographically split across regions, so I would definitely be interested to hear of any experiences there, and whether any changes are required to handle that setup better.
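One way labels could express the availability-zone constraint is sketched below in Python. This is only an assumption about how such a rule might look, not anything Swarm does today: `spread_across_zones` and the `zone` values are hypothetical, and the sketch simply picks one node per zone and fails if the group is larger than the number of zones.

```python
from typing import Dict, List


def spread_across_zones(members: List[str], node_zone: Dict[str, str]) -> Dict[str, str]:
    """Assign each group member to a node in a distinct zone.

    node_zone maps node name -> zone label. Raises ValueError when there
    are fewer zones than group members.
    """
    first_node_in_zone: Dict[str, str] = {}
    for node in sorted(node_zone):
        # Keep the first node (in name order) seen for each zone.
        first_node_in_zone.setdefault(node_zone[node], node)

    zones = sorted(first_node_in_zone)
    if len(members) > len(zones):
        raise ValueError("not enough availability zones for this group")

    return {member: first_node_in_zone[zone] for member, zone in zip(members, zones)}


# Hypothetical nodes labelled with their zone.
node_zone = {"n1@host": "zone-a", "n2@host": "zone-a", "n3@host": "zone-b", "n4@host": "zone-c"}
assignment = spread_across_zones(["replica_1", "replica_2", "replica_3"], node_zone)
```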

kelostrada commented 6 years ago

"if by blacklisted you mean nodes of the web role being able to start processes on nodes of engine role"

Yeah, that's what I meant, I just mentioned it as one more vote up for this feature ;)

pragdave commented 6 years ago

Somewhat late to the party, but 👍 from me.

I'm looking to have an assembly of components, spread across nodes and servers. Some servers may be optimized for database work, others for networking, and so on.

I'd like to be able to create node_type labels for nodes, and worker_type labels for processes, and then to say "worker db_reader runs on database_nodes" etc.

Bonus points for allowing many to many associations :)
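A many-to-many association between worker types and node types could be modelled roughly as below. This is a Python sketch under assumptions: the table names `placement` and `node_types`, the function `nodes_for`, and every label value are hypothetical, and a worker type with no rule is assumed to match no node.

```python
from typing import Dict, List, Set

# worker_type -> node_types it may run on (a worker type can list several).
placement: Dict[str, Set[str]] = {
    "db_reader": {"database"},
    "proxy": {"network", "database"},
}

# node -> node_types it carries (a node can carry several).
node_types: Dict[str, Set[str]] = {
    "n1@host": {"database"},
    "n2@host": {"network"},
    "n3@host": {"database", "network"},
}


def nodes_for(worker_type: str) -> List[str]:
    """Return nodes that share at least one node_type with the worker's rule."""
    allowed = placement.get(worker_type, set())
    return sorted(node for node, types in node_types.items() if types & allowed)


print(nodes_for("db_reader"))
# → ['n1@host', 'n3@host']
```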

hickscorp commented 6 years ago

I'm copying my comment on another issue as it might be more relevant here:

@beardedeagle I really like the idea of roles, but I think this "tagging" could be more simply done if using libring's configuration.

I would suggest that register_name would accept a ring option. If the ring exists in libring's config, it must be used, otherwise all the nodes would be used.
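The fallback rule described here (use the named ring if it is configured, otherwise use all nodes) can be sketched in Python as follows. None of this is libring's or Swarm's actual API: `key_to_node`, `rings`, and `all_nodes` are hypothetical names, and rendezvous hashing is used as a simple stand-in for libring's consistent hash ring.

```python
import hashlib
from typing import Dict, List, Optional


def _score(node: str, name: str) -> int:
    """Deterministic per-(node, name) weight used for rendezvous hashing."""
    digest = hashlib.sha256(f"{node}|{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def key_to_node(
    name: str,
    all_nodes: List[str],
    rings: Dict[str, List[str]],
    ring: Optional[str] = None,
) -> str:
    """Pick the node that should host `name`.

    If `ring` names a configured ring, only that ring's nodes are
    candidates; otherwise every node in the cluster is a candidate.
    """
    candidates = rings.get(ring, all_nodes) if ring is not None else all_nodes
    if not candidates:
        raise ValueError("no candidate nodes")
    return max(candidates, key=lambda node: _score(node, name))


all_nodes = ["web1@host", "engine1@host", "engine2@host"]
rings = {"engine": ["engine1@host", "engine2@host"]}

on_engine = key_to_node("job:42", all_nodes, rings, ring="engine")
anywhere = key_to_node("job:42", all_nodes, rings, ring="not_configured")
```

Because the score depends only on the node and the name, every caller that sees the same ring membership picks the same node, which is the property a ring lookup needs.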

arjan commented 6 years ago

@hickscorp this is a great idea, I think this could definitely work. IMHO, using rings would be the simplest implementation of a "node roles" concept.

gofoss commented 6 years ago

That looks elegant. I am just wondering how it correlates with libcluster's strategies, like EC2 or GCE with tagged-node discovery.

arjan commented 6 years ago

Exactly... I am studying the code right now.

libcluster is somewhat different from this, I think; libcluster only ensures that the cluster is fully connected.

libring can automatically fill its rings with discovered nodes (monitor_nodes: true), but it seems that Swarm only adds nodes to its (currently single) ring when the swarm application is started.

Swarm's Strategy code currently looks a bit like it serves multiple purposes:

- ring management (add / remove node), basically proxying to libring
- key_to_node quorum decision making

arjan commented 6 years ago

Another take on this might be to create multiple registries. Right now, Swarm is a single, distributed registry. Analogous to Elixir's built-in Registry (which is not distributed), we could start multiple Swarm registry instances instead of using roles / labels or multiple rings.

zachdaniel commented 6 years ago

This may actually relate to the issue I just posted: https://github.com/bitwalker/swarm/issues/95, which illustrates a challenge I'm having running separate applications using Swarm. Having multiple Swarms side by side, or being able to add rules/labels/roles to ensure that certain components only run on nodes that are running their application, would be very useful.

RFC: Assign roles to nodes to control distribution #58