Netflix / dynomite-manager

A sidecar to manage Dynomite clusters
https://github.com/Netflix/dynomite
Apache License 2.0

Dynomite Manager with Conductor #80

Closed kpdude7 closed 7 years ago

kpdude7 commented 7 years ago

(Note: I posted this same issue to the Conductor forum.) Does anyone out there have any experience deploying Dynomite Manager with Conductor on AWS? We have a topology we think works: one Conductor ELB with two 1.8.0 instances on different AZs, three 0.5.9 Dynomite instances in each of those AZs (for a total of 6), with a Redis 3.2.10 server running on each instance. Each Dynomite config file points to the other 5 EC2 IPs, and the Conductor config points to all 6.

While this works great from a functionality perspective, from an ops perspective it's a nightmare: if any Dynomite instance goes down, ops not only has to stand up a replacement instance manually (since its config file has to point to the other running instances), they also have to restart the running instances (because their individual conf files have to be modified as well).

We are currently researching Dynomite Manager to handle issues like this, but we can't find any specific documentation addressing how to get Dynomite Manager working with Conductor. From our understanding, the setting workflow.dynomite.cluster.hosts has to point to individual EC2 IPs, not some manager "host". Can someone point me to documentation that addresses this type of deployment strategy?
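For concreteness, the Conductor side of this static setup looks roughly like the sketch below. The IPs, port, rack names, and the value format are illustrative placeholders, and apart from workflow.dynomite.cluster.hosts the property names are taken from Conductor's sample config.properties as I remember them, so they should be checked against the Conductor version in use:

```properties
# config.properties (sketch, not a verified deployment)
db=dynomite
workflow.dynomite.cluster.name=dyn_conductor

# All six Dynomite instances listed explicitly as host:port:rack.
# If any instance is replaced, this list (and every dynomite.yml) must be edited by hand.
workflow.dynomite.cluster.hosts=10.0.1.10:8102:us-east-1a;10.0.1.11:8102:us-east-1a;10.0.1.12:8102:us-east-1a;10.0.2.10:8102:us-east-1b;10.0.2.11:8102:us-east-1b;10.0.2.12:8102:us-east-1b
```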

ipapapa commented 7 years ago

Let me explain the pieces so that you can get a better understanding. Redis is the storage layer; Dynomite is the proxy above Redis that replicates the data across the other Dynomite nodes. Each node runs Dynomite, Redis, and Dynomite-manager (at least at Netflix). Dynomite-manager is there so that Dynomite integrates well with AWS and the rest of our infrastructure.
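As a rough per-node sketch of how those pieces wire together (key names follow the dynomite.yml format from the Dynomite README; the ports, rack, and token values below are placeholders, not a recommended configuration):

```yaml
# dynomite.yml (per-node sketch; values are placeholders)
dyn_o_mite:
  datacenter: us-east-1       # region
  rack: us-east-1a            # availability zone
  dyn_listen: 0.0.0.0:8101    # peer-to-peer replication port (Dynomite <-> Dynomite)
  listen: 0.0.0.0:8102        # client port (Conductor connects here)
  servers:
  - 127.0.0.1:6379:1          # the local Redis instance Dynomite proxies to
  data_store: 0               # 0 = Redis backend
  tokens: '12345678'
  # dyn_seeds (the peer list) omitted here; see the seed-list sketch below
```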

Conductor is a workflow engine; it is an application. Conductor uses Dynomite/Redis as the storage layer to write its data to and read it from, and it leverages Eureka to discover the Dynomite cluster (not Dynomite-manager).

The configuration files in Dynomite are only read when Dynomite starts, so even if the YAML changes underneath a running process, it does not matter. Dynomite-manager, however, does a good job of updating the YAML files every 30 seconds, so that if a Dynomite process gets killed, the restarted process comes up with the latest configuration (and does not have to wait up to another 30 seconds for the most recent update). In fact, even if you started with a bad topology in the configuration, Dynomite-manager would fix that by sending the correct topology to Dynomite.
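In other words, the part that Dynomite-manager keeps refreshing is essentially the seed list in dynomite.yml, along these lines (the entries are illustrative; each one is host:port:rack:datacenter:token, following the dynomite.yml format):

```yaml
# The section Dynomite-manager rewrites as the cluster topology changes.
# A restarted Dynomite re-reads this file on startup, so it picks up the
# latest peer list without manual edits.
dyn_seeds:
- 10.0.1.11:8101:us-east-1a:us-east-1:1383429731
- 10.0.1.12:8101:us-east-1a:us-east-1:2766859462
- 10.0.2.10:8101:us-east-1b:us-east-1:1383429731
```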

When a node dies, we leverage AWS Auto Scaling groups (we deploy one ASG per AZ; that 1:1 mapping is very important). Hence a node dying does not matter, it is just a tree falling in the forest: a new node comes up and Dynomite-manager makes sure that the new node gets the data from another replica. There is no ops overhead; everything should be fully automated.
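A minimal sketch of that 1:1 AZ-to-ASG mapping using the AWS CLI (names, sizes, and AZs are placeholders; internally we use our own deployment tooling rather than raw CLI calls, so treat this only as an illustration of the mapping):

```bash
# One Auto Scaling group per availability zone (placeholder names/sizes).
# When an instance dies, its ASG replaces it in the same AZ, and
# Dynomite-manager bootstraps the replacement from a peer replica.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name dynomite-us-east-1a \
  --launch-configuration-name dynomite-lc \
  --availability-zones us-east-1a \
  --min-size 3 --max-size 3 --desired-capacity 3

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name dynomite-us-east-1b \
  --launch-configuration-name dynomite-lc \
  --availability-zones us-east-1b \
  --min-size 3 --max-size 3 --desired-capacity 3
```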

ipapapa commented 7 years ago

I am closing this due to inactivity. If you have any more questions please feel free to reopen it or open a new issue.