Closed dpb587 closed 10 years ago
With some thought and research, I think this is how we should approach it. First the core concepts:
zone and schedule), but I think our scaling operations will be more efficient with a single property with values of (daytime, fulltime_euwest1a, and fulltime_euwest1b). With allocation awareness, Elasticsearch avoids placing copies of the same shard on nodes that share the same set of "awareness properties". I don't think we should care where the daytime nodes are running, but the full-time nodes should be mirrored across AZs.
Putting those concepts together, we'll specify cluster.routing.allocation.awareness.attributes as zone,schedule and we'll inject those properties into each node. During non-business hours I think we should continue with our 2-replica setting (keeping data replicated across AZs). About an hour before our scale-up deadline, we can update the settings to request 3 replicas and set cluster.routing.allocation.disable_allocation to true. Then we can start up the extra nodes; once they're all online, we can set cluster.routing.allocation.disable_allocation back to false to let them sync back up and chat about who gets to handle the third replica. At the end of business, we scale down by disabling allocation again, terminating the nodes, updating the replica setting back down to 2, and re-enabling allocation.
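To make the ordering of the scale-up and scale-down steps concrete, here is a rough sketch that only builds the requests as data and does not talk to a cluster. It assumes the standard PUT /_cluster/settings and PUT /_settings endpoints; the function names and the "MANUAL" placeholder steps are made up for illustration and are not taken from the PR.

```python
import json

# Setting name as used in the discussion above.
DISABLE_ALLOCATION = "cluster.routing.allocation.disable_allocation"


def cluster_setting(value):
    """Request that toggles the allocation switch as a transient cluster setting."""
    return ("PUT", "/_cluster/settings", {"transient": {DISABLE_ALLOCATION: value}})


def replica_setting(count):
    """Request that sets the replica count on all indices."""
    return ("PUT", "/_settings", {"index": {"number_of_replicas": count}})


def scale_up_plan(replicas=3):
    """Steps to run about an hour before business hours (hypothetical helper)."""
    return [
        replica_setting(replicas),   # ask for the extra replica
        cluster_setting(True),       # hold allocation while nodes boot
        ("MANUAL", "start the extra daytime nodes and wait until all have joined", None),
        cluster_setting(False),      # let the cluster place the third replica
    ]


def scale_down_plan(replicas=2):
    """Steps to run at the end of business (hypothetical helper)."""
    return [
        cluster_setting(True),       # hold allocation before nodes disappear
        ("MANUAL", "terminate the daytime nodes", None),
        replica_setting(replicas),   # back to the cross-AZ baseline
        cluster_setting(False),      # resume normal allocation
    ]


if __name__ == "__main__":
    for name, plan in (("scale up", scale_up_plan()), ("scale down", scale_down_plan())):
        print(name)
        for method, target, body in plan:
            print(" ", method, target, json.dumps(body) if body is not None else "")
```

Keeping the plan as plain data means the order can be reviewed (or dry-run) before anything is sent; whatever actually issues the requests and waits for the nodes is left out on purpose.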
These procedures won't necessarily be lightweight operations, but it's something we're interested in at this time, so we can at least give it some time and experimentation.
I haven't fully contemplated your proposal yet, and this would only address the schedule aspect, but incidentally AWS has just announced that CloudFormation supports Auto Scaling Scheduled Actions (long overdue actually; they have existed for quite a while already):
AWS CloudFormation now supports Auto Scaling scheduled actions [...]
[...] With support for scheduled actions, you can now model Auto Scaling schedules in CloudFormation templates. If you have a predictable traffic pattern, you can scale Auto Scaling groups using scheduled actions. We have created a sample template to show you how.
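For illustration, a pair of scheduled actions for the daytime nodes might look roughly like the sketch below, which generates the Resources fragment of a template. The AWS::AutoScaling::ScheduledAction type and its properties are the real CloudFormation ones; the logical IDs (DaytimeNodesGroup, DaytimeScaleUp, DaytimeScaleDown), the cron times, and the group sizes are placeholders I made up, not values from our stack.

```python
import json


def scheduled_action(group_ref, recurrence, desired, min_size, max_size):
    """Build one AWS::AutoScaling::ScheduledAction resource as a dict."""
    return {
        "Type": "AWS::AutoScaling::ScheduledAction",
        "Properties": {
            # Logical ID of the Auto Scaling group in the same template.
            "AutoScalingGroupName": {"Ref": group_ref},
            # Cron expression, evaluated in UTC.
            "Recurrence": recurrence,
            "DesiredCapacity": desired,
            "MinSize": min_size,
            "MaxSize": max_size,
        },
    }


def daytime_schedule(group_ref="DaytimeNodesGroup"):
    """Scale the (hypothetical) daytime group up on weekday mornings and down in the evening."""
    return {
        "DaytimeScaleUp": scheduled_action(group_ref, "0 7 * * 1-5", 2, 2, 2),
        "DaytimeScaleDown": scheduled_action(group_ref, "0 19 * * 1-5", 0, 0, 0),
    }


if __name__ == "__main__":
    print(json.dumps({"Resources": daytime_schedule()}, indent=2))
```

Note that this only changes the group size on a schedule; toggling allocation and the replica count around those times would still have to happen somewhere else.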
Given my mindset and preference for handling everything from within the available self-contained automation layer, my thinking is once again towards inversion of control:
context) would be exposed as CloudFormation parameters and everything
Orchestrating the allocation/replication via on-instance scripts might be just what you had in mind though? Let's discuss this during the upcoming hangout ...
I've created a PR (#326) with the relevant code changes. Currently I consider the implementation to be primarily a "proof of concept" - I've tested it with smaller datasets.
Research and test different strategies for scaling during business hours while maintaining data.