cityindex-attic / logsearch

[unmaintained] A development environment for ELK
Apache License 2.0

Research Scheduled Scaling Strategies #313

Closed dpb587 closed 10 years ago

dpb587 commented 10 years ago

Research and test different strategies for scaling during business hours while maintaining data.

dpb587 commented 10 years ago

After some thought and research, I think this is how we should approach it. First, the core concepts:

Putting those concepts together, we'll set cluster.routing.allocation.awareness.attributes to zone,schedule and inject those attributes into each node. During non-business hours I think we should keep our 2-replica setting (so data stays replicated across AZs). About an hour before our scale-up deadline, we can raise the replica count to 3 and set cluster.routing.allocation.disable_allocation to true. Then we can start up the extra nodes; once they're all online, we set cluster.routing.allocation.disable_allocation back to false so the nodes can sync up and settle which of them holds the third replica. At the end of business, we scale down by disabling allocation again, terminating the extra nodes, lowering the replica count back to 2, and re-enabling allocation.

These procedures won't necessarily be lightweight operations, but this is something we're interested in right now, so we can at least give it some time and experimentation.
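The scale-up and scale-down sequence described above could be sketched roughly as follows. This is only an illustration, not the code from any PR: it builds the ordered list of requests instead of sending them, the function names and the `MANUAL` placeholder steps are hypothetical, and it assumes the settings are applied through the standard `PUT /_cluster/settings` and `PUT /_settings` endpoints.

```python
import json

# Setting name taken from the comment above; the helper names are hypothetical.
DISABLE_ALLOCATION = "cluster.routing.allocation.disable_allocation"


def cluster_setting(name, value):
    """Describe a transient cluster-settings update as (method, path, body)."""
    return ("PUT", "/_cluster/settings", {"transient": {name: value}})


def replica_setting(count):
    """Describe a number_of_replicas update applied to all indices."""
    return ("PUT", "/_settings", {"index": {"number_of_replicas": count}})


def manual(description):
    """Placeholder for a step performed outside Elasticsearch (assumption)."""
    return ("MANUAL", description, None)


def scale_up_plan(replicas=3):
    """Steps to run about an hour before business hours."""
    return [
        replica_setting(replicas),
        cluster_setting(DISABLE_ALLOCATION, True),
        manual("start the extra nodes and wait until they are all online"),
        cluster_setting(DISABLE_ALLOCATION, False),
    ]


def scale_down_plan(replicas=2):
    """Steps to run at the end of business."""
    return [
        cluster_setting(DISABLE_ALLOCATION, True),
        manual("terminate the extra nodes"),
        replica_setting(replicas),
        cluster_setting(DISABLE_ALLOCATION, False),
    ]


if __name__ == "__main__":
    plans = (("scale up", scale_up_plan()), ("scale down", scale_down_plan()))
    for title, plan in plans:
        print(title)
        for method, target, body in plan:
            print(" ", method, target, json.dumps(body) if body is not None else "")
```

Keeping the plan as data makes the ordering easy to review before anything touches a live cluster; actually issuing the requests and waiting for nodes to join would still need to be added.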

sopel commented 10 years ago

I haven't fully thought through your proposal yet, and this would only address the schedule aspect, but incidentally AWS has just announced that CloudFormation supports Auto Scaling Scheduled Actions (long overdue, actually; they have existed for quite a while already):

AWS CloudFormation now supports Auto Scaling scheduled actions [...]

[...] With support for scheduled actions, you can now model Auto Scaling schedules in CloudFormation templates. If you have a predictable traffic pattern, you can scale Auto Scaling groups using scheduled actions. We have created a sample template to show you how.
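As a rough illustration of what modeling such a schedule might look like, the sketch below builds two `AWS::AutoScaling::ScheduledAction` resources as plain Python dicts and prints them as template JSON. The logical ID `ElasticsearchGroup`, the resource names, the cron expressions, and the group sizes are all hypothetical placeholders, not values from this project or from the AWS sample template.

```python
import json


def scheduled_action(group_logical_id, recurrence, size):
    """Build an AWS::AutoScaling::ScheduledAction resource as a plain dict."""
    return {
        "Type": "AWS::AutoScaling::ScheduledAction",
        "Properties": {
            "AutoScalingGroupName": {"Ref": group_logical_id},
            "Recurrence": recurrence,  # cron expression, UTC by default
            "MinSize": size,
            "MaxSize": size,
            "DesiredCapacity": size,
        },
    }


def build_resources():
    """Return the two scheduled actions; all concrete values are assumptions."""
    return {
        # Hypothetical: grow to 6 nodes at 07:00 UTC on weekdays.
        "ScaleUpForBusinessHours": scheduled_action(
            "ElasticsearchGroup", "0 7 * * 1-5", 6
        ),
        # Hypothetical: shrink back to 3 nodes at 19:00 UTC on weekdays.
        "ScaleDownAfterHours": scheduled_action(
            "ElasticsearchGroup", "0 19 * * 1-5", 3
        ),
    }


if __name__ == "__main__":
    print(json.dumps({"Resources": build_resources()}, indent=2))
```

A schedule like this only changes the group size; the allocation and replica changes discussed earlier in the thread would still have to be orchestrated separately.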

Given my mindset and my preference for handling everything from within the available self-contained automation layer, my thinking is once again leaning towards inversion of control:

Orchestrating the allocation/replication via on-instance scripts might be just what you had in mind though? Let's discuss this during the upcoming hangout ...

dpb587 commented 10 years ago

I've created a PR (#326) with the relevant code changes. Currently I consider the implementation to be primarily a "proof of concept" - I've tested it with smaller datasets.