Analyze/Implement Auto Scaling for remaining tiers

sopel commented 11 years ago

This is an initial umbrella issue (i.e. it should probably be split into multiple issues once a strategy is in place) - Auto Scaling and EC2 Spot Instance usage have been seeded via #39 and #148, which addressed the most obvious, but also the simplest tier.

To recap, there are three main goals with Auto Scaling:

improve availability via health checks ('keep at N'), i.e. simply ensure that the running instances (N can be a single one too) are healthy and automatically replace any that are not - in addition, this eases vertical scaling (which already works w/o Auto Scaling though)
improve performance via horizontal scaling, i.e. scale up and down on schedule or on demand (be it automatically or by simply adjusting the Auto Scaling capacity manually)
decrease cost via EC2 Spot Instance usage, which is only available within CloudFormation via Auto Scaling
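
For illustration only, the three goals above could map onto CloudFormation resources roughly as follows. This is a minimal sketch, not our actual template: the logical names (`TierLaunchConfig`, `TierGroup`, `TierScaleUpWeekdays`), the AMI ID, the instance type, the bid price and the schedule are all placeholder assumptions. Python is used here merely to build and serialize the JSON template.

```python
import json


def build_template(min_size=1, max_size=4, spot_price="0.05"):
    """Return a CloudFormation template (as a dict) for one auto-scaled tier.

    All logical names and values are hypothetical placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Goal 3: setting SpotPrice makes the group launch Spot Instances.
            "TierLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": "ami-00000000",   # placeholder AMI
                    "InstanceType": "m1.small",  # placeholder type
                    "SpotPrice": spot_price,     # placeholder bid
                },
            },
            # Goal 1: 'keep at N' - unhealthy instances get replaced so that
            # the group stays at its desired capacity (min_size == max_size
            # pins the group to exactly N instances).
            "TierGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "TierLaunchConfig"},
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                    "DesiredCapacity": str(min_size),
                    "HealthCheckType": "EC2",
                },
            },
            # Goal 2: scale up on a schedule (here: weekdays at 08:00 UTC).
            "TierScaleUpWeekdays": {
                "Type": "AWS::AutoScaling::ScheduledAction",
                "Properties": {
                    "AutoScalingGroupName": {"Ref": "TierGroup"},
                    "DesiredCapacity": max_size,
                    "Recurrence": "0 8 * * 1-5",
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Calling `build_template(min_size=1, max_size=1)` would give the plain 'keep at 1' variant; on-demand scaling via policies and alarms is deliberately left out of this sketch.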

All three apply to all our tiers one way or another, but in quite different ways and priorities, which need to be analyzed accordingly.

:exclamation: A notable complication in comparison to the logstash tier is the current reliance on a dedicated DNS configuration, which likely needs to be replaced by a load balancing approach (be it via an actual load balancer or via DNS)!

sopel commented 11 years ago

Moved to Icebox due to low priority and conflicting schedules over the next two months.

sopel commented 11 years ago

:exclamation: While AWS has just added Redis support to ElastiCache (see #169), it supports neither Cache Node Auto Discovery nor adding/removing nodes to/from a cluster yet, see Adding or Removing Cache Nodes:

Note: At this time, you can only add or remove cache nodes from cache clusters running Memcached.

So this only addresses resiliency at this point, but doesn't help much with (auto) scaling, insofar as one needs to replace the entire cluster in order to resize it (which is still possible via Creating a Redis Snapshot and Seeding a New Cache Cluster With a Redis Snapshot in turn, though).

sopel commented 10 years ago

Unfrozen due to increasing demand and the resulting desire to further improve resiliency and reduce cost.

sopel commented 10 years ago

This has been a topic of https://github.com/cityindex/logsearch-config/issues/56 - here are some challenges and paraphrased quotes from the discussion as a foundation for extracting further dedicated issues (please correct/amend as you see fit):

Challenges

@dpb587 mentions the challenge of handling fixed IP addresses with Auto Scaling (currently used/required for inbound log shipping, BTF access and in-cluster communication)
@sopel adds the similar issue of handling fixed EBS volumes (currently used/required for Redis/Elasticsearch persistence)

Discussion

@mrdavidlaing suggests that a move to Amazon VPC would enable use of Elastic Network Interfaces (ENI) for hot swapping instances
- @dpb587 notes that this doesn't play well (or rather, not at all) with Auto Scaling
- => VPC will be handled via #268
@mrdavidlaing suggests evaluating/using BOSH to orchestrate scaling and DNS due to its respective built-in features
- @dpb587 asks whether BOSH supports Auto Scaling and/or EC2 Spot Instances, which according to @mrdavidlaing it does not
- => BOSH will be evaluated via #267
@sopel mentions that a known number of IP addresses can be handled via Auto Scaling regardless, as done for the Visual Studio Load Test Cluster, for example
@mrdavidlaing reflects that ideally BOSH should drive CloudFormation on AWS in order to retain the AWS-specific infrastructure provisioning power of the latter, yet gain the service provisioning and orchestration power of the former
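
To make the 'known number of IP addresses' remark a bit more concrete: one conceivable approach (an assumption on my part, the discussion does not spell out the mechanism used for the Visual Studio Load Test Cluster) is a pre-allocated pool of Elastic IPs from which each freshly launched instance claims a free one at boot. The selection step could look like the sketch below. The helper `pick_free_address` is hypothetical, and its input is assumed to be shaped like the `Addresses` list of an EC2 DescribeAddresses response; the actual API calls for listing and associating addresses are intentionally left out.

```python
def pick_free_address(addresses):
    """Return the AllocationId of the first unassociated address, or None.

    `addresses` is assumed to look like the 'Addresses' list of an EC2
    DescribeAddresses response: an address that is currently in use carries
    an 'AssociationId' (VPC) or an 'InstanceId' (EC2-Classic).
    """
    for address in addresses:
        in_use = address.get("AssociationId") or address.get("InstanceId")
        if not in_use:
            return address["AllocationId"]
    return None  # pool exhausted; the caller has to decide what to do


# Hypothetical pool of three pre-allocated addresses (documentation IP range).
pool = [
    {"PublicIp": "203.0.113.10", "AllocationId": "eipalloc-aaa",
     "AssociationId": "eipassoc-111"},
    {"PublicIp": "203.0.113.11", "AllocationId": "eipalloc-bbb"},
    {"PublicIp": "203.0.113.12", "AllocationId": "eipalloc-ccc"},
]
free_allocation_id = pick_free_address(pool)
```

Keeping the Auto Scaling group's maximum size at or below the pool size would ensure every instance can claim an address. Two instances booting at the same time could still race for the same one, so the associate step would need a retry.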

sopel commented 10 years ago

We can continue the generic discussion here, but I've extracted the various tiers to separate issues for separation of concerns:

Queue (Redis) => extracted to #269
:white_check_mark: Parser (logstash) => already implemented via #39 and #148
Persistence/Analytics/Search (Elasticsearch) => extracted to #270
UI (Kibana) => extracted to #271

sopel commented 10 years ago

Closed as Incomplete in favor of the extracted issues (see preceding comment).

cityindex-attic / logsearch

Analyze/Implement Auto Scaling for remaining tiers #149
