Analyze/Improve Redis fault tolerance

sopel commented 11 years ago

This has been triggered by #90 and relates to #88 - the Redis FAQ What happens if Redis runs out of memory? seems to suggest that the virtual memory based solution mentioned in What does Redis do when it runs out of memory? never materialized, so the question seems to be whether we can/should fundamentally do something about this at all or whether a more pragmatic approach would be to define sort of an internal SLA and respective alarms (see #91) and assume a non serviced queue will be detected and timely addressed by an operator in most cases.

Obviously any kind of automatic fault tolerance would be preferable, but the questions is where to strike a balance.

dpb587 commented 11 years ago

Test scenario... I used two MEPS logs (Log.log.20130607-15.log, 552.177 MB; Log.log.20130607-14.log, 600.956 MB) to push events into a non-draining redis queue. Redis was receiving them at about ~750k events/minute. Once complete, the append file was 2.85 GB (~147% inflation) and there were 9.3m events queued. I sent a KILL to the running redis-server. Upstart restarted the service and it took 43 seconds for it to reload the 9.3m events and be ready for processing again. I started up the queue processing for a bit. I then rebooted the instance via AWS Console. It took 56 seconds for it to reload the now 9.1m events and be ready for processing again.

This demonstrates...

If the queue grows extremely large, as long as there's sufficient disk space, it's not a problem and redis keeps up quite well.
If redis crashes hard, the data is persisted and reloads fairly quickly.
If the system is rebooted, the data is automatically reloaded on startup.

The test scenario was fairly extreme, both in terms of queue size (queue lag problems should normally be addressed quickly, rarely reaching 9m) and queue rate (a rate <750k events/minute is extremely high and normally the queue is actively draining).

sopel commented 11 years ago

Thanks much for analyzing/testing this, that's very impressive and promising - to clarify 1) regarding the issue encountered in #90: going forward Redis won't be limited by system memory/swap anymore, rather continuously write to AOF as detailed in Redis persistence according to the configuration you committed here, thus only be limited by the size of the disk storing that append file (with the additional benefit of surviving a Redis crash)?

sopel commented 11 years ago

:information_source: Given the fairly low HA requirements, this would also allow/suggest to facilitate a load balanced Redis tier with auto scaling EC2 spot instances for cost savings and improved fault tolerance instead, see #39 and #54.

dpb587 commented 11 years ago

Good thought... I'd like to re-test this on a memory-bound instance to be sure of expectations...

dpb587 commented 11 years ago

As background, from Redis FAQ in response to "What happens if Redis runs out of memory?":

With modern operating systems malloc() returning NULL is not common, usually the server will start swapping and Redis performances will degrade so you'll probably notice there is something wrong.

The INFO command will report the amount of memory Redis is using so you can write scripts that monitor your Redis servers checking for critical conditions.

Alternatively can use the "maxmemory" option in the config file to put a limit to the memory Redis can use. If this limit is reached Redis will start to reply with an error to write commands (but will continue to accept read-only commands), or you can configure it to evict keys when the max memory limit is reached in the case you are using Redis for caching.

I ran the same test from the prior comment on an m1.small instance (slower, 1.7 GB). In reality, the snapshot and append-only file forks when it's time to do an atomic write. As observed, when it runs out of memory, redis starts throwing:

...snip...
[21475] 04 Aug 22:27:54.329 * Starting automatic rewriting of AOF on 112% growth
[21475] 04 Aug 22:27:54.329 # Can't rewrite append only file in background: fork: Cannot allocate memory
[21475] 04 Aug 22:27:54.439 * Starting automatic rewriting of AOF on 112% growth
[21475] 04 Aug 22:27:54.439 # Can't rewrite append only file in background: fork: Cannot allocate memory
...snip...

But it continues writing to the append-only file (just wasn't able to rewrite it) and, as the FAQ suggested, redis indeed moved to using swap memory. At about 5.6m events, it had consumed all the available swap. Redis freaked out and crashed hard without an additional error. The service manager restarted it and redis started to reload from the AOF, and while reloading, unsurprisingly, it ran itself out of memory and crashed hard. Repeat, ad infinitum.

Granted, redis has always warned that it might run into problems if it has low memory:

[21475] 04 Aug 21:53:21.129 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

I'd avoided patching that thus far. Might as well see if it helps, so I reset the system, updated vm.overcommit_memory, and reran the test. Results were essentially the same.

Redis is an in-memory piece of software, so it's not really designed to work pulling from a disk and memory like a typical database. So. I think we need to:

ensure the broker machine has an amount of RAM which is greater than: (maximum SLA time to resolve non-draining queue × log messages/min × bytes/log message × 2%) where 2% represents data structure overhead that redis might have (unfounded)
ensure we're monitoring the queue to ensure it is actively draining so we can quickly resolve any issues and prevent data loss

Note, the m1.small instance was holding 5m+ MEPS events before it crashed with only 1.7 system RAM, so I don't think this limitation is a huge reason for concern.

@sopel, please review; close if this is satisfactory, or redefine the direction this issue should take.

mrdavidlaing commented 11 years ago

@dpb587 - great analysis. Filed for future reference:

Redis is an in-memory piece of software, so it's not really designed to work pulling from a disk and memory like a typical database

sopel commented 11 years ago

@dpb587 - thanks for the analysis/summary indeed; as you already suggested, this triggers two follow up improvements regarding the subject matter, which both are on file already one way or another:

monitoring and alarms regarding queue status, which is on file via #91
scaling the broker machine to handle the load, both manually once regarding the minimum according to your algorithm, which is implied in https://github.com/cityindex/logsearch-on-aws/issues/5, and eventually automatically in case the load turns out to be flaky and/or the SLA would increase, which is on file via #54, but has been demoted for the time being

cityindex-attic / logsearch

Analyze/Improve Redis fault tolerance #92