goodeggs / heroku-log-normalizer

Normalize log lines from Heroku Logplex to JSON.

Improve Splunk Storm throughput #3

Closed adborden closed 9 years ago

adborden commented 9 years ago

Saw a memory issue in the normalizer; it looks like it just got backed up until it died. Our throughput to Splunk appears to be extremely variable. Here are some thoughts on improving it.

The Splunk docs say that if you're sending more than a few MB per request, you should use the forwarder. Our average request of 1000 messages is about 1 MB, which should be fine for them.

We've seen Splunk handle many more concurrent requests, but 1 is good enough for us. Splunk response times are usually sub-100 ms for serial requests, but occasionally spike to 30 seconds. Having a throttle just makes us a more predictable client; as far as I can tell, they're not throttling us. Also, 503 is a bad status code for saying "back off".
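
To make that concrete, here's a minimal sketch of the throttle I have in mind: a drain loop that keeps at most one request to Splunk in flight. postBatch is a hypothetical helper that does the actual HTTP POST (one possible shape for it is sketched under the timeout point below), and the constants are just the numbers from this issue.

var MAX_BATCH_SIZE = 1000;  // ~1 MB of messages per request for us
var queue = [];             // filled elsewhere as normalized log lines arrive

function drain() {
  var batch = queue.splice(0, MAX_BATCH_SIZE);
  if (batch.length === 0) return setTimeout(drain, 1000);  // nothing queued yet, check again shortly
  postBatch(batch, function (err) {
    if (err) console.error('splunk post failed:', err);
    drain();  // only start the next request once this one has finished
  });
}
drain();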

I've seen gateway timeouts (504) occur at 60 seconds. We would prefer to wait for Splunk to return a timeout error, but we don't want to end up hanging indefinitely. Even setting our own request timeout to 10 minutes would be good.
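
For the timeout, here's a sketch of that postBatch helper using Node's built-in https module; the Splunk Storm hostname and path are placeholders, and the 10-minute cap is the value suggested above.

var https = require('https');

function postBatch(messages, callback) {
  var body = messages.join('\n');
  var req = https.request({
    method: 'POST',
    hostname: 'api.splunkstorm.com',               // placeholder endpoint
    path: '/1/http/input',                         // placeholder path
    headers: { 'Content-Length': Buffer.byteLength(body) }
  }, function (res) {
    res.resume();  // drain the response body
    res.on('end', function () { callback(null, res.statusCode); });
  });
  req.setTimeout(10 * 60 * 1000, function () {
    req.abort();   // give up instead of hanging indefinitely; surfaces as an 'error' event
  });
  req.on('error', callback);
  req.write(body);
  req.end();
}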

I've seen good performance from slowing down when we're under our batch size. It basically looks like this:

if messages.length < MAX_BATCH_SIZE
  # give the queue a chance to fill closer to a full batch
  sleep(5000)

This allows our queue to fill up a bit more (we average 30 messages/sec), but ensures that even at low volume we're still pushing messages to Splunk.
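
In the drain loop sketched earlier, that would look roughly like this; the 5-second pause and the batch size are the same numbers as above, and the empty-queue check keeps us from posting nothing.

function drain() {
  // under a full batch: wait 5s so the queue can fill up a bit more,
  // then send whatever we have so low-volume periods still get flushed
  var delay = queue.length < MAX_BATCH_SIZE ? 5000 : 0;
  setTimeout(function () {
    var batch = queue.splice(0, MAX_BATCH_SIZE);
    if (batch.length === 0) return drain();  // still nothing queued, wait again
    postBatch(batch, function (err) {
      if (err) console.error('splunk post failed:', err);
      drain();  // still at most one request in flight
    });
  }, delay);
}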