goodeggs / heroku-log-normalizer

Normalize log lines from Heroku Logplex to JSON.

Improve Splunk Storm throughput #3

Closed adborden closed 9 years ago

adborden commented 9 years ago

Saw a memory issue in the normalizer; it looks like it just got backed up until it died. Our throughput to Splunk appears to be extremely variable. Here are some thoughts on improving it.

The Splunk docs say that if you're sending more than a few MB per request, you should use the forwarder. Our average request of 1000 messages is about 1 MB, which should be fine for them.

We've seen Splunk handle many more concurrent requests, but 1 is good enough for us. Splunk response times are usually sub-100 ms for serial requests, but occasionally spike to 30 seconds. Having a throttle just makes us a more predictable client; as far as I can tell, they're not throttling us. Also, 503 is a bad status code for saying "back off".
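
To make that concrete, here's a minimal sketch of the throttle I have in mind: a drain loop that keeps at most one request to Splunk in flight. postBatch is a hypothetical helper that does the actual HTTP POST (one possible shape for it is sketched under the timeout point below), and the constants are just the numbers from this issue.

var MAX_BATCH_SIZE = 1000;  // ~1 MB of messages per request for us
var queue = [];             // filled elsewhere as normalized log lines arrive

function drain() {
  var batch = queue.splice(0, MAX_BATCH_SIZE);
  if (batch.length === 0) return setTimeout(drain, 1000);  // nothing queued yet, check again shortly
  postBatch(batch, function (err) {
    if (err) console.error('splunk post failed:', err);
    drain();  // only start the next request once this one has finished
  });
}
drain();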

I've seen gateway timeouts (504) occur at 60 seconds. We would prefer to wait for Splunk to return a timeout error, but we don't want to end up hanging indefinitely. Even setting our own request timeout to 10 minutes would be good.
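
For the timeout, here's a sketch of that postBatch helper using Node's built-in https module; the Splunk Storm hostname and path are placeholders, and the 10-minute cap is the value suggested above.

var https = require('https');

function postBatch(messages, callback) {
  var body = messages.join('\n');
  var req = https.request({
    method: 'POST',
    hostname: 'api.splunkstorm.com',               // placeholder endpoint
    path: '/1/http/input',                         // placeholder path
    headers: { 'Content-Length': Buffer.byteLength(body) }
  }, function (res) {
    res.resume();  // drain the response body
    res.on('end', function () { callback(null, res.statusCode); });
  });
  req.setTimeout(10 * 60 * 1000, function () {
    req.abort();   // give up instead of hanging indefinitely; surfaces as an 'error' event
  });
  req.on('error', callback);
  req.write(body);
  req.end();
}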

I've seen good performance from slowing down when we're under our batch size. It basically looks like this:

if messages.length < MAX_BATCH_SIZE
  # give the queue a chance to fill closer to a full batch
  sleep(5000)

This allows our queue to fill up a bit more (we average 30 messages/sec), but ensures that even at low volume we're still pushing messages to Splunk.
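
In the drain loop sketched earlier, that would look roughly like this; the 5-second pause and the batch size are the same numbers as above, and the empty-queue check keeps us from posting nothing.

function drain() {
  // under a full batch: wait 5s so the queue can fill up a bit more,
  // then send whatever we have so low-volume periods still get flushed
  var delay = queue.length < MAX_BATCH_SIZE ? 5000 : 0;
  setTimeout(function () {
    var batch = queue.splice(0, MAX_BATCH_SIZE);
    if (batch.length === 0) return drain();  // still nothing queued, wait again
    postBatch(batch, function (err) {
      if (err) console.error('splunk post failed:', err);
      drain();  // still at most one request in flight
    });
  }, delay);
}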