goodeggs / bites

Technical morsels
http://bites.goodeggs.com
8 stars 4 forks source link

JSON logging on Heroku #20

Open asalant opened 10 years ago

asalant commented 10 years ago

JSON logging is cool, Splunk, bunyan and other tools make it cooler. But Heroku splits long log lines producing invalid JSON log lines. It would be great if Heroku fixed/changed this but in the meantime @bobzoller built a service that reassembles Heroku JSON log lines so that you can enjoy JSON logs from your Heroku apps.

gasi commented 9 years ago

@asalant @bobzoller I am interested to log JSON data from Heroku to Splunk but can’t get Splunk to recognize it as JSON since Heroku includes a prefix of key value pairs with timestamp, etc., that is not valid JSON. How do you work around that and have Splunk recognize JSON from Heroku and visually display it as follows?

gasi commented 9 years ago

Issue described here: http://forum.loggly.com/discussion/391/logging-json-from-heroku

bobzoller commented 9 years ago

I just made our heroku-log-normalizer a public repo. There are no docs there, so I'll explain a bit here:

log line splitting

We had an issue for a long time where log lines would get split across multiple syslog messages coming out of Heroku's Logplex. This is most likely just a known limitation of syslog and not their fault, however it really complicates things when you're logging JSON. HTTP logdrains do not seem to be affected by this same issue... arbitrarily long log lines are transmitted as one, as expected.

"raw" JSON

Since Logplex wraps everything with syslog, you cannot simply ship your logs to Sumo (or another solution) and get native JSON parsing. Instead, heroku-log-normalizer transforms the syslog metadata into JSON, merges it with your application's JSON log line, and ships off raw JSON to Sumo.

platform logs aren't JSON

The next problem is Heroku platform logs, or really, any log line that isn't JSON. heroku-log-normalizer tries to JSON parse loglines, but if it fails, the entire line is sent as the msg property of a JSON object instead (along with the syslog metadata).

global metadata

We include all sorts of identifying metadata in our application JSON logs -- like APP_NAME and APP_INSTANCE for example -- but when a Heroku platform log line comes through we get none of it. The syslog metadata provides some (heroku app name, etc), but we need more to properly attribute the line to an app and environment. To solve this, heroku-log-normalizer will merge in any HTTP query parameters passed to the drain endpoint.

putting it all together

  1. Deploy heroku-log-normalizer somewhere. I believe it can be deployed to Heroku as it stands, but there's no Heroku button or anything.
  2. heroku drains:add https://heroku-log-normalizer.yourdomain.com/drain?name=myapp&appInstance=production

a platform log will arrive looking like:

{"timestamp":"2015-09-24T16:34:07.699Z","msg":"at=info method=GET path=\"/\" host=corp-myapp.herokuapp.com request_id=deaa77fa-c9c8-4eed-a8d2-867ee4e85317 fwd=\"107.127.127.127,199.127.127.127\" dyno=web.2 connect=2ms service=39ms status=200 bytes=745586\n","syslog":{"prival":158,"facilityID":19,"severityID":6,"facility":"local3","severity":"info","type":"RFC5424","time":"2015-09-24T16:34:06.746Z","host":"host","appName":"heroku","pid":"router","msgID":null},"name":"myapp","appInstance":"production"}

and your application log will arrive looking like:

{"timestamp":"2015-09-24T16:34:06.752Z","name":"myapp","appInstance":"production","env":"production","pid":3,"hostname":"d14bc717-0e0c-4ad0-b07c-5e8a60a3bd06","custom":{"whatever":"you know"},"requestId":"deaa77fa-c9c8-4eed-a8d2-867ee4e85317","level":30,"res":{"status":200,"contentLength":745068},"responseTime":35,"msg":"GET / HTTP/1.1","v":0,"syslog":{"prival":190,"facilityID":23,"severityID":6,"facility":"local7","severity":"info","type":"RFC5424","time":"2015-09-24T16:34:06.758Z","host":"host","appName":"app","pid":"web.2","msgID":null}}

Hope that's helpful!

gasi commented 9 years ago

@bobzoller Wow, this is great! Thanks for explaining and sharing your code :smile:

/cc @aseemk @ryanwe @chanadian @davestern @matthew-cox

aseemk commented 9 years ago

@bobzoller: this is incredible! Both the code and writeup. Thank you!

I was surprised to hear that the log line splitting isn't an issue for HTTP drains. That's awesome if so, but just want to double-check: are you sure your change didn't coincide with Heroku raising that limit from 1k bytes to 10k bytes?

https://devcenter.heroku.com/changelog-items/448

Thanks again very much. We're excited by this; just need to find some time now to try it out. =)

I'm going to go ahead file the first feature request now =): parsing logfmt-style logs too, e.g. from Heroku's router. We rely on that data heavily, and it'll be good not to lose that structure.

P.S. Thanks @gasi for discovering, asking, and mentioning!