freedomofpress / pressfreedomtracker.us

Code for the U.S. Press Freedom Tracker project website
https://pressfreedomtracker.us
GNU Affero General Public License v3.0

Django JSON logging drops logs #550

Open conorsch opened 7 years ago

conorsch commented 7 years ago

Overview

The current prod settings use django-json-logging to log in JSON format, which vastly simplifies the log aggregation and alerting story. Unfortunately, it appears we're not getting the entirety of logs generated by the app. For one, I've never seen a "Purging URL" message for cache-busting (see #547). I believe this is because the django-json-logging module omits log events emitted by various parts of the app. From the docs:

A Django library that logs request, response and exception details in a JSON document.

That seems oddly specific to me, and may explain why the "Purging URL" events never land in the logs. We should consider switching to a different solution and refactoring the config accordingly.

Potential solutions

Edit prod config settings for current implementation

The django-json-logging docs mention an intriguing option:

DISABLE_EXISTING_LOGGERS = True - Set this to False if you want to combine with multiple loggers.

Unfortunately that's it in terms of docs on that feature, so good luck figuring out what it means. I haven't tried setting to False yet, since that may break log aggregation if JSON and non-JSON events are mixed together.
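For reference, here is a minimal sketch of how the similarly named `disable_existing_loggers` key behaves in the standard library's `logging.config.dictConfig`. Whether django-json-logging's `DISABLE_EXISTING_LOGGERS` setting maps onto this key is an assumption and has not been verified here. The `JsonFormatter` class, the `cache.purge` logger name, and the helper functions are hypothetical and use only the stdlib, not the library's API. The sketch shows that a logger created before configuration is silenced when the key is `True`, and that attaching a JSON formatter to the root handler keeps every event in JSON when it is `False`.

```python
import io
import json
import logging
import logging.config


class JsonFormatter(logging.Formatter):
    """Hypothetical stdlib-only formatter: one JSON object per log record."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_config(stream, disable_existing):
    """Build a dictConfig dict that sends all records to `stream` as JSON."""
    return {
        "version": 1,
        "disable_existing_loggers": disable_existing,
        "formatters": {"json": {"()": JsonFormatter}},
        "handlers": {
            "capture": {
                "class": "logging.StreamHandler",
                "formatter": "json",
                "stream": stream,
            }
        },
        "root": {"handlers": ["capture"], "level": "INFO"},
    }


def emitted_lines(disable_existing):
    """Log one message from a pre-existing logger and return what was written."""
    # Created BEFORE dictConfig runs, the way module-level loggers are
    # created at import time. The name is made up for this sketch.
    early_logger = logging.getLogger("cache.purge")
    stream = io.StringIO()
    logging.config.dictConfig(build_config(stream, disable_existing))
    early_logger.info("Purging URL: %s", "/example/")
    return stream.getvalue().splitlines()
```

If the library setting does behave like the stdlib key, `emitted_lines(True)` returning nothing would be consistent with the missing "Purging URL" events, and the `False` case suggests mixed formats can be avoided as long as the JSON formatter sits on the handler that receives everything.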

Switch to a different library

The django-log-formatter-json library looks promising, but the distribution story is confusing: the version on PyPI is published by someone who isn't the project maintainer.

conorsch commented 7 years ago

I haven't tried setting to False yet, since that may break log aggregation if JSON and non-JSON events are mixed together.

Worth considering: even if the False setting mixes log formats, we could still munge the events in logstash and store everything we need. Having all logs in JSON would certainly be preferable, and that is what we thought the current JSON logging setup gave us, but it evidently does not. So even a mixed approach may allow us to iterate faster.

So in logstash:

  1. Try to parse django logs as JSON.
  2. If successful, hurray! Skip to 4.
  3. If unsuccessful, try to parse with custom regex.
  4. Store parsed event.
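The steps above can be sketched in Python to make the control flow concrete. This is a stand-in for illustration, not Logstash configuration. The `PLAIN_LINE` pattern, the sample line format, and the `parsed_as` field are assumptions; the real regex would depend on whichever non-JSON formatter Django ends up using.

```python
import json
import re

# Hypothetical pattern for plain-text lines such as
# "INFO 2018-01-01 12:00:00,000 cache Purging URL: /example/".
# The actual non-JSON format depends on the Django formatter in use.
PLAIN_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<logger>\S+)\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict for one log line, trying JSON first, then the regex."""
    line = line.strip()

    # Step 1: try to parse the line as JSON.
    try:
        event = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        event = None

    # Step 2: a JSON object means success, so skip the regex.
    if isinstance(event, dict):
        event["parsed_as"] = "json"
        return event

    # Step 3: fall back to the custom regex.
    match = PLAIN_LINE.match(line)
    if match:
        event = match.groupdict()
        event["parsed_as"] = "regex"
        return event

    # Neither parser matched: keep the raw text rather than dropping it.
    return {"message": line, "parsed_as": "unparsed"}


def store_events(lines):
    """Step 4: collect parsed events (a list stands in for the real store)."""
    return [parse_log_line(line) for line in lines if line.strip()]
```

Tagging each event with how it was parsed makes it possible to count how many lines still fall through to the regex or remain unparsed, which would show whether the mixed approach is good enough.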