freedomofpress / securedrop

GitHub repository for the SecureDrop whistleblower platform. Do not submit tips here!
https://securedrop.org/

Greenfield Logging/Alerting discussion #2574

Open msheiny opened 6 years ago

msheiny commented 6 years ago

Container based OS/green-field -- logging/alerting discussion

Feature request

Description

Since we effectively get a clean slate here to redesign the logging and alerting story, let's first break down what we are trying to collect and when we think admins should be alerted.

A big problem with the current OSSEC design is that alerts are NOT actionable and are therefore easily dismissed or neglected. We need to keep in mind that alerts should only be sent when we expect an admin to look at the info and make a quick assessment of whether they need to take action. Many discussions on GitHub also indicate we need to move away from email (lots of "hey, let's move to Signal", as in #1124).

Might want to break this ticket up when implementation time comes to further discuss specific issues, but here we go...

Metric data we want to collect:

Basic features we want:

Nice to have (optional/reach-goals):

When admin should be alerted:

User Stories

As a SecureDrop administrator, I would like sane, actionable alerts.

ageis commented 5 years ago

This ticket appears crazily ambitious, and I don't know if it's current. It sounds like you want a full-service DevOps/SRE solution.

The alerting features you were seeking are all covered by Prometheus Alertmanager. Alerts can be grouped and filtered, and the rules are written with easy-to-understand conditional logic, e.g.:

  - alert: SecureDropInstanceDown
    expr: probe_http_status_code != 200
    for: 10m
    labels:
      severity: critical
    annotations:
      description: '{{ $labels.instance }} at {{ $labels.address }} might be down'
      summary: '{{ $labels.instance }} is not returning HTTP 200/OK'
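
To make the grouping and filtering point concrete, here is a minimal sketch of an Alertmanager routing config that could sit alongside a rule like the one above. The receiver names, the webhook URL, and all timing values are assumptions chosen for illustration, not anything taken from a real SecureDrop deployment.

```yaml
# alertmanager.yml (illustrative sketch; names, URL, and timings are hypothetical)
route:
  receiver: admin-default            # fallback receiver for anything not matched below
  group_by: ['alertname', 'instance']  # batch related alerts into one notification
  group_wait: 30s                    # wait briefly so alerts firing together are sent together
  group_interval: 5m                 # minimum gap between updates for the same group
  repeat_interval: 4h                # re-notify for still-firing alerts at most this often
  routes:
    - matchers:
        - severity="critical"        # matches the severity label set in the rule above
      receiver: admin-urgent

receivers:
  - name: admin-default              # no integrations: alerts are grouped but not sent anywhere
  - name: admin-urgent
    webhook_configs:
      - url: 'http://127.0.0.1:9099/notify'  # hypothetical local relay (e.g. a messaging bridge)

inhibit_rules:
  # While an instance is reported down, suppress lower-severity alerts for that same instance.
  - source_matchers:
      - alertname="SecureDropInstanceDown"
    target_matchers:
      - severity="warning"
    equal: ['instance']
```

Routing only `severity="critical"` to a notifying receiver is one way to keep the volume low enough that each alert is something an admin is actually expected to act on.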

Addressing a couple of other things you mentioned...

For monitoring the Tor network, I highly recommend: https://github.com/atx/prometheus-tor_exporter which I've used at Calyx.

For monitoring containers, see cAdvisor, or secondarily a product like Sysdig.

With regard to aggregated statistics about uploads, I did have that in https://github.com/freedomofpress/securedrop/pull/4414 which was understandably declined.