freedomofpress / securedrop

GitHub repository for the SecureDrop whistleblower platform. Do not submit tips here!
https://securedrop.org/

Feature idea: per-instance exporting of Tor metrics for monitoring by Prometheus #4411

Open ageis opened 5 years ago

ageis commented 5 years ago

Description

Every SecureDrop instance runs at least two Tor daemons. From the perspective of the IT administrator at a news organization, it might be useful to be able to ingest statistics about Tor activity. I have some prior experience with setting up introspection and monitoring for a fleet of exit nodes, and I recommend https://github.com/atx/prometheus-tor_exporter. It is based on stem plus the prometheus_client PyPI library. This means it could be integrated directly into the SecureDrop web application, or run separately as a systemd service.
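To make the stem side of this concrete, here is a minimal sketch of reading a few values from the Tor control port. It is not the linked exporter's code. The mapping from control-port keys to the metric names, the `collect_tor_metrics` helper and the control port 9051 are my assumptions for illustration.

```python
def collect_tor_metrics(controller):
    """Read a few values from a stem-style controller and return them as floats.

    `controller` only needs a get_info(key) method, which stem's Controller has.
    The metric names on the left are an assumed mapping, chosen for illustration.
    """
    liveness = controller.get_info("network-liveness")  # "up" or "down"
    return {
        "tor_read_bytes": float(controller.get_info("traffic/read")),
        "tor_written_bytes": float(controller.get_info("traffic/written")),
        "tor_circuit_established": float(
            controller.get_info("status/circuit-established")
        ),
        "tor_dormant": float(controller.get_info("dormant")),
        "tor_network_liveness": 1.0 if liveness == "up" else 0.0,
    }


def main(control_port=9051):
    """Print the collected values once. The port number is an assumption."""
    # Third-party dependency, imported lazily so that collect_tor_metrics()
    # can be exercised without stem installed.
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate()
        for name, value in sorted(collect_tor_metrics(controller).items()):
            print(name, value)
```

Calling `main()` on a host with a reachable control port would print one line per metric. A real exporter would register these values with prometheus_client instead of printing them.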

An exporter publishes metrics on an HTTP endpoint. Prometheus can then scrape that endpoint and serve as the backend for a powerful visualization platform like Grafana, or an alerting/notification solution like Alertmanager. One probably would not want this traffic information to be published publicly due to the risk of timing attacks, deanonymization, etc. However, Prometheus is able to scrape targets via Tor when combined with an HTTP proxy such as Polipo or Privoxy, so the metrics could be served over an authenticated Tor hidden service. Even just polling the page manually every once in a while, say, only from the internal network, can provide insight.
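For anyone unfamiliar with the idea, here is a rough standard-library sketch of what "publishing metrics to an HTTP endpoint" amounts to. The helper names, the `/metrics` path handling and port 9099 are my own choices for illustration, and a real exporter would use prometheus_client rather than hand-rolling the text format. It binds to localhost only, in keeping with not exposing the data publicly.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics):
    """Render {name: (type, help_text, value)} in the Prometheus text format."""
    lines = []
    for name, (metric_type, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


def make_handler(get_metrics):
    """Build a request handler that serves get_metrics() at /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(get_metrics()).encode("utf-8")
            self.send_response(200)
            self.send_header(
                "Content-Type", "text/plain; version=0.0.4; charset=utf-8"
            )
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(get_metrics, port=9099):
    """Serve on localhost only; the port number is a hypothetical choice."""
    HTTPServer(("127.0.0.1", port), make_handler(get_metrics)).serve_forever()
```

Calling `serve(lambda: {"tor_uptime": ("gauge", "Uptime of the tor process (in seconds)", 42)})` would block and answer requests to `http://127.0.0.1:9099/metrics`.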

This would help FPF+admins keep an eye on issues in the Tor network, instances that may not be reachable or able to establish circuits, extremely low or high amounts of traffic, plus whether an instance has updated. You could also just put the service there and make it optional whether orgs want to turn it on. Or this could be a feature specific to "dev".

With the open source exporter I linked, you get the following information updated on a regular basis:

| Name | Description |
| --- | --- |
| `tor_written_bytes` | Running total of written bytes. |
| `tor_read_bytes` | Running total of read bytes. |
| `tor_version{version="..."}` | Tor daemon version as a tag |
| `tor_version_status{version_status="..."}` | Tor daemon version status as a tag |
| `tor_network_liveness` | Network liveness (1.0 or 0.0) |
| `tor_reachable{port="OR\|DIR"}` | Reachability of the OR/DIR ports (1.0 or 0.0) |
| `tor_circuit_established` | Indicates whether the daemon is capable of establishing circuits (1.0 or 0.0) |
| `tor_dormant` | Indicates whether tor is currently active (1.0 or 0.0) (note that 1.0 means "dormant", see the specs for details) |
| `tor_effective_rate` | Shows the effective rate of the relay |
| `tor_effective_burst_rate` | Shows the effective burst rate of the relay |
| `tor_fingerprint{fingerprint="..."}` | Node fingerprint as a tag |
| `tor_nickname{nickname="..."}` | Node nickname as a tag |
| `tor_flags{flag="Authority\|BadExit\|Exit\|Fast\|Guard\|HSDir\|NoEdConsensus\|Stable\|Running\|Valid\|V2Dir"}` | Indicates whether the node has a certain flag (1.0 or 0.0) |
| `tor_accounting_read_bytes` | Amount of bytes read in the current accounting period |
| `tor_accounting_left_read_bytes` | Amount of read bytes left in the current accounting period |
| `tor_accounting_read_limit_bytes` | Read byte limit in the current accounting period |
| `tor_accounting_write_bytes` | Amount of bytes written in the current accounting period |
| `tor_accounting_left_write_bytes` | Amount of write bytes left in the current accounting period |
| `tor_accounting_write_limit_bytes` | Write byte limit in the current accounting period |
| `tor_uptime` | Uptime of the tor process (in seconds) |
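If someone only wants to poll the page by hand, the output is simple enough to read with a few lines of Python. The sketch below is a simplified parser that I am assuming is good enough for metrics like the ones listed. It ignores timestamps and escaped quotes in label values, and the `EXAMPLE` text is made-up sample output, not captured from a real instance.

```python
import re

# One sample per line: name, optional {labels}, then a numeric value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Made-up output for illustration only.
EXAMPLE = '''\
# HELP tor_uptime Uptime of the tor process (in seconds)
# TYPE tor_uptime gauge
tor_uptime 86400.0
tor_circuit_established 1.0
tor_reachable{port="OR"} 0.0
'''


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything this simplified pattern cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

`parse_metrics(EXAMPLE)` yields one tuple per sample line, with the labels as a dict, so a quick check such as "is `tor_circuit_established` equal to 1.0" becomes a one-line filter.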

If you look at web applications and other software now in wide use, many projects these days export their own internal metrics for SRE purposes, helped by Prometheus client libraries being available for many different languages. Take Kubernetes, for example. So if tracking Tor is not a priority, the SecureDrop application itself could still have its own source-friendly internal metrics.

ninavizz commented 5 years ago

Could it be possible to track user decisions made in configuring their Tor browser—such as their Security Setting choice, or if they simply quit the Tor Browser or chose to make a New Identity at the end of a session? Also, could the total length of a Source user's session be tracked?

My guess on all of the above is "No," not w/o compromising anonymity or security things—but I figure it can't hurt to ask.

ageis commented 5 years ago

@ninavizz No, absolutely not. The closest thing to that is the count of bytes read and written by the Tor daemon. From what I understand, this typically includes padding of sorts, so it is not a perfect signal for fingerprinting.

ninavizz commented 5 years ago

Where would the value be, then, to Source Users, Newsroom staff, or FPF product/dev teams?

Would either Source users or Newsroom staff experience any value, or would the value be purely in flagging security hardening opportunities? Not looking to provoke or dismiss, just honestly curious. :)

ageis commented 5 years ago

@ninavizz I thought I already made that case above. This team has been taking a professional approach to techops and software engineering, and providing monitoring (and debugging) facilities is considered best practice for services whose uptime, availability and smooth functioning are important. Downtime of the Tor hidden service reflects upon the news organization's business reputation. If they are a large org, the type with plenty of dedicated IT staff (like The New York Times), then they will already have systems like Prometheus and Grafana in place which will be able to make use of the extra information. This has nothing to do with security hardening. Anyway, stepping away from the ticket for now, but I'll add more justification thoughts later.

Presuming FPF somehow chooses to ingest the metrics for themselves, it will also allow subject matter experts such as them to better understand the functioning of the Tor network, traffic patterns, etc. The Prometheus query language ("PromQL" for short) includes many mathematical functions that you can use to get a handle on the time series data, such as computing rates of change, spotting peculiar spikes, or highlighting even tiny variations that would not otherwise be detected.
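As an illustration of what a rate of change means for a running total like `tor_read_bytes`, here is a small Python sketch. It is only a simplified analogue of a PromQL query such as `rate(tor_read_bytes[5m])`; Prometheus's own calculation also extrapolates to the window boundaries, which this does not attempt. The `per_second_rate` name and the sample values are made up.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    `samples` is a list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there is not enough data to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter went backwards, e.g. because Tor restarted:
            # treat it as a reset and count the new value from zero.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Made-up samples taken 60 seconds apart.
readings = [(0, 100), (60, 700), (120, 1300)]
print(per_second_rate(readings))  # 10.0 bytes per second
```

The reset handling matters here because a running byte total starts again from zero whenever the Tor daemon restarts.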

Based on these queries, one can create alerts, for example if Tor is not running a certain version or if the Tor daemon has crashed.
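In practice these would be Prometheus alerting rules handled by Alertmanager, but the logic of the two example alerts can be sketched in plain Python. Everything here is hypothetical: the `find_alerts` helper, the convention that a failed scrape is passed in as `None`, and the version strings in the usage lines.

```python
def find_alerts(samples, expected_version):
    """Return a list of alert messages for one scrape.

    `samples` is a list of (name, labels, value) tuples, or None when the
    scrape itself failed (an assumed convention for this sketch).
    """
    if samples is None:
        return ["scrape failed: the Tor daemon or the exporter may be down"]

    alerts = []
    versions = [
        labels["version"]
        for name, labels, value in samples
        if name == "tor_version" and value == 1.0 and "version" in labels
    ]
    if expected_version not in versions:
        reported = ", ".join(versions) or "unknown"
        alerts.append(
            f"Tor version is {reported}, expected {expected_version}"
        )
    return alerts


# Made-up version strings, for illustration only.
scrape = [("tor_version", {"version": "1.2.3-example"}, 1.0)]
print(find_alerts(scrape, "1.2.4-example"))
print(find_alerts(None, "1.2.4-example"))
```

The first call reports a version mismatch and the second reports the failed scrape. A matching version with a successful scrape returns an empty list.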

ninavizz commented 5 years ago

@ageis Yes, you absolutely did outline the devops case in your OG comment. Only asked for further clarification, cuz I'm curious and like to learn (which carries little general urgency). Making SD more in line with (cough, modern) enterprise expectations of reliability & maintainability is, I agree, important! :)