certspotter should be a daemon

AGWA commented 2 years ago

Running as a cron job is sub-optimal for several reasons:

It's not obvious what the interval between runs should be. A shorter interval allows for quicker notification of certificates, but if it's too short then certspotter will still be running when cron tries to invoke it again.
certspotter has to finish processing all logs before it can be run again. This means that if log A takes 1 minute to process, and log B takes 30 minutes to process, log A can be processed no more frequently than once every 30 minutes, which will delay the notification of certificates in that log.
Logs often have transient errors. When this happens, a CT monitor should continuously retry and only report the error if it persists for longer than the MMD. As a cron job, we can't do this because it would block the processing of other logs, so instead we immediately report the error (which contributes to error fatigue) and wait until the next run to try processing again. If certspotter were a daemon, we'd have more flexibility to retry and only report errors if they are persistent, not transient.
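To make the second and third points concrete, here is a minimal Python sketch (not certspotter's code, which is written in Go) of a daemon loop that gives each log its own asyncio task. Because every log runs in its own coroutine, a slow log B no longer delays log A. A failure is retried on the next iteration and reported only once it has persisted longer than the log's MMD. `LogState`, `monitor_log`, `process_once`, and `report` are hypothetical names.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class LogState:
    """Per-log bookkeeping for a hypothetical daemon."""
    name: str
    mmd: float                                   # maximum merge delay, in seconds
    last_success: float = field(default_factory=time.monotonic)
    reported: bool = False                       # already alerted for this outage?

    def on_success(self, now: float) -> None:
        self.last_success = now
        self.reported = False                    # outage over; re-arm the alert

    def on_failure(self, now: float) -> bool:
        """Return True only the first time an outage exceeds the MMD."""
        if not self.reported and now - self.last_success > self.mmd:
            self.reported = True
            return True
        return False


async def monitor_log(state: LogState,
                      process_once: Callable[[], Awaitable[None]],
                      report: Callable[[str], None],
                      interval: float) -> None:
    """Process one log forever, independently of every other log."""
    while True:
        try:
            await process_once()
            state.on_success(time.monotonic())
        except Exception as err:
            # Transient errors are retried on the next pass; we only report
            # once the log has been failing for longer than its MMD.
            if state.on_failure(time.monotonic()):
                report(f"{state.name}: failing for longer than MMD: {err}")
        await asyncio.sleep(interval)
```

A daemon would start one such task per log (for example with `asyncio.gather`) so that each log is polled at its own pace.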

Most log errors would be suppressed, but we would want to raise an alarm if any of the following conditions arise:

We haven't been able to successfully fetch an STH from a log for longer than the log's MMD
A log has a large backlog of unprocessed certificates (the backlog is the difference between the log's latest STH and the current position)
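The two alarm conditions can be expressed as a small check run periodically for each log. A sketch, assuming times are in seconds and that the `max_backlog` threshold for "large" is a tunable chosen by the operator (the issue does not specify one); `LogHealth` and `alarms` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogHealth:
    name: str
    mmd: float            # log's maximum merge delay, seconds
    last_sth_fetch: float # time of the last successful STH fetch, seconds
    sth_tree_size: int    # tree size from the latest STH
    position: int         # index of the next entry we will process


def alarms(h: LogHealth, now: float, max_backlog: int) -> List[str]:
    """Return alarm messages for the two conditions described above."""
    out = []
    stale_for = now - h.last_sth_fetch
    if stale_for > h.mmd:
        out.append(f"{h.name}: no STH for {stale_for:.0f}s (MMD is {h.mmd:.0f}s)")
    backlog = h.sth_tree_size - h.position   # latest STH minus current position
    if backlog > max_backlog:
        out.append(f"{h.name}: backlog of {backlog} entries")
    return out
```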

The downside to running as a daemon is that we can't just print messages (certificate details, error messages) to stdout/stderr and rely on the cron daemon to email them. So perhaps certspotter should invoke sendmail itself?
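If certspotter did invoke sendmail itself, the usual pattern is to build the message and pipe it to `sendmail -t -i`, which takes recipients from the message headers. A minimal Python sketch; the `/usr/sbin/sendmail` path and the helper names are assumptions:

```python
import subprocess
from email.message import EmailMessage

SENDMAIL = "/usr/sbin/sendmail"  # conventional location; an assumption, varies by system


def build_message(to_addr: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message; multi-line bodies are fine, unlike syslog."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg: EmailMessage) -> None:
    # -t: read recipients from the To/Cc/Bcc headers
    # -i: don't treat a line consisting of a single "." as end of input
    subprocess.run([SENDMAIL, "-t", "-i"], input=msg.as_bytes(), check=True)
```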

bllfr0g commented 2 years ago

IMHO daemons should log to syslog. If individuals care to have certain syslog messages emailed to them, that's a function of the syslog daemon.

AGWA commented 2 years ago

@bllfr0g certspotter produces multi-line formatted output which isn't appropriate for syslog. It's akin to daemons like smartmontools or mdadm, which directly send mail instead of using syslog.

paravoid commented 1 year ago

Note that in a systemd world, cron is becoming increasingly obsolete, and systemd has no equivalent of cron's MAILTO functionality. One has to implement that by hand (through OnFailure and a local script that calls sendmail, etc.). So figuring out something better there would be great.

Calling sendmail directly sounds fine to me, but it'd be vastly better if -script became a little more featureful (#14, #21 etc.), to give sysadmins the flexibility to ingest these events in any way they prefer. You can ship an example hook that calls sendmail, but that can be implemented externally to certspotter.

I don't know if it's possible to encode all of the information in the environment somehow, or if stdin should be used, or perhaps an intermediate state file passed as an argument to the script. From a Debian standpoint, being able to use run-parts /etc/certspotter/hooks as the script would give our users a lot more flexibility and allow us to ship a better out-of-the-box experience. HTH!
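For reference, Debian's `run-parts` runs every executable file in a directory in lexical order, by default skipping names that contain anything other than ASCII letters, digits, underscores, and hyphens (so `foo.sh` is skipped). A rough Python approximation of that behavior for a hooks directory; `run_hooks` is a hypothetical helper, and it ignores run-parts options such as `--exit-on-error`:

```python
import os
import re
import subprocess
from typing import Dict, List

# run-parts' default name filter: letters, digits, underscores, hyphens only
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def run_hooks(hooks_dir: str, env: Dict[str, str]) -> List[int]:
    """Run each eligible hook in lexical order and return their exit statuses."""
    statuses = []
    for name in sorted(os.listdir(hooks_dir)):
        path = os.path.join(hooks_dir, name)
        if not VALID_NAME.fullmatch(name):
            continue                      # e.g. "foo.sh", "foo.dpkg-old"
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue                      # skip directories and non-executables
        # Each hook sees the daemon's environment plus the event variables.
        result = subprocess.run([path], env={**os.environ, **env})
        statuses.append(result.returncode)
    return statuses
```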

paravoid commented 1 year ago

For completeness, certspotter 0.14.0-1, now in Debian unstable (and soon testing) and Ubuntu lunar, ships an /etc/certspotter/hooks.d directory where users can drop executable files to act on notifications. We don't pass -no_save, so CERT_FILENAME can be used to retrieve the raw certificate as well. In the next revision I'll work on an example hook script that sends email using mail(1), with a nice email body and all that.

I think we can make this a smooth transition to a daemon if and when that happens :)

AGWA commented 1 year ago

@paravoid Good point about not being able to rely on MAILTO functionality.

That said, I have some concerns about the /etc/certspotter/hooks.d approach.

First, -script is not ready for production use. The biggest problem is that if a certificate has a large number of SANs, the $DNS_NAMES variable can become so long that execing the script fails with E2BIG (this is possibly the cause of #32 although the error message reported there is not what I would expect). This is security relevant, since it allows an attacker to suppress certspotter notifications by including a bunch of SANs in a misissued certificate.
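The E2BIG failure is easy to reproduce. On Linux, a single environment string is limited to MAX_ARG_STRLEN (32 pages, i.e. 128 KiB with 4 KiB pages), and the combined size of arguments and environment is bounded too, so a long enough `DNS_NAMES` makes `execve` fail before the script ever runs. A sketch, assuming comma-separated names (the separator and helper names are assumptions here):

```python
import errno
import os
import subprocess
import sys
from typing import List


def exec_with_dns_names(names: List[str]) -> None:
    """Exec a trivial child with DNS_NAMES in its environment."""
    env = {**os.environ, "DNS_NAMES": ",".join(names)}
    subprocess.run([sys.executable, "-c", "pass"], env=env, check=True)


def fails_with_e2big(n_names: int) -> bool:
    """True if exec fails with E2BIG for a certificate with n_names SANs."""
    names = [f"host{i}.example.com" for i in range(n_names)]
    try:
        exec_with_dns_names(names)
    except OSError as e:
        # subprocess re-raises the child's exec failure as OSError in the parent
        return e.errno == errno.E2BIG
    return False
```

With 100,000 names the variable is roughly 2 MB, far past the per-string limit, so the hook is never started and the notification is silently lost.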

certspotter needs a way to convey information to scripts besides environment variables. My best idea so far is to feed JSON to stdin, but I'm open to other suggestions. (But note that I do not want scripts to be responsible for parsing certificates, since this is difficult to do robustly and certspotter already has a special parser that is built to withstand malicious input.)
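Data piped to stdin, by contrast, is not subject to exec's size limits. A minimal sketch of the JSON-on-stdin idea; the `dns_names` field and the inline hook are hypothetical, and a list of 100,000 names passes through without trouble:

```python
import json
import subprocess
import sys

# Hypothetical hook: reads one JSON event from stdin and prints each DNS name.
HOOK = r'''
import json, sys
event = json.load(sys.stdin)
for name in event["dns_names"]:
    print(name)
'''


def run_hook_with_json(event: dict) -> str:
    """Feed the event to the hook on stdin and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", HOOK],
        input=json.dumps(event).encode(),
        capture_output=True,   # communicate() drains both pipes, avoiding deadlock
        check=True,
    )
    return result.stdout.decode()
```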

Second, certspotter needs a way to report errors and log misbehavior. It's important that users be alerted that certspotter might not be detecting all their certificates, instead of failing silently. I'm worried that the approach in 0.14.0-1 makes it too easy for problems to go unnoticed.

I think it would be easy for me to add a -email option within the next couple weeks to have certspotter send emails, and we should guide users to that until -script is ready.

paravoid commented 1 year ago

Thank you for the feedback! Sorry to hear about -script not being production-ready and potentially unsafe. (I'd recommend adding a notice to that effect in a few places, including the new manpage ;) The Debian freeze is approaching, so we'll have to figure out a safe and useful solution relatively quickly, so as not to be stuck with something you consider suboptimal through the bookworm release. (I can always revert, but I think this kind of functionality can be useful.)

I wholeheartedly agree with you on errors being reported (the Mammoth outage this past week was a good example!). I was looking forward to a solution to this, through perhaps this task :)

With regard to making -script safer/better, perhaps a good compromise would be to offer less information about the certificate, either through an option or conditionally, signaled by a _TRUNCATED=1 variable or something of that sort. After all, a downstream user can always parse the certificate in their own systems (or in a shell hook, with openssl x509) if they need to extract information from it.
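A sketch of the `_TRUNCATED` compromise: fill the variable up to a size budget and set a flag when names were dropped, so the hook knows its view is incomplete. `DNS_NAMES_TRUNCATED`, the comma separator, and the 64 KiB budget are assumptions, the budget chosen to stay well under the 128 KiB Linux allows for a single environment string:

```python
from typing import Dict, List

ENV_BUDGET = 64 * 1024  # hypothetical per-variable budget, in bytes


def cert_env(dns_names: List[str], budget: int = ENV_BUDGET) -> Dict[str, str]:
    """Build DNS_NAMES within budget; flag truncation instead of failing exec."""
    kept: List[str] = []
    used = 0
    for name in dns_names:
        cost = len(name.encode()) + (1 if kept else 0)  # +1 for the comma
        if used + cost > budget:
            break
        kept.append(name)
        used += cost
    env = {"DNS_NAMES": ",".join(kept)}
    if len(kept) < len(dns_names):
        env["DNS_NAMES_TRUNCATED"] = "1"  # hook must not treat the list as complete
    return env
```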

Feeding input to stdin is OK I guess, but kinda scary; for example, it precludes passing the input to multiple scripts/tools, which I think can be useful functionality. Plus all the usual caveats around pipes, buffers, etc. that you've noted yourself.

-email sounds OK too, but I'd argue that "just" emailing certificates may be too simplistic for many use cases. First of all, it assumes that the primary medium for machine-generated communication is email, which is not always the case. Think of logging, alerting, and paging pipelines built on APIs, or incident management systems that take structured information rather than text emails.

Plus, I know of at least one user that ended up disabling certspotter because it was too "spammy". With dozens of end-host certificates issued through Let's Encrypt, all renewed every 90 days (as LE does), the number of TLD matches was just too high and, consequently, the signal-to-noise ratio too low. Being able to take a fingerprint through a script, check it against an internal database of certificates known to be legitimate (e.g. through a centralized system), and alert only on certificates that are truly unknown would make a big difference in the usefulness of certspotter.

At the same time, it would be possible to create different pipelines: one that logs every match to a local logfile, another that sends an entry to a Logstash endpoint (even for well-known certificates), another that raises low-severity alerts for certain domains, another that creates a (paging) security incident through an incident-management system for other, higher-importance domains, etc. These are the kinds of real-world use cases I envisioned with the hooks.d functionality. Hope this helps explain some of my thinking!
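The fingerprint-allowlist idea might look like the following hook-side sketch. It assumes the file named by CERT_FILENAME holds a single PEM certificate and that the internal database is a set of SHA-256 hex fingerprints; the helper names are hypothetical. Note that a precertificate's DER differs from the final certificate's (it carries the poison extension), so an allowlist of final-certificate fingerprints will never match a precertificate; a real deployment has to handle that case explicitly rather than alerting on or ignoring every precertificate.

```python
import hashlib
import ssl
from typing import Set


def sha256_fingerprint(pem: str) -> str:
    """SHA-256 over the DER encoding, as a lowercase hex string."""
    der = ssl.PEM_cert_to_DER_cert(pem)  # strips PEM armor and base64-decodes
    return hashlib.sha256(der).hexdigest()


def should_alert(pem: str, known: Set[str]) -> bool:
    """Alert only for certificates not in the internal allowlist."""
    return sha256_fingerprint(pem) not in known
```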

AGWA commented 1 year ago

Thanks, @paravoid. Those are good points and have helped me understand the full value of -script. My main concern with -script is making sure it's used safely. The nice thing about -email is that it gives certspotter complete control over what alerts are sent and what they say, whereas -script allows a lot of opportunities for mistakes, such as ignoring precertificates.

With that in mind, here's my new plan to support -script in a robust way that should hopefully minimize mistakes:

The script will be invoked for both discovered certificates and errors, with an environment variable distinguishing between these events.
In the case of certificate events, a limited amount of information will be provided in environment variables. No more $DNS_NAMES or $IP_ADDRESSES since there is no way to guarantee complete information. No $CERT_TYPE for the reason documented in https://github.com/SSLMate/certspotter/commit/cd2bb429fc2f4060a33ec8eb8b71a3eb12e9ba93. There will be a new variable called $WATCH_ITEM or similar which contains the domain from the watch list which matched. This can be used to route alerts based on the importance of the domain.
certspotter will write a JSON file (and maybe a text file too) containing a parsed representation of the certificate, alongside the .pem file. The paths will be made available in environment variables. Scripts can read these files (e.g. with jq) to get more information, such as the DNS names, so that scripts don't have to do their own certificate parsing.
There will still be an -email option, and it will be documented as the safest and recommended way of getting notifications.
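A hook written against this plan might look like the sketch below. The variable names `EVENT`, `WATCH_ITEM`, and `JSON_FILENAME`, the event values, and the `dns_names` JSON field are placeholders, since the plan above leaves the exact names open:

```python
import json
from typing import Mapping

HIGH_IMPORTANCE = {"example.com"}  # hypothetical routing table of watch-list items


def handle_event(env: Mapping[str, str]) -> str:
    """Return a routing decision ("page: ..." or "log: ...") for one invocation."""
    if env.get("EVENT") == "error":
        # Errors are always surfaced so certspotter can't fail silently.
        return "page: certspotter error"
    # Certificate event: read the parsed JSON instead of parsing the certificate.
    with open(env["JSON_FILENAME"]) as f:
        cert = json.load(f)
    names = ", ".join(cert.get("dns_names", []))
    # Route by the watch-list entry that matched.
    if env.get("WATCH_ITEM") in HIGH_IMPORTANCE:
        return f"page: {names}"
    return f"log: {names}"
```

A real hook would call `handle_event(os.environ)` and act on the decision rather than return a string.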

I've made good progress on refactoring certspotter to be a daemon, and expect to have a new version ready in the next week or two.

certspotter should be a daemon #63