18F / pulse

How the federal .gov domain space is doing at best practices and policies.

Deprecated sub-sub-domains below a wildcard display as non-compliant #760

Open PaulSD opened 6 years ago

PaulSD commented 6 years ago

search.cloud.cio.gov was previously an active domain, but it has been decommissioned. However, since it shows up in the "End of Term Web Archive", Pulse still scans it. cio.gov currently uses a *.cio.gov wildcard DNS record and certificate. Due to the differences in wildcard handling between DNS and certificates, search.cloud.cio.gov still resolves, but HTTPS requests to it result in a certificate error. This causes pshtt to fail, which causes this domain to incorrectly show up as non-compliant in Pulse.
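The mismatch between the two kinds of wildcard can be illustrated with a minimal sketch. The function names are hypothetical and are not part of pshtt or Pulse. The DNS side is simplified: it assumes no other records exist in the zone between the wildcard and the queried name. The certificate side follows the common rule that `*` stands for exactly one left-most label.

```python
def dns_wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified DNS wildcard: '*.cio.gov' answers for names at any depth below cio.gov.

    Assumption: no other records exist between the wildcard and the name.
    """
    pattern, hostname = pattern.lower(), hostname.lower()
    if not pattern.startswith("*."):
        return pattern == hostname
    suffix = pattern[1:]  # e.g. ".cio.gov"
    return hostname.endswith(suffix) and len(hostname) > len(suffix)


def cert_wildcard_matches(pattern: str, hostname: str) -> bool:
    """Certificate wildcard: '*' stands for exactly one left-most label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


for name in ("cloud.cio.gov", "search.cloud.cio.gov"):
    print(name,
          "DNS:", dns_wildcard_matches("*.cio.gov", name),
          "cert:", cert_wildcard_matches("*.cio.gov", name))
```

Under these assumptions, search.cloud.cio.gov resolves through the DNS wildcard but is not covered by the wildcard certificate, which is the combination that produces the certificate error.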

Any ideas on how to fix this?

konklone commented 6 years ago

Usually, decommissioned sites take care of themselves by no longer responding to HTTP. But yeah, this edge-case scenario -- decommissioned 4th-level subdomains tied to 3rd-level wildcard DNS -- has also come up with other agencies, including my own office (some old cloud.gov subdomains).

In the cloud.gov case, the hostnames come from old (now-expired) certificates found in Censys.io. So my thinking had been to look at excluding expired certificates from Censys results, but this obviously wouldn't do anything for hostnames in EOT, so we'd want to find a more general solution.

I feel like we need to do something here, since agencies should obviously be able to use wildcard DNS, and should clearly not be obliged to keep special-case logic/certs around indefinitely for decommissioned sites.

But so far we (speaking for at least GSA) have never had to introduce any blacklisting logic for hostnames at all, and doing so creates a gap in oversight if those names are ever used again.

Can anyone think of other options? Ways to detect this particular kind of edge case programmatically, or ways we could change how we consume upstream hostname sources?

cc @h-m-f-t @jsf9k for their opinions, since DHS NCATS is in the same position Pulse is in. And cc @brittag @mogul since they own the similarly affected cloud.gov hostnames.

konklone commented 6 years ago

On the subject of upstream data sources -- we do want to avoid using data that is too stale, but it's not always easy. Our current sources:

PaulSD commented 6 years ago

I think pshtt needs to be updated to continue processing its checks after it encounters a certificate validation error.

After that ... As a practical matter, sites which use HSTS and have an invalid certificate will be inaccessible to users, so if a site is otherwise compliant, does it really matter if it has an invalid certificate?

Pulse could assume that compliant sites with invalid certificates are irrelevant and simply ignore them. Or it could continue to report on their compliance but not report the certificate errors. Or it could be updated to split the certificate errors out into a separate field and report them separately from other issues.

Or you could take it a step further and analyze the certificate validation error to determine whether the site should be reported on. For example, a certificate that is otherwise valid but was issued by an unknown CA is probably still relevant and may not need any special indicators in Pulse (it might just be a private CA that is distributed to clients independently). A certificate that is expired should probably show up in Pulse with a warning. And a certificate that is otherwise valid but has a name mismatch (especially if the current cert is a wildcard on a parent domain) probably indicates a stale domain that is no longer relevant and could be ignored.

h-m-f-t commented 6 years ago

> As a practical matter, sites which use HSTS and have an invalid certificate will be inaccessible to users

"Invalid" is dependent upon client root stores, and there's nothing (no policy mandate, anyway) that says federal sites must use a public root-- only that doing so "may be practical for web services whose users can be consistently expected to trust the issuing federal certificate authority". That may be exactly what an agency is going for in, say, using a private root for an internet-facing agency TLS VPN page.

While it's a blunt instrument, @PaulSD's situation is exactly what pushed OPM to preload several of their domains.

konklone commented 6 years ago

> I think pshtt needs to be updated to either ignore certificate validation errors, or separate the reporting of cert validation errors from all other reporting.

Certificate validation is a part of validating that a secure connection is possible, and distinguishing between types of certificate validation errors is a core part of the logic.

In particular, as Cameron notes, we don't enforce the use of any particular CAs, so validation errors related to the use of a self-signed certificate or an untrusted chain can still get an agency a "Yes" on enforcing HTTPS, but the cert not being valid for the given hostname or being expired will give the agency a "No".
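The distinction described above can be sketched roughly as follows. This is not pshtt's actual code: the enum, the set, and the function name are hypothetical, and the sketch assumes the site otherwise enforces HTTPS, so only the certificate problems decide the outcome.

```python
from enum import Enum
from typing import Iterable


class CertProblem(Enum):
    # Hypothetical labels for the kinds of validation error named in the comment.
    SELF_SIGNED = "self-signed certificate"
    UNTRUSTED_CHAIN = "untrusted chain"
    BAD_HOSTNAME = "not valid for hostname"
    EXPIRED = "expired"


# Problems that can still yield a "Yes", since no particular CA is enforced.
TOLERATED = {CertProblem.SELF_SIGNED, CertProblem.UNTRUSTED_CHAIN}


def enforces_https_result(problems: Iterable[CertProblem]) -> str:
    """Return "Yes" unless a problem outside TOLERATED was observed."""
    return "Yes" if set(problems) <= TOLERATED else "No"


print(enforces_https_result([CertProblem.UNTRUSTED_CHAIN]))  # trust issue only
print(enforces_https_result([CertProblem.BAD_HOSTNAME]))     # name mismatch
```

A name mismatch or an expired certificate gives a "No" in this sketch even when a tolerated trust problem is also present.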

The rationale behind this is that what's important is that the service is supporting HTTPS connections. We can't measure whether or not the CA an agency uses is acceptable to their client base -- and even if the agency is using a CA not universally supported by their client base, the fact that they obtained a certificate specifically valid for that hostname indicates that HTTPS is a supported connection mechanism for that service.

A certificate that is not valid for the hostname, on the other hand, typically represents that port 443 is basically accidentally enabled, and/or that the client is getting a certificate intended for some other service supported on the same IP address or infrastructure. (The classic example here is an Akamai-hosted service where the customer hasn't paid for custom domain HTTPS support, and so the cert is only valid for an Akamai-owned shared hostname, but there are lots of smaller examples.)

Expired certificates are a bit of a middle ground, in that they represent that HTTPS at least was supported, but could also potentially represent a decommissioned service. But typically, it's a reasonable ask for the agency to remove decommissioned hostnames from DNS or to close ports 80/443.

This wildcard case -- decommissioned 4th-or-higher-level hostnames supported by a 3rd-level wildcard DNS record -- is an exception to that, where it's not reasonable to ask the agency to explicitly decommission the hostname or port in some way.

So I want to resolve this, but not in a way that changes fundamental assumptions about policy scope and enforcement.

> After that ... As a practical matter, sites which use HSTS and have an invalid certificate will be inaccessible to users, so if a site is otherwise compliant, does it really matter if it has an invalid certificate?

Yes, it does. First, HSTS is not supported by all clients. Second, even for HSTS-supporting clients, the HSTS policy won't be respected if it's not delivered over a connection the user trusts. A preloaded HSTS policy would still produce this behavior, but relying on preloading is still a mitigation, and not something that can justify a lack of valid server-side certificate configuration.
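The reasoning above reduces to a small truth table, sketched here with hypothetical parameter names. It models only whether a client ends up holding an HSTS policy for the host, not anything about the scanner.

```python
def client_has_hsts_policy(supports_hsts: bool,
                           preloaded: bool,
                           header_seen_over_trusted_tls: bool) -> bool:
    """True if the client would refuse plain HTTP or a bad certificate for this host."""
    if not supports_hsts:
        # Clients without HSTS support ignore the policy entirely.
        return False
    # A policy is only learned from a trusted connection, unless it ships preloaded.
    return preloaded or header_seen_over_trusted_tls


# A site whose certificate is never trusted can only be covered through preloading.
print(client_has_hsts_policy(True, False, False))
print(client_has_hsts_policy(True, True, False))
```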

> ...Or you could take it another step further and do some analysis on the certificate validation error to determine whether the site should be reported on or not. For example, a certificate that is otherwise valid but was issued by an unknown CA is probably still relevant and may not need any special indicators in Pulse, while a certificate that is expired should probably show up in pulse with a warning, and a certificate that is otherwise valid but has a name mismatch (especially if the current cert is a wildcard on a parent domain) probably indicates a stale domain that is no longer relevant and could be ignored.

In general, our (GSA's, at least) experience is that certificate name mismatches almost always point to a mistaken configuration, or to HTTPS not being supported yet while the site remains available over plain HTTP. Decommissioning a site can almost always be reasonably accomplished and reliably detected by removing the DNS name or by disabling ports 80 and 443.

PaulSD commented 6 years ago

Ok, fair enough. In that case, would it be enough to simply look for a wildcarded parent domain in the certificate after encountering a name mismatch, and assume the domain is stale if one is found?
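A minimal sketch of that check, assuming the certificate's DNS names have already been extracted into a list (how to obtain them is out of scope here). The function names are hypothetical and not part of pshtt. As a design choice in this sketch, a wildcard directly on a top-level domain is not counted as a parent wildcard.

```python
from typing import List, Sequence


def _covers(cert_name: str, host_labels: List[str]) -> bool:
    """Exact match, or a wildcard standing for exactly one left-most label."""
    name_labels = cert_name.lower().split(".")
    if len(name_labels) != len(host_labels):
        return False
    if name_labels[0] == "*":
        return name_labels[1:] == host_labels[1:]
    return name_labels == host_labels


def looks_like_stale_wildcard_child(hostname: str, cert_names: Sequence[str]) -> bool:
    """True if the cert does not cover hostname but has a wildcard for an ancestor domain."""
    labels = hostname.lower().rstrip(".").split(".")
    names = {n.lower() for n in cert_names}
    if any(_covers(n, labels) for n in names):
        return False  # no name mismatch at all
    # Ancestor wildcards sit at least two labels up: '*.cio.gov' for search.cloud.cio.gov.
    # The range stops before a bare TLD, so '*.gov' is never considered.
    for i in range(2, len(labels) - 1):
        if "*." + ".".join(labels[i:]) in names:
            return True
    return False


print(looks_like_stale_wildcard_child("search.cloud.cio.gov", ["*.cio.gov", "cio.gov"]))
```

A mismatch against an unrelated certificate, such as one issued for a hosting provider's shared hostname, returns False here, so it would still be reported as a name mismatch.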

konklone commented 6 years ago

That's an interesting idea.

It would be technically a bit arduous, but not impossible. Right now we have no code that examines the certificate properties directly; we just look at what error code/message the validation process gives back. It might be a reasonable carveout for this kind of case, and may be less technical work than overhauling our upstream hostname gathering process.

@h-m-f-t @jsf9k @brittag @mogul Thoughts?