GSA / https

The HTTPS-Only Standard for federal domains (M-15-13), and implementation guidance.
https://https.cio.gov

Address potential narrow exception for externally resolved DTDs and XML entities #222

Closed konklone closed 7 years ago

konklone commented 7 years ago

This pull request attempts to resolve #215 and provide a narrow exception for agencies hosting DTDs and externally resolved XML entities over http:// URIs, for which redirects to HTTPS would unavoidably break clients.

My suggestion here is to allow them to keep serving over HTTP, while also adding HTTPS and HSTS support (on the assumption that these problematic clients don't support HSTS). It also asks agencies to notify their user/client community that the content is available over HTTPS and to strongly encourage them to use those URIs.
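As an illustration of the suggested setup, here is a minimal sketch using nginx (hostname, paths, and certificate locations are hypothetical): the same DTD content is served over plain HTTP with no redirect, and over HTTPS with HSTS for clients that can use it.

```nginx
# Hypothetical sketch: keep http:// DTD URIs working (no redirect to HTTPS),
# while also offering the identical content over HTTPS with HSTS.
server {
    listen 80;
    server_name dtd.example.gov;
    root /var/www/dtds;   # e.g. serves /schemas/sample.dtd over plain HTTP
}

server {
    listen 443 ssl;
    server_name dtd.example.gov;
    root /var/www/dtds;   # same document root as the HTTP server
    ssl_certificate     /etc/ssl/dtd.example.gov.crt;
    ssl_certificate_key /etc/ssl/dtd.example.gov.key;
    add_header Strict-Transport-Security "max-age=31536000" always;
}
```

Since the legacy clients never speak TLS, the HSTS header only affects modern clients, which will pin themselves to the HTTPS URIs after a first visit.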

@elucify, could you please review the language in this PR and suggest changes as necessary to be more accurate in how I'm describing the issue?

I would also like to use a live DTD or XML example if at all possible -- @elucify, is it possible to provide a real world example for each case?

Fixes #215.

nacin commented 7 years ago

Really wish people wouldn't use [probably insecure] XML entities, but fine with this.

elucify commented 7 years ago

I think this policy reasonably addresses the issues I pointed out before. As for insecure XML entities, yes, they're a vulnerability (billion laughs, etc.). The problem is that they're a legacy vulnerability with no clear, immediate fix, so we're going to have to bring our users along to HTTPS gradually.
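For readers unfamiliar with the "billion laughs" attack mentioned above, here is a truncated sketch (entity names are arbitrary; real payloads nest around nine levels deep, expanding tenfold at each level until the parser exhausts memory):

```xml
<?xml version="1.0"?>
<!DOCTYPE bomb [
  <!ENTITY a "aaaaaaaaaa">
  <!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
]>
<bomb>&c;</bomb> <!-- three levels already expand to 1,000 copies of &a; -->
```

This is purely an expansion (denial-of-service) problem and is orthogonal to transport security, which is why it has to be fixed in parsers rather than by moving to HTTPS.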

elucify commented 7 years ago

@konklone asking for a "live" example, do you mean specific documents loadable over http and https, that reference external entities using http-based SYSTEM identifiers? Or do you just mean specific text examples like the ones I mention in #215?

konklone commented 7 years ago

@elucify Ideally, both! :) Or at least, the former should contain the latter anyway.

nacin commented 7 years ago

@elucify Entity expansion is an issue, but far more concerning (especially in a world where HTTP still exists) is https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing.
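The XXE class of attack that the OWASP page describes can be sketched in a few lines (file path shown is the classic illustrative target): an external entity with a SYSTEM identifier lets a document pull arbitrary local or remote content into the parse.

```xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>
```

When the SYSTEM identifier is an http:// URL instead, a network attacker can also tamper with the fetched content in transit, which is the transport-level hole this thread is weighing.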

richsalz commented 7 years ago

Unsecured XXEs are bad, and plaintext fetching leaves open a security hole.

konklone commented 7 years ago

@elucify Given @nacin's and @richsalz's comments pointing to real security holes this leaves, do you have any suggestions on ways we could make this stronger or ask more of agencies? If at all possible, we should be exerting pressure on the ecosystem to improve.

richsalz commented 7 years ago

Perhaps, if you need to support non-TLS clients (and really, which ones?), you could start turning off HTTP and leaving just HTTPS for one or two days a month, then slowly ramp up to full-time HTTPS.

elucify commented 7 years ago

I just discussed this with some of our programmers, and we have an idea for how our agency, at least, can help users over this particular hump.

Our agency hasn't signed up to add https to libxml2, though that's clearly something the libxml2 user community needs to accomplish if they want to keep libxml2 relevant. But as I point out in #215, all XML processors are required to support XML catalogs, albeit via platform-specific implementations.

We are discussing the possibility of delivering, probably via our FTP site, a bundle of all of our .dtd, .xsd, and .ent files, with an XML catalog mapping the public identifiers to the bundled files. We would also provide some documentation, or point to public docs, on how to configure various XML processors (libxml2, Java, etc.) to use the catalogs and resolve local files instead of fetching over HTTP. We have confirmed that this approach works for specific DTDs, and all compliant XML processors should be amenable to this solution.
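A catalog of the kind described above might look like this, in the OASIS XML Catalogs format (identifiers and paths here are hypothetical): each entry maps a public or http-based system identifier to a file inside the bundle.

```xml
<?xml version="1.0"?>
<!-- Hypothetical catalog shipped with the bundle: processors configured to
     use it resolve these identifiers to local files and never hit the network. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//EXAMPLE//DTD Sample 1.0//EN"
          uri="dtds/sample.dtd"/>
  <system systemId="http://example.gov/dtds/sample.dtd"
          uri="dtds/sample.dtd"/>
</catalog>
```

libxml2 honors such catalogs via the XML_CATALOG_FILES environment variable, and Java toolchains have their own catalog resolver hooks, which is what makes this a processor-neutral stopgap.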

Of course, this solution causes problems of its own; for example, managing updates to any schema or entity files across an organization, and notifying clients of the new bundles. It also doesn't solve the problem that libxml2 can't use https to fetch the data documents themselves. But at least it would be a stopgap for users who rely on legacy data and code that they can't update immediately.

elucify commented 7 years ago

@richsalz it's not just non-TLS clients. Legacy XML documents often contain http public identifiers, and Java clients, steadfastly and by design, refuse to follow redirects from http to https. The only two solutions to that problem I can think of are an XML catalog or a custom entity resolver. (Please someone chime in if the redirection behavior in Java is now configurable--I've seen no reference to that anywhere.)
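The custom-entity-resolver option mentioned above can be sketched in a few lines of Java (class name hypothetical): before the parser fetches an external entity, rewrite any plain-HTTP system identifier to HTTPS, sidestepping Java's refusal to follow http-to-https redirects.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: upgrades plain-HTTP system identifiers to HTTPS
// before the parser fetches them, since Java's URL handling will not follow
// a redirect that crosses from http:// to https://.
public class HttpsUpgradingResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemId.startsWith("http://")) {
            // Fetch the same resource over TLS instead.
            return new InputSource("https://" + systemId.substring("http://".length()));
        }
        return null; // defer to the parser's default resolution
    }
}
```

It would be wired into a SAX pipeline via XMLReader.setEntityResolver(...); the catch, of course, is that it only helps clients whose code you can modify, which is exactly what legacy users cannot do.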

It seems to me that we should all be using https URLs for public identifiers exclusively, which solves the Java problem. But that leaves all non-TLS clients, as you call them, in the dust, absent an XML catalog and someone who knows how to create or use it.

If the non-TLS clients were just some open-source abandonware out there, it would be maybe OK to just leave them behind. But libxml2 is a very popular XML processing toolchain in the UNIX/C/C++ world. Thus the problem.

richsalz commented 7 years ago

I understand the problem. I was just emphasizing that the current behavior is risky.

elucify commented 7 years ago

Our organization has been using blackout tests, and it has gone pretty smoothly. Those clients who care have updated promptly. So your advice is well-taken. We intend to have a migration plan for XML users in the first quarter of 2017.

konklone commented 7 years ago

@elucify This is great info. So some questions:

elucify commented 7 years ago

@konklone, answers:

I'm guessing a lot of that 22.3% (65% of http traffic) without a user agent is libxml2, but I really have no way of knowing. If true, roughly a third of our traffic for those content types will fail if we start redirecting immediately.

konklone commented 7 years ago

@elucify As much work as you (especially) and I have put into this ticket, I'm starting to wonder if it's more appropriate to just leave this issue in the grey area of non-compliance. The only thing we've carved out a formal exception for is OCSP/CRL, which actually should not use HTTPS.

I've been doing analysis on M-15-13 compliance post-deadline -- there's been tremendous progress, but also a significant compliance gap yet to be closed. If the goal here is to keep pressing clients to move off of plain HTTP, I'm inclined to revisit this issue throughout 2017, as many agencies will be doing for their non-compliant services.

We could still add a section about this issue, so that it's clear that OMB/GSA/etc understand and empathize with the issue, with suggestions about what to do (and citing NIH's blackouts as a model example).

Thoughts?

konklone commented 7 years ago

Without further objection, closing this to deal with on an ad hoc basis throughout 2017 as described in https://github.com/GSA/https/pull/222#issuecomment-270037520.