jaraco closed this 2 years ago
After some consideration, I determined that a workaround may be healthy for consumers. In #1976, I implemented what the user suggested, wrapping the host header to transform dangerous values by eliminating them. Currently, only newlines are considered dangerous. We can expand that list of characters if needed. Fix released in v18.8.0.
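A minimal sketch of that kind of wrapper, under stated assumptions: the class name `SafeHost` and the `.raw` attribute are hypothetical and are not taken from #1976, and treating both carriage return and line feed as "newlines" is my assumption. It shows one way to eliminate dangerous characters while keeping the original value available for inspection.

```python
import re


class SafeHost(str):
    """Hypothetical wrapper: a str holding the sanitized Host value.

    The untouched input remains available as ``.raw``.
    """

    # Assumption: only CR and LF are treated as dangerous.
    dangerous = re.compile(r"[\r\n]")

    def __new__(cls, raw):
        # Build the str from the sanitized text, then remember the original.
        instance = super().__new__(cls, cls.dangerous.sub("", raw))
        instance.raw = raw
        return instance


host = SafeHost("example.com\nX-Injected: 1")
print(repr(host))      # sanitized value, newline removed
print(repr(host.raw))  # original value, newline intact
```

Expanding the list of dangerous characters in a design like this would only require changing the `dangerous` pattern.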
This issue was reported to security@cherrypy.dev as a security issue on 2022-06-18.
The report to security@cherrypy.dev included this content:
Based on the report to security@, I created this repro:
Launching the server with `pip-run -q -- repro.py`, and using httpie to query the server, I confirm that arbitrary bytes are received by the server handler:
Here's the concern as reported by the discoverer:
investigation
Despite the references above, I've been unsuccessful in identifying where these characters even get decoded.
I do see where RFC 9110 does say:
The message, however, does not contain these characters. It contains characters like `=` and `2`, which are interpreted after they've been received at the server. If CherryPy were to decode these characters and then forward them in a Host header, that would violate the protocol.
It does seem intentional that CherryPy is supporting RFC 2047 decoding. Here is where the decoding happens and here is where that call is invoked.
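To illustrate how RFC 2047 decoding can turn a printable header value into one that contains a newline, here is a small sketch using the standard library's `email.header.decode_header`. The helper name `decode_text` and the payload are hypothetical; this is not CherryPy's code and not the reporter's exact input.

```python
from email.header import decode_header


def decode_text(value):
    """Decode RFC 2047 encoded words in a header value (illustrative helper)."""
    parts = []
    for atom, charset in decode_header(value):
        if isinstance(atom, bytes):
            # Unlabelled byte runs are treated as Latin-1 (an assumption).
            atom = atom.decode(charset or "latin-1")
        parts.append(atom)
    return "".join(parts)


# Hypothetical payload: only printable ASCII travels on the wire.
wire_value = "=?utf-8?q?example.com=0AX-Injected:_1?="
decoded = decode_text(wire_value)

print(repr(wire_value))  # no newline in what the client sent
print(repr(decoded))     # a newline appears only after decoding
```

The point of the sketch is that the newline exists only after the server interprets the encoded word, never in the bytes of the message itself.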
It's not immediately obvious that there's a problem with accepting a host header with unusual characters, but it's imaginable how downstream consumers of the value might be expecting a valid hostname.
I can see that right off the bat, self.base ends up using that value unchecked. I do see that value is used in constructing URLs.
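As a rough illustration of why an unchecked value matters once it reaches URL construction, the sketch below models a base URL built from the host and then reused in a redirect header. The functions `build_base` and `redirect_header` are hypothetical stand-ins, not CherryPy's implementation, and a real server may reject or escape such a value before it reaches the wire.

```python
def build_base(scheme, host):
    """Simplified stand-in for deriving a base URL from the Host value."""
    return "%s://%s" % (scheme, host)


def redirect_header(base, path):
    """Format a Location header line from the base URL (illustrative only)."""
    return "Location: %s%s" % (base, path)


# Hypothetical decoded Host value containing a newline.
hostile_host = "example.com\nX-Injected: 1"

line = redirect_header(build_base("http", hostile_host), "/login")
lines = line.split("\n")

for item in lines:
    print(repr(item))  # the single header has become two lines
```

A downstream consumer that writes this string into a response without checking it would emit what looks like a second header.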
Still, it's not obvious to me that CherryPy should be responsible for validating this (or other) fields.
I did find this guidance that indicates:
I'm not sure whether CherryPy should validate the host header on behalf of all applications, or whether it should defer that responsibility to the application that might be utilizing that field. I'm not even sure what a valid host header would be. I can come up with some seemingly invalid values, but I'm uncertain what values should be allowed or disallowed, and I haven't yet found any resources that give good guidance on what an HTTP server should do here.
For example, I read this resource, which suggests:
Obviously, such an action would not be the responsibility of the server but would be the responsibility of the downstream user.
The reporter has suggested:
Unfortunately, that proposal would require the Headers object to differentiate between sanitized and unsanitized headers, something the object model doesn't currently support.
Moreover, it strikes me as inappropriate to silently sanitize inputs from the request.