This issue was reported to security@cherrypy.dev as a security issue on 2022-06-18.

The report to security@cherrypy.dev included this content:

The exploit required spaces and slashes in the Host header (disallowed by NGINX). This exploit lead to RCE in the web service. We were able to exploit this using the format: Host: =?iso-8859-1?q?=45=58=50=4c=4f=49=54?= (where =45=58=50=4c=4f=49=54 was our exploit string, hex encoded ascii, CherryPy would interpret this as Host: EXPLOIT)

We believe this "Host" header should have been rejected by NGINX and/or CherryPy as this is an obviously invalid "Host" header/DNS entry.

We would recommend:

NGINX should block "Host" headers that start with =? (or block = and ? chars altogether in the "Host" header as it already does with other chars)

CherryPy should not accept invalid characters in Host headers, including those with ISO-8859-1 encoding (or not parse the "Host" headers with "iso-8859-1" encoding)

Based on the report to security@, I created this repro:

__requires__ = ['cherrypy']

import cherrypy

class HostPrinter:
    @cherrypy.expose
    def index(self):
        host = cherrypy.request.headers.get('Host')
        return 'Host Header:' + host

    @classmethod
    def start(cls):
        cherrypy.quickstart(cls())

__name__ == '__main__' and HostPrinter.start()

Launching the server with pip-run -q -- repro.py, and using httpie to query the server, I confirm that arbitrary bytes are received by the server handler:

$ http :8080 'Host: =?iso-8859-1?q?=21=40=27=22=3c=3e=0d=0a=74=65=73=74=20=6e=65=77=20=6c=69=6e=65?='
HTTP/1.1 200 OK
Content-Length: 33
Content-Type: text/html;charset=utf-8
Date: Sun, 17 Jul 2022 18:01:36 GMT
Server: CherryPy/18.7.0

Host Header:!@'"<>
test new line

Here's the concern as reported by the discoverer:

I believe the relevant code in question is here: https://github.com/cherrypy/cherrypy/blob/12a06bf717effe21973953f30771a56c74130621/cherrypy/lib/httputil.py#L432

I believe this would be the most relevant RFC: https://www.rfc-editor.org/rfc/rfc9110.html#name-field-values

This RFC also contains a relevant security discussion: https://www.rfc-editor.org/rfc/rfc9110.html#name-attacks-based-on-command-co

I see no valid use case for CherryPy to accept a "Host" header containing non-standard characters (non IP/hostname characters -- https://en.wikipedia.org/wiki/Hostname#Syntax). As a good defense in depth measure/a possible compromise here would be to block/strip all "dangerous" header values, and then require something like this to fetch any unsafe value: cherrypy.request.headers.get('Host', unsafe=true)

investigation

Despite the references above, I've been unsuccessful in identifying where these characters even get decoded.

I do see where RFC 9110 does say:

Field values containing CR, LF, or NUL characters are invalid and dangerous, due to the varying ways that implementations might parse and interpret those characters; a recipient of CR, LF, or NUL within a field value MUST either reject the message or replace each of those characters with SP before further processing or forwarding of that message.

The message, however, does not contain these characters. It contains characters like = and 2, which are interpreted after they've been received at the server. If CherryPy were to decode these charaters and then forward them in a Host header, that would violate the protocol.

It does seem intentional that CherryPy is supporting RFC 2047 decoding. Here's is where the decoding happens and here is where that call is invoked.

It's not immediately obvious that there's a problem with accepting a host header with unusual characters, but it's imaginable how downstream consumers of the value might be expecting a valid hostname.

I can see that right off the bat, self.base ends up using that value unchecked. I do see that value is used in constructing URLs.

Still, it's not obvious to me that CherryPy should be responsible for validating this (or other) fields.

I did find this guidance that indicates:

Since the host and port information acts as an application-level routing mechanism, it is a frequent target for malware seeking to poison a shared cache or redirect a request to an unintended server. An interception proxy is particularly vulnerable if it relies on the host and port information for redirecting requests to internal servers, or for use as a cache key in a shared cache, without first verifying that the intercepted connection is targeting a valid IP address for that host.

I'm not sure it should be the responsibility of all CherryPy applications to validate the host header here, or if it should defer that responsibility to the application that might be utililizing that field. I'm not even sure what a valid host header would be. I can come up with some seemingly invalid values, but I'm uncertain what values should be allowed or disallowed and I haven't yet found any resources that give good guidance on what an HTTP server should do here.

For example, I read this resource, which suggests:

One way to validate host headers, where needed, is to create a whitelist of permitted domains and check host headers in incoming requests against this list.

Obviously, such an action would not be the responsibility of the server but would be the responsibility of the downstream user.

The reporter has suggested:

As a good defense in depth measure/a possible compromise here would be to block/strip all "dangerous" header values, and then require [a separate call to retrieve the unsanitized value].

Unfortunately, that proposal would require the Headers object to differentiate between sanitized and unsanitized headers, something the object model doesn't currently support.

Moreover, it strikes me as inappropriate to silently sanitize inputs from the request.

cherrypy / cherrypy

Host header leaks dangerous characters under RFC 2047 encoding #1974

investigation