fastify / fastify-http-proxy

Proxy your http requests to another server, with hooks.
MIT License
337 stars 90 forks source link

Header encoding error for custom response headers: Invalid character in header content ["custom-response-header"] #343

Open jcbain opened 6 months ago

jcbain commented 6 months ago

Prerequisites

Fastify version

4.26.2

Plugin version

9.4.0

Node.js version

18

Operating system

Linux

Operating system version (i.e. 20.04, 11.3, 10)

debian 10

Description

Our proxying mechanism is experiencing an uncaught exception for custom response header values that contain characters such as é with the following error message Invalid character in header content ["custom-response-header"]. When proxying traffic, we can't control all of the possible header values upstream services might decorate a reply with and I was curious on your all's thoughts on handling situations with these special characters.

Steps to Reproduce

Create an upstream that sets an invalid response header:

const httpProxy = require('@fastify/http-proxy');
const fastify = require('fastify');

const app = fastify();

app.get('/', (request, reply) => {
  reply.header('custom-response-header', 'éeek');
  return { message: 'hello world' }
});

app.listen({ port:4001 });

const proxy = fastify();

proxy.register(httpProxy, { 
   upstream: 'http://localhost:4001'
});

proxy.listen({ port: 6001 });

Expected Behavior

When looking at the RFC8187 section 3.2.1 (https://www.rfc-editor.org/rfc/rfc8187#section-3.2.1), it specifies the following, which seems as if it should be handled in some way instead of throwing.

      Note: Recipients should be prepared to handle encoding errors,
      such as malformed or incomplete percent escape sequences, or
      non-decodable octet sequences, in a robust manner.  This
      specification does not mandate any specific behavior; for
      instance, the following strategies are all acceptable:

      *  ignoring the parameter,

      *  stripping a non-decodable octet sequence, and

      *  substituting a non-decodable octet sequence by a replacement
         character, such as the Unicode character U+FFFD (Replacement
         Character).
mcollina commented 6 months ago

I agree! This should be handled somehow. Would you like to send a PR?

q0rban commented 6 months ago

After reading RFC 8187 a little closer, it seems that it only applies to headers that use parameters:

This document specifies an encoding suitable for use in HTTP header fields that is compatible with a simplified profile of the encoding defined in RFC 2231. It can be applied to any HTTP header field that uses the common "parameter" ("name=value") syntax.

So I think that means you can only specify an encoding on an http header that uses parameter syntax. For example, the content-disposition header:

https://nodejs.org/docs/latest/api/http.html#requestsetheadername-value

const filename = 'Rock 🎵.txt';
request.setHeader('Content-Disposition', `attachment; filename*=utf-8''${encodeURIComponent(filename)}`);

So, let's say a content-disposition header is received by the proxy, the filename contains non-ascii characters, and is not encoded using the syntax described in RFC 8187. In that scenario, should the proxy return a 5xx error, should it strip the header, or should it attempt to properly encode the header following RFC 8187 (this seems complicated and would likely incur a performance penalty)?

Alternately, let's say a header is not in parameter syntax, and has non-ASCII characters in it. In that case, we cannot use the encoding approach outlined in RFC 8187. Should we url encode the string, regex-replace non-ASCII characters with ?, or some other approach?

Perhaps a good first step is to come up with a variety of example / test cases.

jcbain commented 6 months ago

Good catch. So I think then the relevant section is 3.2.4 of RFC 7230 https://www.rfc-editor.org/rfc/rfc7230#section-3.2.4.

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

Would this indicate that a proxy should be using the replacement character for those that fall outside of this standard per this line "A recipient SHOULD treat other octets in field content (obs-text) as opaque data"?