aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org
Other
15.02k stars 2k forks source link

Customize max_line_size and max_field_size in HttpParser? #2988

Open alexandru opened 6 years ago

alexandru commented 6 years ago

We're using aiohttp 2.x with Python 3.5.2 and trying to upgrade to the latest aiohttp 3.2.0 with Python 3.6.5.

We're getting errors such as these:

[2018-05-10 08:31:55 +0000] [12471] [ERROR] Error handling request
Traceback (most recent call last):
  File "/project/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 235, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 297, in aiohttp._http_parser.HttpParser.feed_data
  File "aiohttp/_http_parser.pyx", line 425, in aiohttp._http_parser.cb_on_header_value
aiohttp.http_exceptions.LineTooLong: 400, message='Got more than 8190 bytes (9383) when reading Header value is too long.'

This was happening in aiohttp 2.x as well, because I believe the standard specifies that length as the maximum, however the real world doesn't always agree with standards 😒

To fix this, with aiohttp 2.x we were monkey-patching HttpParser in order to override max_line_size and max_field_size. Here's the horrible things we've been doing: gist — this file being imported in our gunicorn_worker.py.

Now that we've upgraded to 3.x, this monkey patching no longer works. Don't know why.

Any way to customize max_line_size and max_field_size? Possibly without monkey patching, but not necessarily, atm I just want the thing to work?

Environment:

See the full requirements.txt if interested.

Cheers,

asvetlov commented 6 years ago

Nobody implemented it yet. We have the very similar issue in backlog: #2304

pposca commented 6 years ago

Where would be the best place to expose max_line_size and max_field_size? Using an Application init param, like with client_max_size?

asvetlov commented 6 years ago

@pposca let's push params into BaseRequest first. After that, we can think how to configure it the best way. The library has several configuration parameters, would be nice to have a generalized solution for this. The objection to Application.__init__ is that the class is used for sub-applications too, but parameters like max_line_size make sense only for root application. Maybe we can live with it, or perhaps we will end up with config parameter which should be None for any sub-app. I don't know, let's discuss it later.

pposca commented 6 years ago

@asvetlov OK. There is also a max_headers param that seems related with max_line_size and max_field_size. Shouldn't it be added as well? It has sense for me and it would be free.

asvetlov commented 6 years ago

Agree, please support max_headers as well

EvgenyKhaliper commented 6 years ago

Guys, any progress with it ?

asvetlov commented 6 years ago

The issue is waiting for a champion

alexpantyukhin commented 6 years ago

I looked at it a little and found there is such chain Application._make_handler -> Server.call -> RequestHandler -> HttpRequestParser . _make_handler already takes some parameters with kwargs. Maybe it possible to merge kwargs with some config dictionary which are passed into __init__.

bmbouter commented 4 years ago

Our project is affected by the issue, and I'd like to open a PR to resolve it. Since 2018 I believe the need is no longer on Application._make_handler and instead to be configured on the ApplicationRunner created here instead. Do you agree?

Rather than adding options one-by-one, is there a way we can have many options specified by an application running with GunicornWebWorker? Here are some ideas:

Idea 1: A subclass pass extra kwargs to ApplicationRunner

To do this we could add an internal method named, e.g. _get_app_runner_instance(self) which return a configured instance of ApplicationRunner.

diff --git a/aiohttp/worker.py b/aiohttp/worker.py
index 73ba6e38..21fb671d 100644
--- a/aiohttp/worker.py
+++ b/aiohttp/worker.py
@@ -64,6 +64,17 @@ class GunicornWebWorker(base.Worker):

         sys.exit(self.exit_code)

+    def _get_application_runner_instance(self, app, **kwargs):
+        access_log = self.log.access_log if self.cfg.accesslog else None
+        runner = web.AppRunner(app,
+                               logger=self.log,
+                               keepalive_timeout=self.cfg.keepalive,
+                               access_log=access_log,
+                               access_log_format=self._get_valid_log_format(
+                                   self.cfg.access_log_format),
+                               **kwargs)
+        return runner
+
     async def _run(self) -> None:
         if isinstance(self.wsgi, Application):
             app = self.wsgi
@@ -73,13 +84,8 @@ class GunicornWebWorker(base.Worker):
             raise RuntimeError("wsgi app should be either Application or "
                                "async function returning Application, got {}"
                                .format(self.wsgi))
-        access_log = self.log.access_log if self.cfg.accesslog else None
-        runner = web.AppRunner(app,
-                               logger=self.log,
-                               keepalive_timeout=self.cfg.keepalive,
-                               access_log=access_log,
-                               access_log_format=self._get_valid_log_format(
-                                   self.cfg.access_log_format))
+
+        runner = self._get_application_runner_instance(app)
         await runner.setup()

         ctx = self._create_ssl_context(self.cfg) if self.cfg.is_ssl else None```

To allow GunicornWebWorker add additional "default" config options in the future without those config options also being added to every subclass there ever was, subclasses should use super() to inject the kwargs. Here would be an example of a subclass of CustomGunicornWebWorker that overrides max_line_size and max_field_size:

class CustomGunicornWebWorker(GunicornWebWorker):

    def _get_application_runner_instance(self, app):
        extra_kwargs = {"max_line_size" = 16000, "max_field_size": 16000}
        return super()._get_application_runner_instance(app, **extra_kwargs)

Idea 2: Have GunicornWebWorker accept a new application_runner_kwargs somehow

The application GunicornWebWorker patch to apply the patch (below) would be straightforward, but where does application_runner_kwargs come from?

diff --git a/aiohttp/worker.py b/aiohttp/worker.py
index 73ba6e38..b2c7194f 100644
--- a/aiohttp/worker.py
+++ b/aiohttp/worker.py
@@ -79,7 +79,8 @@ class GunicornWebWorker(base.Worker):
                                keepalive_timeout=self.cfg.keepalive,
                                access_log=access_log,
                                access_log_format=self._get_valid_log_format(
-                                   self.cfg.access_log_format))
+                                   self.cfg.access_log_format),
+                               **application_runner_kwargs)
         await runner.setup()

         ctx = self._create_ssl_context(self.cfg) if self.cfg.is_ssl else None
kylemacfarlane commented 3 years ago

I just ran into this when using aiohttp as a client to read from a server which is returning a gigantic CSP header.

The source of the problem seems to be here where a hardcoded non-customisable parser is used: https://github.com/aio-libs/aiohttp/blob/43c8168b0ee06392086b72e4e33ba4c5c1b60215/aiohttp/client_proto.py#L159-L168

I can't figure out how to get a override this method properly. It seems like you need a custom connector class providing a custom connection class providing a custom protocol which finally provides a custom parser, but I can't figure it out.

Monkey patching HttpResponseParser directly doesn't work, perhaps because of the C extensions.

In the end this monkey patch works but I hate it:

from typing import Optional

from aiohttp.client_exceptions import ClientPayloadError
from aiohttp.client_proto import ResponseHandler
from aiohttp.helpers import BaseTimerContext
from aiohttp.http import HttpResponseParser

def set_response_params(
    self,
    *,
    timer: Optional[BaseTimerContext] = None,
    skip_payload: bool = False,
    read_until_eof: bool = False,
    auto_decompress: bool = True,
    read_timeout: Optional[float] = None,
    read_bufsize: int = 2 ** 16
) -> None:
    self._skip_payload = skip_payload

    self._read_timeout = read_timeout
    self._reschedule_timeout()

    self._parser = HttpResponseParser(
        self,
        self._loop,
        read_bufsize,
        timer=timer,
        payload_exception=ClientPayloadError,
        response_with_body=not skip_payload,
        read_until_eof=read_until_eof,
        auto_decompress=auto_decompress,
        # These three lines are the important change
        max_line_size=8190 * 4,
        max_headers=32768 * 4,
        max_field_size=8190 * 4
    )

    if self._tail:
        data, self._tail = self._tail, b""
        self.data_received(data)

ResponseHandler.set_response_params = set_response_params