xstefanox commented 1 year ago

The current implementation defines a 10m server timeout for every route exposed by the Docker API.

Detected issue

There are some containers that need to listen for Docker events and trigger particular action when an event is received, for example swarm-cronjob.

The /events route is meant to stream forever but the timeout imposed by HAProxy is abruptly closing the connection when the timeout is reached.

Proposed solution

With this change, HAProxy will be configured with no timeout only for the /events route, while preserving the existing configuration for other routes.

Implementation details

A dedicated backend with timeout set to 0 is configured in HAProxy; this backend is used to handle only the /events route, which is matched using the same regular expression already used to choose whether the route is accessible or not depending on the related environment variable.

pedrobaeza commented 1 year ago

A good practice is to always set a timeout for everything, so changing this for all the uses is not correct. If you provide a non-default way to set for your case no timeout, it can be accepted, but not unconditionally.

xstefanox commented 1 year ago

Hi @pedrobaeza, no, we are not going to remove the timeout for every route, but only for the /events route.

The reason why I didn't provide an environment variable to override this timeout is because this API call is meant to never terminate by design. The /events route is a sort of custom implementation of Server Sent Events made by Docker: the client asks for the streaming of events emitted by Docker and listens to them forever, because an event stream is not meant to terminate unless the server is shutdown or restarted.

If you manually GET /events from your terminal connecting directly to the Docker daemon, you will find out that your client never terminates; instead, if you connect through the Docker Socket Proxy, the connection is terminated after 10 minutes in a non-clean way, which is an incorrect behaviour for an event stream.

Example output using HTTPie:

❯ http GET http://localhost:2375/events
HTTP/1.1 200 OK
api-version: 1.42
content-type: application/json
date: Wed, 20 Sep 2023 07:00:27 GMT
docker-experimental: false
ostype: linux
server: Docker/23.0.6 (linux)
transfer-encoding: chunked

http: error: ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

This behaviour causes the crash of some services that expect to consume the Docker events stream, for example the one that I mentioned in my previous post, which terminates with a fatal error and is then automatically restarted exactly every 10 minutes.

I agree with you that setting reasonable network timeouts is a good practice, but for this particular use case, applying a timeout of any value is a nonsense, that's the reason why I'm simply proposing to remove the timeout at all.

pedrobaeza commented 1 year ago

OK, thanks for the explanations. Seems reasonable. @yajo as you were the creator of this, do you see it correct?

Tecnativa / docker-socket-proxy

fix: no timeout for events route #98

Detected issue

Proposed solution

Implementation details