d-Rickyy-b / certstream-server-go

This project aims to be a drop-in replacement for the certstream server by Calidog. This tool aggregates, parses, and streams certificate data from multiple certificate transparency logs via websocket connections to the clients.
MIT License

Slow websocket clients get stuck / disconnected #29

Closed: d-Rickyy-b closed this issue 10 months ago

d-Rickyy-b commented 10 months ago

When a client is not able to keep up (see #28), the server should at least provide the client with certs at the maximum rate the client can handle. Currently, a bug in certstream-server-go's websocket code causes clients that can't keep up to be disconnected after some time.

Example

In a certain time frame, the websocket processed 1636 certificates. [screenshot]

In the same time frame, more than 2000 certs were skipped (that's totally fine and is actually the solution to overloading the client). [screenshot]

The actual websocket client was heavily rate limited and processed only 107 certs in the same time. [screenshot]
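The skipping behaviour described above can be sketched with a non-blocking channel send. This is only an illustration, not the project's actual code: the `client` type, its `queue` and `skipped` fields, and the `offer` method are hypothetical names, and the queue size of 2 is arbitrary.

```go
package main

import "fmt"

// client is a hypothetical stand-in for one connected websocket client.
type client struct {
	name    string
	queue   chan []byte // per-client buffer, normally drained by a writer goroutine
	skipped int         // messages dropped because the queue was full
}

// offer tries to enqueue msg without blocking. When the queue is full it
// counts a skip and returns false instead of stalling the broadcaster.
func (c *client) offer(msg []byte) bool {
	select {
	case c.queue <- msg:
		return true
	default:
		c.skipped++
		return false
	}
}

func main() {
	slow := &client{name: "slow", queue: make(chan []byte, 2)}

	// Nothing drains the queue here, which models a client that has stalled.
	for i := 0; i < 5; i++ {
		slow.offer([]byte(fmt.Sprintf("cert-%d", i)))
	}

	fmt.Printf("queued=%d skipped=%d\n", len(slow.queue), slow.skipped)
	// prints "queued=2 skipped=3"
}
```

With this pattern a slow client only ever costs the broadcaster one failed send per message, so other clients are not held back.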

Hypothesis

My current assumption is that the websocket code does not block, but instead buffers the certificates before actually sending them to the websocket. Depending on the server-side buffer size and the client's consumption rate, there might be a point where the websocket write (on the network) happens too late, because the data was written to a buffer first and only much later to the network, and the write deadline is exceeded.

After that, the broadcasthandler returns (line 34), but the websocket connection is not closed yet. Hence, there is no indication that something isn't working anymore.

https://github.com/d-Rickyy-b/certstream-server-go/blob/0dd05827a591724e2a1dbeac6329abc3b1bdecd9/internal/web/client.go#L28-L45

d-Rickyy-b commented 10 months ago

The issue of clients getting stuck was fixed in https://github.com/d-Rickyy-b/certstream-server-go/commit/8429ab57db471dc36a79b5adcd47101041281ea3.

The problem with disconnected clients should be fixed in https://github.com/d-Rickyy-b/certstream-server-go/commit/dc548b89639f0f89c9e31767cdc5f5f01db0cb4f.

Seems like the OS does weird things. A write deadline of 60 seconds prevents clients from being disconnected. I tested this by running a client with a limited capacity (100 certs/s) for a whole hour. With smaller values, the client was already disconnected after a few seconds.