Closed afrancoc2000 closed 1 week ago
There is a connection timeout configuration for the upstream cluster that might need to be adjusted. There are also default circuit breaker settings that often need to be adjusted. You can see if you're triggering the circuit breaker by looking at metrics.
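One way to act on the "looking at metrics" suggestion is to scan the text that Envoy's admin `/stats` endpoint returns for the circuit breaker overflow counters. The sketch below is only an illustration: the cluster name `backend` and the sample values are made up, and the helper `tripped_breakers` is hypothetical, not part of Envoy.

```python
# Overflow counters that Envoy increments when a circuit breaker rejects work.
OVERFLOW_SUFFIXES = (
    "upstream_cx_overflow",          # max connections reached
    "upstream_rq_pending_overflow",  # max pending requests reached
    "upstream_rq_retry_overflow",    # max retries reached
)


def tripped_breakers(stats_text):
    """Return {stat_name: count} for every non-zero overflow counter."""
    tripped = {}
    for line in stats_text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep or not name.endswith(OVERFLOW_SUFFIXES):
            continue
        try:
            count = int(value.strip())
        except ValueError:
            continue  # skip lines whose value is not a plain integer
        if count > 0:
            tripped[name] = count
    return tripped


# Hypothetical sample of admin /stats output; "backend" is an assumed cluster name.
SAMPLE_STATS = """\
cluster.backend.upstream_cx_total: 5120
cluster.backend.upstream_cx_overflow: 37
cluster.backend.upstream_rq_pending_overflow: 0
cluster.backend.upstream_rq_retry_overflow: 0
"""

if __name__ == "__main__":
    print(tripped_breakers(SAMPLE_STATS))
```

A non-empty result while the 503s are occurring would suggest the circuit breaker thresholds are being hit; an empty result points toward other causes such as the connection timeout.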
Since you're using WebSockets, connections cannot be reused between the client and Envoy or between Envoy and the upstream. When opening and closing connections quickly between the same pair of hosts (say, Envoy and the upstream), it's possible to exhaust the ephemeral ports available for outbound connections: the combination of protocol/source-ip/source-port/destination-ip/destination-port must be unique, and when all of them are constant except the OS-chosen source port, it's easy to run out. You can usually tell this is happening if netstat or similar tools show a large number of connections in close-wait or fin-wait states when the errors start occurring.
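To make the netstat check concrete, here is a rough sketch that tallies TCP connection states from `netstat -tan` style output. It assumes the Linux column layout (state in the last column, no `-p` column); the sample addresses are invented and the helper `count_tcp_states` is hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count connections per TCP state in `netstat -tan` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp"/"tcp6" and have six columns; skip headers.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


# Hypothetical capture; the addresses and ports are made up for illustration.
SAMPLE_NETSTAT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:10000           0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:41234          10.0.0.9:8080           FIN_WAIT2
tcp        0      0 10.0.0.5:41235          10.0.0.9:8080           FIN_WAIT2
tcp        0      0 10.0.0.5:41236          10.0.0.9:8080           CLOSE_WAIT
tcp        0      0 10.0.0.5:41237          10.0.0.9:8080           ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE_NETSTAT).most_common():
        print(f"{state:12} {n}")
```

On a live host the input could come from `subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout`. A count of non-established states approaching the size of the ephemeral port range would be consistent with the port exhaustion described above.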
I will check my connection timeout and the circuit breaker settings. Thank you so much!
Envoy under short-lived WebSocket load starts returning 503 responses
Description: I have a WebSocket application that uses Envoy as its routing load balancer, and I'm load testing it to see how it behaves and what to expect under load. With a persistent connection under high load (streaming messages through the same open connection for 3 minutes), I can drive Envoy to 100% CPU and measure the request rate it can handle. But when I switch to opening a connection, sending one message, and closing the connection after receiving the answer, I very quickly start receiving 503 errors instead of the expected 101 when trying to open new connections.
Repro steps: Using the following k6 script:
Running it like this:
Starts returning 503 errors:
Looking at a tcpdump capture taken on Envoy, we can see this happening, along with some retransmission errors when closing the WebSocket connection:
Dump against the client
Dump against the backend
This is what a successful WebSocket call looks like against the backend:
Config:
The config just uses the router filter and enables the WebSocket upgrade.
Logs:
Is there something that I'm missing in my configuration? What can I do to prevent this from happening? Is this the normal expected behavior?
Thanks