Open Hexta opened 1 year ago
I would guess we're not observing the WebSocket close frame and using it to recognize the graceful shutdown, so we just treat it as an abrupt connection termination. Recognizing it seems like something we should be able to do.
cc @alyssawilk
A short-term solution might be to increase your consecutive error threshold to more than 1. That way you'd only trip outlier detection if connections repeatedly fail within a short period of time.
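To illustrate why raising the threshold helps, here is a toy model of a consecutive-failure counter. It is only a sketch and not Envoy's implementation: the class and method names are hypothetical, and it assumes that 502, 503 and 504 count as gateway failures and that any other response resets the counter.

```python
class ConsecutiveGatewayFailureTracker:
    """Toy model of a consecutive-gateway-failure counter (hypothetical, not Envoy code)."""

    # Assumption: these status codes are the ones treated as gateway failures.
    GATEWAY_ERRORS = {502, 503, 504}

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.consecutive = 0
        self.ejected = False

    def record(self, status: int) -> bool:
        """Record one response status and return True once the host is ejected."""
        if status in self.GATEWAY_ERRORS:
            self.consecutive += 1
        else:
            # Any non-gateway-error response breaks the streak.
            self.consecutive = 0
        if self.consecutive >= self.threshold:
            self.ejected = True
        return self.ejected


# With a threshold of 1, a single 503 from a graceful close ejects the host.
strict = ConsecutiveGatewayFailureTracker(threshold=1)
strict_result = strict.record(503)

# With a threshold of 3, an isolated 503 between successful responses does not.
lenient = ConsecutiveGatewayFailureTracker(threshold=3)
lenient_results = [lenient.record(s) for s in (503, 200, 503)]
```

In this model the graceful close still produces a 503, so a higher threshold only hides the problem when closes are interleaved with successful requests.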
I looked at the events happening on the wire: the client sends WebSocket opcode 8 (Connection Close) to the server. The server then sends a TCP FIN, which is interpreted as RemoteClose. That is a local-origin event and is mapped to HTTP code 503, which lands in the outlier detector as a gateway error. I will try to fix it.
It appears that Envoy does NOT parse WebSocket frame headers. The codebase contains a WebSocket codec, which could be used to catch opcode 8 (Connection Close) coming from the server and "prepare" for the TCP FIN coming from upstream, but that codec seems to be unused and not linked into the binary.
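For reference, this is roughly what catching opcode 8 involves at the frame level. The sketch below follows the RFC 6455 frame layout and is not the codec from the Envoy codebase; `parse_frame` and `close_status` are hypothetical names, and the parser assumes it is handed a buffer that starts at a frame boundary.

```python
import struct
from typing import NamedTuple, Optional

OPCODE_CLOSE = 0x8  # Connection Close opcode in RFC 6455


class Frame(NamedTuple):
    fin: bool
    opcode: int
    payload: bytes


def parse_frame(data: bytes) -> Optional[Frame]:
    """Parse one WebSocket frame from the start of `data`; return None if incomplete."""
    if len(data) < 2:
        return None
    fin = bool(data[0] & 0x80)     # FIN bit: last fragment of a message
    opcode = data[0] & 0x0F        # low 4 bits of the first byte
    masked = bool(data[1] & 0x80)  # client-to-server frames are masked
    length = data[1] & 0x7F
    offset = 2
    if length == 126:              # 16-bit extended payload length
        if len(data) < offset + 2:
            return None
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:            # 64-bit extended payload length
        if len(data) < offset + 8:
            return None
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    mask = b""
    if masked:
        if len(data) < offset + 4:
            return None
        mask = data[offset:offset + 4]
        offset += 4
    if len(data) < offset + length:
        return None
    payload = data[offset:offset + length]
    if masked:
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, payload)


def close_status(frame: Frame) -> Optional[int]:
    """Return the close status code, or None if this is not a close frame with a code."""
    if frame.opcode != OPCODE_CLOSE or len(frame.payload) < 2:
        return None
    return struct.unpack_from("!H", frame.payload)[0]


# Unmasked close frame carrying status 1001 (Going Away), as a server would send it.
server_close = parse_frame(bytes([0x88, 0x02, 0x03, 0xE9]))
```

Here `close_status(server_close)` yields 1001, the code mentioned further down in this thread.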
Hi, we recently encountered this issue: a graceful WS close (code 1001) was detected as a failed endpoint after the TCP socket was closed. It seems that Envoy considers that the "http response headers" were not received before the TCP connection was closed, hence the outlier. Looking at the code, I'm not sure how to fix this. It seems like a bigger development than my C++ skills can handle. Let me know if someone plans to fix this and if I can help. @cpakulski did you manage to get something working? Thanks
@hypnoce It is still an issue. Fixing it requires adding a WebSocket parser to detect that the client has asked the server to close the connection, so that the server's FIN is not interpreted as an error.
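A minimal sketch of that idea, assuming a per-connection tracker that is shown the first byte of every WebSocket frame passing through the proxy. None of these names exist in Envoy; they are hypothetical and only illustrate the state that would need to be kept.

```python
from enum import Enum

OPCODE_CLOSE = 0x8  # Connection Close opcode in RFC 6455


class CloseKind(Enum):
    GRACEFUL = "graceful"
    ABRUPT = "abrupt"


class WebSocketCloseTracker:
    """Hypothetical per-connection state: remembers whether a close frame was seen."""

    def __init__(self) -> None:
        self.close_frame_seen = False

    def on_frame_header(self, first_byte: int) -> None:
        # The opcode lives in the low 4 bits of the first header byte.
        if first_byte & 0x0F == OPCODE_CLOSE:
            self.close_frame_seen = True

    def on_remote_close(self) -> CloseKind:
        # A FIN that follows a close frame is part of the closing handshake,
        # so it should not be reported to outlier detection as a failure.
        if self.close_frame_seen:
            return CloseKind.GRACEFUL
        return CloseKind.ABRUPT


tracker = WebSocketCloseTracker()
tracker.on_frame_header(0x81)  # text frame with FIN bit set
tracker.on_frame_header(0x88)  # close frame with FIN bit set
result = tracker.on_remote_close()
```

Only the `ABRUPT` case would then be mapped to a 503 and counted as a gateway failure.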
Title: Consecutive Gateway Failure outlier detection ejects WebSocket endpoints on graceful connection close
Description: Consecutive Gateway Failure outlier detection ejects WebSocket endpoints on graceful connection close.
Repro steps:
echo foo | websocat ws://localhost/ws
Config:
lds.yaml
cds.yaml
Logs: app log
outlier detection event log