Open submarines-and opened 2 years ago
On closer inspection, it seems like the connection is closing on first message sent to server.
Reproduction (swap web socket-url to one using cloud endpoints vs not). Works when bypassing
var socketEndpoint = "wss://websocket-url.com"
let socket = new WebSocket(socketEndpoint);
socket.onopen = async (message) => console.log('CONNECTED', message);
socket.onmessage = (message) => message.data === "2" ? socket.send(${3}
) : console.log(message); //ping
socket.onclose = (message) => console.log('CLOSE', message)
socket.onerror = (ex) => console.log('ERROR', ex);
WAIT AND THEN SEND - CLOSES INSTANTLY
socket.send(40${JSON.stringify({ message: "test", })}
);
How did you deploy ESPv2? I assume it is in Cloud Run, Is following your setup?
websocket client -> ESPv2 (deployed in Cloud Run) -> Webscoket server (deployed in Cloud Run)
We have an integration test to test websocket. Of course it is all in localhost as
websocket-client -> ESPv2 -> websocket echo server
websocket server just echos whatever message it receives. client send 5 messages, and verified it received 5 echoed messages.
I just checked our code we dont set max_stream_duration
So I don't know, could you turn on debug by adding --enable_debug
flag in ESPv2 and get its log for us. It may provide more clues.
I did some more testing (using a different socket library) and can see that the connection in fact closes as soon as the SERVER sends a message. Maybe the gateway is considering the connection finished when it sends data?
1) That setup is correct. We are using the deploy script found in the official GCP tutorial
2) I added the debug flag, but I am not really sure what to look for. Is there I way I can share the logs privately (because they contain quite a bit of internal information)
Since you deployed ESPv2 in Cloud Run, you can get the logs from Cloud Console, => Cloud Run => logs.
you can email it directly to me at qiwzhang@google.com
Logs emailed (and uploaded to GCP support case)
I also tried using server side events as a substitute, but they also do not work (close after 15 seconds), while working fine if espv2 is bypassed.
In envoy docs is says route timeout needs to be disabled: https://www.envoyproxy.io/docs/envoy/latest/faq/configuration/timeouts#route-timeouts
But the only options I have in espv2 isdeadline
in openapi-spec and http_request_timeout_s
in deploy options. I have set both to 300 and it still closes after 15s.
Is this unsupported, or is there another flag to set? I know this is unrelated to this issue, but if SSE works, I can use that instead and this issue does not need to be investigated.
15s
seems suspiciously similar to our OpenAPI default deadline
: https://cloud.google.com/endpoints/docs/openapi/openapi-extensions#deadline
I have set both to 300 and it still closes after 15s.
After updating the OpenAPI spec and using gcloud services deploy
, did you make sure to rebuild and redeploy the ESPv2 container onto Cloud Run?
Here is the logic where we update Envoy route's timeout
and idle_timeout
based on the deadline configuration
We have a specific case for method.IsStreaming
to disable timeout
completely.
My suspicion is how we determine if method.IsStreaming
is set, maybe we are setting it to false
for your method. Is there any way to specify in OpenAPI spec if your method is using server side events? If not, then ESPv2 will not know which paths in your OpenAPI spec are unary vs server side streaming. It will not disable the regular timeout
field.
For now, let's try to debug why regular websockets are not working based on the logs you sent @qiwzhang .
Here is the timeline from the ESP logs I attached here https://b.corp.google.com/issues/235315519#comment5.
05:21:28.019: request come with Upgrade
05:21:28.033: server replied with websocket accept
05:21:28.092 upstream connection reset
upstream connection reset is less than 1 second after accepted.
So I don't feel it is our default deadline issue.
On Mon, Jun 13, 2022 at 7:12 AM Teju Nareddy @.***> wrote:
Here is the logic where we update Envoy route's timeout and idle_timeout based on the deadline configuration
We have a specific case for method.IsStreaming to disable timeout completely.
My suspicion is how we determine if method.IsStreaming is set, maybe we are setting it to false for your method. Is there any way to specify in OpenAPI spec if your method is using server side events? If not, then ESPv2 will not know which paths in your OpenAPI spec are unary vs server side streaming. It will not disable the regular timeout field.
For now, let's try to debug why regular websockets are not working based on the logs you sent @qiwzhang https://github.com/qiwzhang .
— Reply to this email directly, view it on GitHub https://github.com/GoogleCloudPlatform/esp-v2/issues/704#issuecomment-1153972274, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACZ6TBZ2PSVHKEV5KR67Q7LVO46WFANCNFSM5YBTOUBQ . You are receiving this because you were mentioned.Message ID: @.***>
Running into the same issue. Was this ever resolved @submarines-and @qiwzhang
We switched architecture
1) Switched web sockets for "Server side events". 2) Replaced cloud endpoints with a custom gateway running in cloud run
Also logged in GCP support portal as case id: 29925315
We are having a strange issue with web sockets and esp-v2. Basically, new websocket connections closes after sending first message to server with status
wasClean
false.Response from websocket connection looks fine:
Our setup: Browser --> Cloud endpoints (with espv2) --> Cloud run --> Socket.io through NestJs on NodeJs
If we remove Cloud endpoints and go directly to cloud run, everything works fine. Browser --> Cloud run --> Nodejs
My top suspicions: 1) There is a timeout being hit for streams.E.g.
max_stream_duration
or similar. I am using defaults for all timeout values; maybe there is another value I can increase? https://cloud.google.com/endpoints/docs/openapi/specify-esp-v2-startup-options#timeoutThis issue seems similar if this is the case: https://github.com/projectcontour/contour/issues/3692
2) Maybe we have misunderstood what is required in the openapi specification. Currently, we have a /get endpoint for the web sockets. If we remove this, no requests come through, but maybe there is a magic way to configure this.
Is this a bug in esp-v2 or is there maybe some flag we are missing to get it to work?