emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0

Socket.IO with transport polling fails #5161

Open marcotrinelli opened 1 year ago

marcotrinelli commented 1 year ago

Describe the bug

I get a 504 Gateway Timeout error, followed by a 400 Bad Request, when connecting to a Socket.IO server with transports: ['polling'] (the Socket.IO transports option). The Socket.IO handshake succeeds (I see the logs from the connection event in the upstream service), but the data channel is cut immediately afterwards.

From the browser developer tools:

GET https://<origin>/<path>/socket.io/?EIO=4&transport=polling&t=Ob01get&sid=We8gI0PFNPB6TZuqADQB 504 (Gateway Timeout)

POST https://<origin>/<path>/socket.io/?EIO=4&transport=polling&t=Ob01j91&sid=We8gI0PFNPB6TZuqADQB 400 (Bad Request) {code: 1, message: "Session ID unknown"}
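
To compare the two failing requests, their query strings can be decoded. The following is a minimal sketch using only Python's standard library and the two URLs copied from the developer tools output above; the helper name polling_params is made up for illustration.

```python
from urllib.parse import parse_qs

# The two request URLs exactly as reported by the browser developer tools.
GET_URL = "https://<origin>/<path>/socket.io/?EIO=4&transport=polling&t=Ob01get&sid=We8gI0PFNPB6TZuqADQB"
POST_URL = "https://<origin>/<path>/socket.io/?EIO=4&transport=polling&t=Ob01j91&sid=We8gI0PFNPB6TZuqADQB"


def polling_params(url: str) -> dict:
    """Return the query parameters of a Socket.IO polling URL as a flat dict.

    polling_params is a hypothetical helper, not part of any library.
    """
    query = url.split("?", 1)[1]
    return {key: values[0] for key, values in parse_qs(query).items()}


get_params = polling_params(GET_URL)
post_params = polling_params(POST_URL)

# Both requests use the long-polling transport and carry the same session ID.
print(get_params["transport"], post_params["transport"])
print(get_params["sid"] == post_params["sid"])
```

Both requests carry the same sid, so the "Session ID unknown" response to the POST refers to the same session whose GET had just returned 504.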

This issue only happens when using emissary-ingress as the gateway: a local setup that does not involve emissary-ingress does not have this issue. Also, using transports: ['websocket'] does not produce this issue.

To Reproduce

Using the following configs:

Expected behavior

Socket.IO connection successful (handshake and data channel)

Versions (please complete the following information):

marcotrinelli commented 1 year ago

Also, the issue persists when bypassing ext_auth (bypass_auth: true), and when using the default /socket.io path (that is, with prefix: /socket.io and the rewrite removed).

cindymullins-dw commented 1 year ago

Hi @marcotrinelli , do you have Listeners configured on ports 80 and 443? With the log level set to 'debug', do you see any connection errors in the Emissary pod logs? Can you check that your ambassador_id is tagged on all your CRDs, including Listeners and Service?

marcotrinelli commented 1 year ago

Hi @cindymullins-dw , thanks for your support. Yes, I have the default HTTP and HTTPS Listeners (ports 8080 and 8443). I just checked, and all CRDs are tagged with my ambassador_id. I did enable debug logging, and the only log output I found useful was this:

[2023-07-06 10:37:45.610][42][debug][router] [source/common/router/router.cc:789] [C51637][S12436925586219655448] upstream timeout
[2023-07-06 10:37:45.611][42][debug][router] [source/common/router/upstream_request.cc:296] [C51637][S12436925586219655448] resetting pool request
[2023-07-06 10:37:45.611][42][debug][client] [source/common/http/codec_client.cc:125] [C51631] request reset
[2023-07-06 10:37:45.611][42][debug][connection] [source/common/network/connection_impl.cc:132] [C51631] closing data_to_write=0 type=1
[2023-07-06 10:37:45.611][42][debug][connection] [source/common/network/connection_impl.cc:242] [C51631] closing socket: 1
[2023-07-06 10:37:45.611][42][debug][client] [source/common/http/codec_client.cc:99] [C51631] disconnect. resetting 0 pending requests
[2023-07-06 10:37:45.611][42][debug][pool] [source/common/conn_pool/conn_pool_base.cc:343] [C51631] client disconnected, failure reason:
[2023-07-06 10:37:45.611][42][debug][http] [source/common/http/filter_manager.cc:873] [C51637][S12436925586219655448] Sending local reply with details upstream_response_timeout
[2023-07-06 10:37:45.611][42][debug][http] [source/common/http/conn_manager_impl.cc:1501] [C51637][S12436925586219655448] encoding headers via codec (end_stream=false):
':status', '504'
'content-length', '24'
'content-type', 'text/plain'
'date', 'Thu, 06 Jul 2023 10:37:45 GMT'
'server', 'envoy'

It looks like emissary-ingress (the HTTP client component) fails, but it does not give a reason why.
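
When scanning longer debug output, it can help to split each Envoy log line into its fields and filter on the message. The sketch below is one way to do that with a regular expression written against the line layout seen in the excerpt above; it is not an official Envoy log format specification, and the sample is limited to three lines copied from that excerpt.

```python
import re

# Pattern derived from the lines in the excerpt:
# [timestamp][thread][level][component] [source:line] [C..][S..] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<thread>\d+)\]"
    r"\[(?P<level>\w+)\]"
    r"\[(?P<component>\w+)\] "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<ids>(?:\[[CS]\d+\])+) "
    r"(?P<message>.*)$"
)

SAMPLE = """\
[2023-07-06 10:37:45.610][42][debug][router] [source/common/router/router.cc:789] [C51637][S12436925586219655448] upstream timeout
[2023-07-06 10:37:45.611][42][debug][client] [source/common/http/codec_client.cc:125] [C51631] request reset
[2023-07-06 10:37:45.611][42][debug][http] [source/common/http/filter_manager.cc:873] [C51637][S12436925586219655448] Sending local reply with details upstream_response_timeout
"""


def parse(text: str) -> list:
    """Parse log text into a list of field dicts, skipping lines that do not match."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


records = parse(SAMPLE)

# Keep only the entries whose message mentions a timeout.
timeouts = [r for r in records if "timeout" in r["message"]]
for r in timeouts:
    print(r["component"], r["source"], r["message"])
```

On this sample the filter keeps the router's "upstream timeout" entry and the local reply with details upstream_response_timeout, which suggests the 504 was generated by the proxy after waiting for the upstream response rather than returned by the upstream service.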

cindymullins-dw commented 11 months ago

Thanks for checking. Unfortunately, we don't see anything in the error logs that gives much info. Emissary does support WebSockets, so I'm glad to hear that it is working for your Socket.IO websocket connections. For long-polling HTTP connections, afaik there's no specific Emissary spec for that. (By contrast, for websockets you have to 'allow_upgrade'.)

So I suspect either 1) polling for Socket.IO is not supported or 2) perhaps there's an adjustment you can make to Socket.IO to tweak the port assignment as noted here. If you want to ask about this at the Emissary help session (Thursdays @ 2:30pm ET), an engineer might know more. Pls let me know.
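
A further possibility, consistent with the "upstream timeout" entry in the debug logs earlier in this thread but not confirmed by anyone here, is that the proxy's request timeout is shorter than the time an idle long-polling GET stays open. The sketch below only does the comparison. Both values are assumptions rather than facts from this thread: 3000 ms as the default timeout_ms of an Emissary Mapping and 25000 ms as the default Socket.IO pingInterval. The function name poll_outlives_timeout is hypothetical.

```python
# Assumed defaults, not taken from this thread; verify against your own setup.
EMISSARY_TIMEOUT_MS = 3000          # assumed default of Mapping timeout_ms
SOCKETIO_PING_INTERVAL_MS = 25000   # assumed default Socket.IO pingInterval


def poll_outlives_timeout(proxy_timeout_ms: int, max_poll_hold_ms: int) -> bool:
    """Return True if an idle long-polling GET can stay open longer than the proxy allows."""
    return max_poll_hold_ms > proxy_timeout_ms


if poll_outlives_timeout(EMISSARY_TIMEOUT_MS, SOCKETIO_PING_INTERVAL_MS):
    print("An idle polling GET may hit the proxy timeout before the server responds.")
else:
    print("The proxy timeout is long enough for an idle polling GET.")
```

If this turns out to be the cause, raising timeout_ms on the Mapping above the longest expected polling hold time would be one thing to try.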

cindymullins-dw commented 10 months ago

I wonder if the notes here at bottom on polling are relevant in terms of how Envoy handles this.

cindymullins-dw commented 10 months ago

Perhaps you also need credentials: true here under CORS. That spec is available for CORS per our docs and I see it noted here on Socket.IO as a potential source of 400 errors.