Open agapic opened 2 years ago
@vietj any ideas what might cause this? This is causing a super critical bug for our product (911 software), and upgrading Vertx with our current configuration introduces other issues related to connectivity (but that’s an aside)
Added reproducer code for Vertx 3.7.0
Can you upgrade to 3.9? That's the only version of Vert.x 3 we are currently supporting.
@vietj I just tried with Vertx 4.2.3 - same issue as 3.7. We don't appear to get the issue with the NodeJS websocket server "ws" library. One theory is that the nodejs code explicitly sends the Connection: close header, which might force nginx or the client to close the connection in its upstream cache.
I think fundamentally the root cause here is that Vertx doesn't expect different clients to re-use connections. The interplay between Nginx caching connections and Vertx causes issues. When a socket is rejected and a new client connects, the response from the previous client is left hanging because it never ended. Vertx tries to re-use the existing connection and incoming requests "hang" because the previous request is not finished (but really.. it is.. it's been rejected)
For 3.6.3, I think it's because Vertx reuses the websocket in the Http1xServerConnection class (line 302), and it reuses the headers too unfortunately, even if the incoming request has different headers.
However, in Vertx 3.9.12 AND 4.2.3, instead of reusing the websocket, the connection hangs. :(
I think it hangs because the Http1xServerConnection class in 3.9.12 thinks there is a response in progress and pauses the request, probably until the old connection is killed. My theory is that Vertx doesn't handle connection reuse correctly, which is what Nginx does with upstream caching. Our proposed solution is to close websockets after calling reject.
synchronized (this) {
  requestInProgress = req;
  if (responseInProgress != null) {
    // Deferred until the current response completion
    req.pause();
    responseInProgress.enqueue(req);
    return;
  }
  responseInProgress = requestInProgress;
  req.handleBegin();
}
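The quoted logic can be modeled in isolation. The sketch below is plain Java, not Vert.x code (the class and method names are mine); it models just the queueing behavior above to show why a response that is never completed, such as a rejected handshake nobody closes, parks every later request arriving on the same keep-alive connection:

```java
import java.util.ArrayDeque;

// Minimal model of the deferral logic quoted above: while a response is
// "in progress", new requests on the same connection are parked. If the
// in-progress response never completes, the parked requests hang forever.
public class DeferralModel {
    private String responseInProgress;
    private final ArrayDeque<String> deferred = new ArrayDeque<>();

    /** Returns true if the request is handled now, false if parked. */
    public synchronized boolean handle(String req) {
        if (responseInProgress != null) {
            deferred.add(req);   // analogous to req.pause() + enqueue(req)
            return false;
        }
        responseInProgress = req;
        return true;
    }

    /** Completing a response lets the next parked request proceed. */
    public synchronized void completeResponse() {
        responseInProgress = deferred.poll(); // null when nothing is parked
    }

    public static void main(String[] args) {
        DeferralModel conn = new DeferralModel();
        System.out.println(conn.handle("client1-upgrade")); // true: handled
        // client1's 403-rejected handshake response is never completed...
        System.out.println(conn.handle("client2-upgrade")); // false: parked
        conn.completeResponse(); // ending/closing the response unblocks it
    }
}
```

If the model is right, explicitly completing (or closing) the rejected handshake's response is what releases the queue, which matches the "close websockets after calling reject" proposal.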
I'll have a look at the reproducer, thanks
It's also very important that you direct requests to the nginx endpoint.
Step 1: get the correct IP: ifconfig en0 inet
and use that as your upstream IP (mine was 192.168.88.38)
Step 2: copy this nginx.conf file:
map $http_upgrade $connection_upgrade {
    default Upgrade;
    '' close;
}

upstream logger {
    keepalive 10;
    server 192.168.88.38:8080;
}

server {
    listen 80 default_server;
    server_name _;
    charset UTF-8;

    location /ws {
        client_max_body_size 33m;
        proxy_set_header Host $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Proxy "";
        proxy_http_version 1.1;
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 180s;
        proxy_buffering off;
        proxy_buffer_size 4k;
        proxy_buffers 4 4k;
        proxy_max_temp_file_size 1024m;
        proxy_request_buffering off;
        proxy_cookie_domain off;
        proxy_cookie_path off;
        proxy_pass http://logger;
    }

    location / {
        return 404;
    }

    access_log off;
    log_not_found off;
    error_log /var/log/nginx/error.log error;
}
Step 3: download nginx and run it - sample script (using Docker):
docker run --rm \
  -p 9090:80 \
  -v $(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf \
  nginx:1.21.6
Step 4: Send requests to
Can you provide a recap with all the precise instructions to reproduce it @agapic?
Yeah I will put together a new repo with complete code that includes NGINX and a README for you tomorrow
thank you
Here is the reproducer: https://github.com/agapic/vertx-nginx-websocket-repro
"Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. " : can you elaborate why nginx will resend previous request headers ?
when I try the reproducer I get:
julien@Juliens-MacBook-Pro vertx-nginx-websocket-repro % ./force-response.sh 200 ws://localhost:9090/ws
A848EC3A-9EF1-4A2A-8052-5EA8CEA9E386 looping 200 once-per-second
Connected (press CTRL+C to quit)
Is that what is supposed to happen?
I don't think so (the readme says it should hang)
If you send a 403 first and then try to send a 200, it should hang. Keep-alive is set to 10s, so you have a 10s window to do it. If you set keep-alive in the nginx.conf to, say, 60s it should be more easily reproducible.
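One note of caution on the directive semantics: in the nginx.conf above, `keepalive 10` in the upstream block is a connection count (idle upstream connections cached per worker), not a duration; the idle lifetime is controlled by the separate `keepalive_timeout` directive (default 60s, available in the upstream context since nginx 1.15.3). A fragment that widens the reuse window, which should make the hang easier to hit (the 120s value is my choice, not from the thread):

```nginx
upstream logger {
    keepalive 10;             # idle connections cached per worker (a count, not seconds)
    keepalive_timeout 120s;   # how long an idle upstream connection is kept (default 60s)
    server 192.168.88.38:8080;
}
```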
"Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. " : can you elaborate why nginx will resend previous request headers ?
Nginx won't resend previous request headers; it sends the correct ones every time. To rephrase: in 3.6.3 the websocket was re-using the old headers contained within the object rather than copying the new request headers over.
The script will try to send several requests, as far as I can see, so what would be the accurate way to reproduce the issue?
Can you make a reproducer with a regular client like Vert.x instead of requiring two scripts to be executed? It's also not clear how to observe the issue.
Hi @vietj, I updated the code to connect to the websocket using the HttpClient class; everything is run on a periodic timer now. After running it you will notice that you stop receiving responses, i.e. logger.info("Received a 200 response") in Client.java. If you change the port in Client.java to 8081 (instead of 9090) you will always receive responses.
@vietj were you able to get any more info from my update?
not yet the time to get back to this
@tobias I've run the reproducer for 3 minutes and I'm continuously getting
14:50:41.993 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Attempting to connect to websocket and expecting a 403 response
14:50:41.993 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Attempting to connect to websocket and expecting a 200 response
14:50:41.996 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Received a 200 response
14:50:41.997 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Received a 403 successfully
is that expected ?
Hi @vietj! I have no idea, I haven't looked at vert.x in > 5 years :) Is this related to some work I did back then?
I meant that what I observe seems to work correctly, can you confirm it?
Well, I haven't touched vert.x in years, so I don't know what value there would be in me getting involved again.
This was actually meant for @agapic, sorry for the noise.
Ah, that makes sense @vietj. No worries!
Version
Vert.x core 3.6.3 and 3.7.0; NGINX 1.19.9
Context
Vertx 3.6.3
We use NGINX 1.19.9 to proxy requests to our services in Kubernetes. One such service is a client manager that creates web sockets. For paths where the user is unauthorized, we call client.reject(403). Let's say client 1 connects to the service using wss://myuri.com/api and gets a valid 403. Great. Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. The same is true for clients 3, 4, 5, etc. Now all of our clients are in a bad state.
We've determined that it seems to be an interplay between nginx and vertx/netty. When we change upstream-keepalive-timeout to 0, the issue goes away: only client 1 gets the 403. When we get rid of nginx entirely, the issue is also gone.
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#upstream-keepalive-timeout
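The keep-alive interplay described above can be observed with JDK classes alone: a pooled HTTP/1.1 client (standing in for nginx's upstream keepalive cache) sends two logically independent requests that typically arrive at the server over one TCP connection, which is exactly the kind of reuse the server has to cope with. A self-contained sketch (class name and structure are mine, not from the reproducer):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeepAliveReuse {

    /** Serves two requests and returns the client-side port each one used. */
    public static List<Integer> run() throws Exception {
        List<Integer> clientPorts = Collections.synchronizedList(new ArrayList<>());
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            // Record which TCP connection (remote port) this request arrived on.
            clientPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newHttpClient();
            URI uri = URI.create("http://localhost:" + server.getAddress().getPort() + "/");
            for (int i = 0; i < 2; i++) {
                HttpResponse<String> resp = client.send(
                        HttpRequest.newBuilder(uri).build(),
                        HttpResponse.BodyHandlers.ofString());
                if (resp.statusCode() != 200) throw new IllegalStateException("unexpected status");
            }
        } finally {
            server.stop(0);
        }
        return clientPorts;
    }

    public static void main(String[] args) throws Exception {
        // With HTTP/1.1 keep-alive, both requests usually share one connection,
        // i.e. the two recorded ports usually match: two logical "clients"
        // multiplexed over one socket, as nginx does with its upstream pool.
        System.out.println("client ports seen by server: " + run());
    }
}
```

If the server keyed per-connection state (like a pending WebSocket response) without expecting a second, unrelated request on the same socket, the second request here would be the one that hangs.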
Do you have a reproducer?
Vertx 3.7.0
Reproducer code: https://github.com/NguyenVoDev/pure-vertx-js/blob/main/src/main/java/org/example/Server.java
Note this won't work if you try to repro locally, only if using nginx with keepalive set to > 0 (we have 60s). It could actually just be the headers nginx is adding to the request, though.
Instead of clients receiving headers from the existing web socket connection (like in 3.6.3), now the connections just hang and the websocket handler isn't even called.
Extra
Are we supposed to be closing websockets after rejecting them?