eclipse-vertx / vert.x

Vert.x is a tool-kit for building reactive applications on the JVM
http://vertx.io
Other
14.31k stars 2.08k forks source link

Websockets serving cached/wrong responses and potentially have incorrect headers in websocket #4299

Open agapic opened 2 years ago

agapic commented 2 years ago

Version

Vertx core 3.6.3 AND 3.7.0 NGINX 1.19.9

Context

Vertx 3.6.3

We use NGINX 1.19.9 to proxy requests to our services in Kubernetes. One such service is a client manager that creates web sockets. For paths where the user in unauthorized, we call client.reject(403).

Let's say client 1 connects to the service using wss://myuri.com/api and gets a valid 403. Great. Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. The same is true for client 3, 4, 5, etc. Now all of our clients are in a bad state.

We've determined that it seems to be an interplay between nginx and vertx/netty. When we change upstream-keepalive-timeout to 0, the issue goes away. Only client 1 gets the 403. When we get rid of nginx entirely, the issue is also gone.

https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#upstream-keepalive-timeout

Do you have a reproducer?

Vertx 3.7.0

Reproducer code: https://github.com/NguyenVoDev/pure-vertx-js/blob/main/src/main/java/org/example/Server.java

Note this won't work if you try and repro locally, only if using nginx with keepalive set to > 0 (we have 60s). It could actually just be the headers nginx is adding to the request though

Instead of clients receiving headers from the existing web socket connection (like in 3.6.3), now the connections just hang and the websocket handler isn't even called.

Extra

Are we supposed to be closing websockets after rejecting them?

agapic commented 2 years ago

@vietj any ideas what might cause this? This is causing a super critical bug for our product (911 software), and upgrading Vertx with our current configuration introduces other issues related to connectivity (but that’s an aside)

agapic commented 2 years ago

Added reproducer code for Vertx 3.7.0

vietj commented 2 years ago

can you upgrade to 3.9 ? that's the only version of Vert.x 3 we are currently supporting

agapic commented 2 years ago

@vietj I just tried with Vertx 4.2.3 - same issue as 3.7. We don't appear the get the issue with NodeJS websocket server "ws" library. One theory around that is the nodejs code explicitly sends the Connection: close header which might force nginx or the client to close the connection in its upstream cache.

agapic commented 2 years ago

I think fundamentally the root cause here is that Vertx doesn't expect different clients to re-use connections. The interplay between Nginx caching connections and Vertx causes issues. When a socket is rejected and a new client connects, the response from the previous client is left hanging because it never ended. Vertx tries to re-use the existing connection and incoming requests "hang" because the previous request is not finished (but really.. it is.. it's been rejected)

For 3.6.3, I think it's because Vertx reuses the websocket in the Http1xServerConnection class (line 302), and it reuses the headers too unfortunately, even if the incoming request has different headers.

However, in Vertx 3.9.12 AND 4.2.3l, instead of reusing the websocket, the connection hangs. :(

I think it hangs because in the Http1xServerConnection class in 3.9.12, it thinks there is a response in progress and pauses the request probably until the old conneciton is killed. My theory is that Vertx doesn't handle connection reuse correctly, which is what Nginx does with upstream caching. Our proposed solution is to close websockets after calling reject.

      synchronized (this) {
        requestInProgress = req;
        if (responseInProgress != null) {
          // Deferred until the current response completion
          req.pause();
          responseInProgress.enqueue(req);
          return;
        }
        responseInProgress = requestInProgress;
        req.handleBegin();
      }
vietj commented 2 years ago

I'll have a look at the reproducer, thanks

agapic commented 2 years ago

It's also very important that you redirect requests to the nginx endpoint.

Step 1: get the correct IP: ifconfig en0 inet and use that as your upstream IP (mine was 192.168.88.38) Step 2: copy this nginx.conf file:

map $http_upgrade $connection_upgrade {
    default Upgrade;
    '' close;
}

upstream logger {
    keepalive 10;

    server 192.168.88.38:8080;
}

server {
    listen 80 default_server;

    server_name _;

    charset UTF-8;

    location /ws {
        client_max_body_size 33m;
        proxy_set_header Host $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Proxy "";

        proxy_http_version 1.1;
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 180s;
        proxy_buffering off;
        proxy_buffer_size 4k;
        proxy_buffers 4 4k;
        proxy_max_temp_file_size 1024m;
        proxy_request_buffering off;
        proxy_cookie_domain off;
        proxy_cookie_path off;

        proxy_pass http://logger;
    }

    location / {
        return 404;
    }

    access_log off;
    log_not_found off;
    error_log /var/log/nginx/error.log error;
}

Step 3: download nginx and run it - sample script:

-p 9090:80 \
-v $(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf \
nginx:1.21.6

Step 4: Send requests to :9090, which will forward requests to nginx. Then nginx will reuse the connections between requests/clients

vietj commented 2 years ago

can you provide a recap with all the precise intructions to reproduce it @agapic ?

agapic commented 2 years ago

Yeah I will put together a new repo with complete code that includes NGINX and a README for you tomorrow

vietj commented 2 years ago

thank you

On Fri, Mar 11, 2022 at 7:05 AM Andrew Gapic @.***> wrote:

Yeah I will put together a new repo with complete code that includes NGINX and a README for you tomorrow

— Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vert.x/issues/4299#issuecomment-1064802671, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCTRP77QHSDOX7M4KOLU7LPB3ANCNFSM5P7GEK6Q . You are receiving this because you were assigned.Message ID: @.***>

agapic commented 2 years ago

Here is the reproducer: https://github.com/agapic/vertx-nginx-websocket-repro

vietj commented 2 years ago

"Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. " : can you elaborate why nginx will resend previous request headers ?

vietj commented 2 years ago

when I try the reproducer I get:

julien@Juliens-MacBook-Pro vertx-nginx-websocket-repro % ./force-response.sh 200 ws://localhost:9090/ws
A848EC3A-9EF1-4A2A-8052-5EA8CEA9E386 looping 200 once-per-second
Connected (press CTRL+C to quit)

is that what is supposed to happen ?

I don't think so (the readme says it should hang)

agapic commented 2 years ago

If you send a 403 first and then try to send a 200 it should hang. Keep-alive is set to 10s so you have a 10s window to do it. If you set keep-alive in the nginx.conf to say 60s it should be more easily reproducable

agapic commented 2 years ago

"Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. " : can you elaborate why nginx will resend previous request headers ?

Nginx won't resend previous request headers, it sends the correct ones everytime. To rephrase - in 3.6.3 the websocket was re-using the old headers that were contained within the object rather than copying the new request headers over.

vietj commented 2 years ago

the script will try to send several requests as I can see, so what would be the accurate way to reproduce the issue ?

vietj commented 2 years ago

can you make a reproducer with a regular client like Vert.x instead of requiring two scripts to be executed ? it's not clear also how to observe the issue.

agapic commented 2 years ago

Hi @vietj, I updated the code to connect to the websocket using the Httpclient class, everything is run on a periodic timer now. After running it you will notice you will stop receiving responses - i.e. logger.info("Received a 200 response"); in Client.java. If you change the port to 8081 (instead of 9090 in Client.java) you will always receive responses.

agapic commented 2 years ago

@vietj were you able to get any more info from my update?

vietj commented 2 years ago

not yet the time to get back to this

vietj commented 2 years ago

@tobias I've run the reproducer for 3 minutes and I'm continusouly getting

14:50:41.993 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Attempting to connect to websocket and expecting a 403 response
14:50:41.993 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Attempting to connect to websocket and expecting a 200 response
14:50:41.996 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Received a 200 response
14:50:41.997 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Received a 403 successfully

is that expected ?

tobias commented 2 years ago

Hi @vietj! I have no idea, I haven't looked at vert.x in > 5 years :) Is this related to some work I did back then?

vietj commented 2 years ago

I meant that was I observe seems to work correctly, can you confirm it ?

tobias commented 2 years ago

Well, I haven't touched vert.x in years, so I don't know what value there would be in me getting involved again.

vietj commented 2 years ago

this was actually meant to @agapic sorry for the noise

tobias commented 2 years ago

Ah, that makes sense @vietj. No worries!