eclipse-vertx / vert.x

Vert.x is a tool-kit for building reactive applications on the JVM
http://vertx.io
Other
14.26k stars 2.07k forks source link

Request .send method invocation starts failing from 4.5.5 onward #5186

Closed ssill2 closed 5 months ago

ssill2 commented 5 months ago

Questions

Do not use this issue tracker to ask questions, instead use one of these channels. Questions will likely be closed without notice.

Version

4.5.4 works fine 4.5.5+ exhibits the issue.

Context

I have a small application that essentially functions as a middleware to do keycloak authentication and validation of role associations of webservice calls before making the calls to the backend that's not able to do the keycloak auth itself.

Requests to come to this app on 8081 and it makes requests to the backend application on port 8000.

This has been working great. In a periodic refresh of my dependency versions, I upgrade from 4.5.1 to 4.5.7. This broke my app.

I have a small section of code that is run after my app has determined that the caller is keycloak authenticated and has the correct role to execute http operation on the url.

The code looks like this. Right above this I've accepted a connection from the caller using a vert.x-web route and then validated that they've supplied a keycloak token that is valid.

// make a call to the backend  on behalf the caller and return the backend response data to the caller
                                client.request(HttpMethod.GET, "/api/v3/version_number/", ar1 ->
                                {
                                    if (ar1.succeeded())
                                    {
                                        HttpClientRequest backendRequest = ar1.result();
                                        backendRequest.send(ar2 ->
                                        {
                                            if (ar2.succeeded())
                                            {
// With 4.5.4 and earlier this the call works

                                                HttpClientResponse backendResponse = ar2.result();
                                                Future<Buffer> respBody = backendResponse.body();
                                                respBody.onSuccess((event) ->
                                                {
                                                    callerResponse.headers().addAll(backendResponse.headers());
                                                    callerResponse.setStatusCode(backendResponse.statusCode());
                                                    callerResponse.end(backendResponse.body().result());
                                                }).onFailure((event) ->
                                                {
                                                    callerResponse.putHeader(MiddlewareConstants.HTTP_HEADER__CONTENT_TYPE,
                                                            MiddlewareConstants.CONTENT_TYPE__TEXT_PLAIN);
                                                    callerResponse.setStatusCode(500);
                                                    callerResponse.end("Unable to read response body from backend");
                                                });
                                            }
                                            else
                                            {
// With 4.5.5+ this the call always falls through to here and ar2.result() returns null
                                                HttpClientResponse backendResponse = ar2.result();
// NPE is thrown right here because backendResponse is null
                                                callerResponse.headers().addAll(backendResponse.headers());
                                                callerResponse.setStatusCode(backendResponse.statusCode());
                                                callerResponse.end(backendResponse.body().result());
                                            }
                                        });
                                    }

I can confirm that vert.x. 4.5.4 works and it it basically breaks(ar2.result() starts returning null) with 4.5.5 and above

Do you have a reproducer?

I don't.

Extra

This is running in a docker container under minikube on a rocky9 linux host. I was looking at the change logs for 4.5.5 vs 4.5.4 and I didn't see anything that stuck out as being an obvious breaking change. I've been able to rig up tcpdump inside the container and capture .pcaps of requests from container to the backend container for both when it's working and when it's not.

Here's what the pcap looks like when it's working, i.e. the vert.x version is 4.5.4. Notice that connections to 8000 work fine image

And when the vert.x. version is 4.5.7. Now there are syn packets sent to 8000 but it never does an http conversation. image

The caller to my middleware eventually gets a 504 504 Gateway Time-out

tsegismont commented 5 months ago

Can you change the code to log the failure ar2.failure() ? If ar2 is not succeeded, it is expected that ar2.result() returns null.

ssill2 commented 5 months ago

I can try that. The interesting thing is that changing only the version of vert.x dep from 4.5.4 -> 4.5.5+ is all it takes to break things. I'll try that here in a bit and report back

tsegismont commented 5 months ago

I understand. But perhaps the failure stacktrace can provide us with useful information for troubleshooting.

ssill2 commented 5 months ago

I'm not seeing ar2.failure() as a valid method.

I updated the above code to the following but ar2 doesn't have a failure() method. Or did you mean some other way?

                                            else
                                            {
                                                LOG.warning(String.format("Request to backend failed: %s",
                                                        ar2.failure()));
// With 4.5.5+ this the call always false through to here and ar2.result() returns null
                                                HttpClientResponse backendResponse = ar2.result();
                                                callerResponse.headers().addAll(backendResponse.headers());
                                                callerResponse.setStatusCode(backendResponse.statusCode());
                                                callerResponse.end(backendResponse.body().result());
                                            }
tsegismont commented 5 months ago

Sorry, I meant io.vertx.core.AsyncResult#cause()

ssill2 commented 5 months ago

ah ok. Let me rebuild and see what I get.

ssill2 commented 5 months ago

ok, so here it is WARNING: Request to backend failed io.vertx.core.http.HttpClosedException: Connection was closed

The backend is a python django app written by others, so I'll have to see if there's a way to turn on more detail in that app. I've watched the logs at the same time on that application container and nothing comes out that says it's dropping anything, but perhaps I can have my python guys turn on some more debugging in django rest framework.

ssill2 commented 5 months ago

I appreciate your help in narrowing down exactly what the failure was. I'm going to work with my team and see if I can figure this out.

tsegismont commented 5 months ago

If there's any chance for you to create a small reproducer, it would be perfect

ssill2 commented 5 months ago

yeah not easily unfortunately since there's both the vert.x container and this other python django container. but I will let you know what I find

ssill2 commented 5 months ago

so looking into this, the python app is django running on uwsgi, which I believe is what is actually landing the http connections. I'm going to dig in and see if I can get some more info out of that. I'm going go ahead and close this ticket since I'm not sure how long that's going to take. Thanks again for pointing me in the right direction.

ssill2 commented 3 months ago

I'd like to reopen this if possible. The work around I had put in place to use the apache httpclient instead of the vert.x one worked, but it end up tying up the dispatcher thread.

My application basically acts as a proxy.

It checks the supplied keycloak token for validity and then checks that the user has the correct role to call the endpoint the are calling and then makes the request on their behalf to the backend and then returns that data to the caller.

Can you guys think of what changed from 4.5.4 to 4.5.5 that would cause connections to the backend, just http, no tls or anything like that, start getting connection closed to even the most simple http requests.

The backend is a python application being served up with UWSGI(https://uwsgi-docs.readthedocs.io/en/latest/index.html). You can see from the pcaps above that it doesn't even get as far as sending the request when on 4.5.5+ but works perfectly fine on 4.5.4