Netflix / dgs-framework

GraphQL for Java with Spring Boot made easy.
https://netflix.github.io/dgs
Apache License 2.0
3.03k stars 286 forks

bug: Deadlock after upgrading to DGS 7.6.0 version #1887

Closed kkotamar closed 2 months ago

kkotamar commented 2 months ago

Upgraded to Spring Boot 3.2.1 and Netflix DGS version 7.6.0. After upgrading, Tomcat threads started going into the WAITING state and never recover. Below is the thread state.

`"http-nio-8080-exec-5" #2276 [2362] daemon prio=5 os_prio=0 cpu=750.61ms elapsed=590.95s tid=0x00007f76372ced50 nid=2362 waiting on condition [0x00007f6fb3a78000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method)`

Expected behavior

Threads should not get locked.

Actual behavior

Tomcat threads are waiting forever.

Steps to reproduce

Note: A test case would be highly appreciated, but we understand that's not always possible

srinivasankavitha commented 2 months ago

We haven't seen this issue so far. Also, could you please try the latest release, which is 8.x, so you are not running into issues that may already have been fixed?

Finally, it would be helpful if you have a clear repro after making the above changes so it is easier to debug.

kkotamar commented 2 months ago

Hi, I tried the versions below and see the same issue.

  1. 8.3.1
  2. 8.5.3

I am trying to understand whether the CompletableFuture the thread is blocked on is created by the DGS framework or by the application. See the stack above. I don't have virtual threads enabled. The application runs on Java 21.

parking) at jdk.internal.misc.Unsafe.park(java.base@21.0.1/Native Method) - parking to wait for <0x0000000729691bc0> (a java.util.concurrent.CompletableFuture$Signaller) at
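For readers trying to interpret this trace: a thread usually ends up parked on a `CompletableFuture$Signaller` when it makes an untimed blocking call such as `get()` or `join()` on a future that nobody has completed yet. The sketch below reproduces that state with plain JDK classes only. The class name `ParkedThreadDemo` and the thread name `demo-worker` are made up for illustration, and nothing here is taken from DGS or from the reporter's application.

```java
import java.util.concurrent.CompletableFuture;

public class ParkedThreadDemo {

    /**
     * Starts a thread that blocks on a future nobody completes and
     * returns the thread state observed once it has settled.
     */
    static Thread.State observeBlockedState() throws InterruptedException {
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();

        Thread worker = new Thread(() -> {
            try {
                // Untimed wait: the thread parks until the future is completed.
                neverCompleted.get();
            } catch (Exception e) {
                // Interrupted or completed exceptionally; irrelevant for this demo.
            }
        }, "demo-worker");
        worker.setDaemon(true);
        worker.start();

        // Give the worker up to five seconds to reach the parked state.
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (worker.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State observed = worker.getState();

        // Release the worker so the demo does not leave a stuck thread behind.
        neverCompleted.complete("done");
        worker.join(1_000);
        return observed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker state while blocked: " + observeBlockedState());
    }
}
```

If a thread dump is taken while the worker is blocked, its top frames should resemble the trace in this issue. The frames further down the stack are what identify which code made the blocking call.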

kkotamar commented 2 months ago

I have enabled the notprivacysafe logs. Below is the last log line I see: `Executing 'def45088-4eb1-48e5-bc43-2d3ae34101d9' query operation: 'QUERY' using 'graphql.execution.AsyncExecutionStrategy' execution strategy`

srinivasankavitha commented 2 months ago

The framework does not create any threads, so it is likely coming from the application.

kkotamar commented 2 months ago

This looks more like the following issue: https://github.com/graphql-java/graphql-java/issues/2068

srinivasankavitha commented 2 months ago

It is hard to determine the issue based on the information so far, since we don't have a clear way to reproduce it. It would be helpful if you could provide a sample repo that reproduces this issue so we can investigate further. FWIW, we have hundreds of apps on Spring Boot 3.x and the 7.x/8.x DGS framework and have not seen this problem. My best guess is that this is getting stuck doing some async work in the data fetcher. Again, it is hard to pinpoint without a reproduction of the issue.

Furthermore, the issue you linked to describes behavior in graphql-java, and the framework simply calls into graphql-java and does not schedule any work in other threads.
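As an illustration of the "stuck doing some async work" guess, here is a minimal sketch of putting a time bound on asynchronous work, so that a future that never completes fails with a `TimeoutException` instead of leaving a request thread parked indefinitely. It uses only `CompletableFuture.orTimeout` from the JDK (Java 9+). `stuckLookup` is a hypothetical stand-in for whatever async call a data fetcher might make. It is not a DGS or graphql-java API, and it is not the fix that was applied in this issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedAsyncWork {

    /** Hypothetical async call that never completes, standing in for stuck work. */
    static CompletableFuture<String> stuckLookup() {
        return new CompletableFuture<>();
    }

    /** Bounds the wait: the future fails with TimeoutException once the limit passes. */
    static CompletableFuture<String> boundedLookup(long timeoutMillis) {
        return stuckLookup().orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns true if waiting on the bounded future ends in a timeout. */
    static boolean timesOut(long timeoutMillis) {
        try {
            boundedLookup(timeoutMillis).join();
            return false;
        } catch (CompletionException e) {
            // join() wraps the TimeoutException in a CompletionException.
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOut(200));
    }
}
```

With a bound like this, a hang turns into an exception that carries a stack trace and shows up in logs, which is usually easier to trace than a silent wait.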

kkotamar commented 2 months ago

Thanks for looking at it. We finally found the reason: a library mismatch was causing the issue. What made it difficult to debug is that there is no stack trace showing what is causing the issue, only threads that are waiting. This issue can be closed.
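When the only symptom is waiting threads, one option is to list the affected threads programmatically along with their full stacks. The sketch below uses the JDK's `ThreadMXBean` to find threads whose park blocker is a `CompletableFuture$Signaller`, the same blocker that appears in the dump earlier in this thread. The class and method names are invented for this example, and the approach assumes the JVM reports the park blocker through `ThreadInfo.getLockInfo()`, as the `java.lang.management` documentation describes.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SignallerWaitFinder {

    private static final String SIGNALLER =
            "java.util.concurrent.CompletableFuture$Signaller";

    /** Returns info for every live thread currently parked on a CompletableFuture. */
    static List<ThreadInfo> threadsParkedOnFutures() {
        List<ThreadInfo> parked = new ArrayList<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread ended while the dump was being taken
            }
            LockInfo blocker = info.getLockInfo();
            if (blocker != null && SIGNALLER.equals(blocker.getClassName())) {
                parked.add(info);
            }
        }
        return parked;
    }

    /** Demo helper: parks a thread on a future and checks that the finder reports it. */
    static boolean detectsParkedThread(String threadName) throws InterruptedException {
        CompletableFuture<Void> neverCompleted = new CompletableFuture<>();
        Thread worker = new Thread(neverCompleted::join, threadName);
        worker.setDaemon(true);
        worker.start();

        boolean found = false;
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (!found && System.nanoTime() < deadline) {
            for (ThreadInfo info : threadsParkedOnFutures()) {
                if (threadName.equals(info.getThreadName())) {
                    found = true;
                    // The frames below the park call show who made the blocking call.
                    for (StackTraceElement frame : info.getStackTrace()) {
                        System.out.println("    at " + frame);
                    }
                }
            }
            if (!found) {
                Thread.sleep(20);
            }
        }
        neverCompleted.complete(null); // release the worker
        worker.join(1_000);
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("detected: " + detectsParkedThread("demo-parked"));
    }
}
```

In a real service, `threadsParkedOnFutures()` could be called from a diagnostic hook. The frames below the park call are the useful part, since they show which application or library code is waiting.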