ChilliCream / graphql-platform

Welcome to the home of the Hot Chocolate GraphQL server for .NET, the Strawberry Shake GraphQL client for .NET, and Banana Cake Pop, the awesome Monaco-based GraphQL IDE.
https://chillicream.com
MIT License

Application Stops Responding #4688

Closed: ghost closed this issue 2 years ago

ghost commented 2 years ago

Is there an existing issue for this?

Describe the bug

Hi, I have an issue where my application just stops responding or takes a long time to respond to requests.

The application works fine and then, all of a sudden, we see a drop in requests per second (RPS); CPU usage drops as well.

Response time: image

Throughput: image

CPU: image

Memory: image

This started happening when we migrated from version 11 to version 12. It only happens from time to time, and we have yet to identify the root cause.

On version 12.3.2 we had the same behavior, but we also had this error:

   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Threading.SemaphoreSlim.WaitUntilCountOrTimeoutAsync(TaskNode asyncWaiter, Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at HotChocolate.Execution.RequestExecutorProxy.GetRequestExecutorAsync(CancellationToken cancellationToken)
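
For context, a trace of this shape is what `SemaphoreSlim.WaitAsync(CancellationToken)` produces when the token is canceled while the caller is still queued for the semaphore. The following is a minimal sketch that is independent of Hot Chocolate; the class and method names are made up for illustration, and the 100 ms timeout is an arbitrary assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type, not part of Hot Chocolate.
public static class CanceledWaitDemo
{
    // Returns true when a queued WaitAsync is canceled before the semaphore is released.
    public static async Task<bool> WaitIsCanceledAsync()
    {
        using var semaphore = new SemaphoreSlim(1, 1);
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));

        // The first caller takes the only slot and never releases it.
        await semaphore.WaitAsync();

        try
        {
            // The second caller queues behind the first until the token fires.
            await semaphore.WaitAsync(cts.Token);
            return false; // not expected: nobody releases the slot
        }
        catch (OperationCanceledException)
        {
            // Same exception type as in the trace above.
            return true;
        }
    }
}
```

In other words, the exception by itself only says that a request gave up (or was aborted) while waiting; it does not say who was holding the semaphore or why it was held for so long.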

On version 12.5.0 we don't see any error, but, as I said before, the behavior is the same as on the previous version.

We were able to run a thread profiler, but it doesn't give much information. image

Searching for similar reports, we found this issue on GitHub: https://github.com/mgravell/Pipelines.Sockets.Unofficial/issues/28, which references this blog post: https://blog.marcgravell.com/2019/02/fun-with-spiral-of-death.html

The application is a gateway with a local schema and a federated schema (with polling enabled) that has only one domain service.

Can someone help?

.NET version: 3.1
Hot Chocolate version: 12.5.0

michaelstaib commented 2 years ago

It seems this is related to a bug in .NET Core 3.1. Have you tried to update to .NET 6?

ghost commented 2 years ago

We are in the process of migrating to .NET 6, and that can take some time.

Still, I find it hard to believe that this is an issue with .NET Core 3.1. As I said before, this didn't happen when we were using version 11 of Hot Chocolate. Maybe we were just lucky, I don't know.

I think the problem is somewhere in the use of SemaphoreSlim, which we use in our application and which, if I'm not mistaken, the Hot Chocolate engine also uses.
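
One pattern that is commonly associated with this kind of stall is blocking thread-pool threads on a `SemaphoreSlim` with the synchronous `Wait()` while other callers use `WaitAsync()`. As a rough sketch of the alternative, the wrapper below only ever waits asynchronously and always releases in a `finally` block. `AsyncGate` and `RunAsync` are hypothetical names; this is not how our application or Hot Chocolate is actually written.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: serializes async work without ever blocking a thread.
public sealed class AsyncGate : IDisposable
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    public async Task<T> RunAsync<T>(
        Func<Task<T>> action,
        CancellationToken cancellationToken = default)
    {
        // Asynchronous wait: the calling thread goes back to the pool while queued.
        await _semaphore.WaitAsync(cancellationToken).ConfigureAwait(false);
        try
        {
            return await action().ConfigureAwait(false);
        }
        finally
        {
            // Always release, even when the action throws.
            _semaphore.Release();
        }
    }

    public void Dispose() => _semaphore.Dispose();
}
```

If any code path in the application calls `Wait()` or `.Result` around the same semaphore, that would be the first place to look.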

michaelstaib commented 2 years ago

I just read the blog post you linked, and the author states there that the issue was a bug in SemaphoreSlim. I have not looked into this specific issue yet, since I thought that was kind of your conclusion :)

michaelstaib commented 2 years ago

This is the code your error stack refers to.

https://github.com/ChilliCream/hotchocolate/blob/main/src/HotChocolate/Core/src/Execution/RequestExecutorProxy.cs

ghost commented 2 years ago

I saw the file, and I honestly don't see anything unusual.

The problem may be within RequestExecutorResolver, where the combination of ConcurrentDictionary and SemaphoreSlim could be causing the issue. I don't know, and I'm out of ideas 😢

We do not have much information yet and are trying to collect more. For now, our workaround is to have an alarm in place and to restart the container when it fires.
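
As one way to collect more information, the thread-pool counters that .NET Core 3.x exposes could be logged alongside the existing metrics. This is only a sketch: `ThreadPoolSnapshot` is a made-up helper, and the output format is an assumption, not something we have running.

```csharp
using System;
using System.Threading;

// Hypothetical diagnostics helper built on the public ThreadPool API.
public static class ThreadPoolSnapshot
{
    // Builds a one-line summary of the current thread-pool state.
    public static string Describe()
    {
        ThreadPool.GetAvailableThreads(out int availableWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        ThreadPool.GetMinThreads(out int minWorkers, out _);

        return $"threads={ThreadPool.ThreadCount} " +
               $"pending={ThreadPool.PendingWorkItemCount} " +
               $"busyWorkers={maxWorkers - availableWorkers} " +
               $"minWorkers={minWorkers} maxWorkers={maxWorkers}";
    }
}
```

Logging this line every few seconds, for example from a `System.Threading.Timer`, would show whether pending work items pile up while CPU usage falls, which would point toward thread-pool starvation rather than a problem in the schema polling itself.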

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.