ExpediaGroup / graphql-kotlin

Libraries for running GraphQL in Kotlin
https://opensource.expediagroup.com/graphql-kotlin/
Apache License 2.0
1.73k stars 343 forks source link

7.1.0 introduces Spring Boot Micrometer metrics errors #1969

Open roookeee opened 4 months ago

roookeee commented 4 months ago

Library Version 7.1.0

Describe the bug This PR moved the request execution away from netty to the default dispatcher. This causes issues in Micrometer (which we use for metrics) as its not thread-safe and multiple threads provide metrics (the netty thread provides http metrics, DefaultDispatcher based coroutines provide database metrics), see this issue.

We ran into this issue in the past as we did the same thing as the aforementioned graphql-kotlin PR. We had to move away from such a change and saw no performance degradation but graphql-kotlin >= 7.1.0 gives us no way to circumvent this issue now.

To Reproduce

Expected behavior Either revert the aforementioned pull request or give us the ability to disable this new feature of moving the request execution to the DefaultDispatcher.

samuelAndalon commented 4 months ago

The spring webflux reference highlights that reactor-http-epoll threads should only be used for handling requests including deserialization, it is very important to not include blocking operations or intensive CPU operations in those thread, thus, moving to a different thread is ideal.

Still, if you need to keep execution there because of an implicit contract with metrics, then you could extend the GraphQLServer class, contributions are always welcomed, so if you want to add an option to make this configurable we will be happy to review it, just keep executing the operation in the default dispatchers coroutine scope.

roookeee commented 3 months ago

We found another way to add our metric tags without creating the outlined issue. The update to 7.1.x still degrades our services performance by ~10x so we will provide a PR to make this configurable:

7.0.x image

7.1.x image

We are fairly certain that this difference is caused by swapping to another thread pool that was introduced in 7.1.x but will get back to you once we know for sure.

Thanks again!

reneleonhardt commented 1 month ago

@roookeee Did you find out if the new thread pool is the reason or can you propose another solution to fix this huge performance problem? An API without metrics is like flying blind in production... I hope OpenTelemetry (Jaeger Tracing) isn't affected as well 😱

roookeee commented 1 month ago

@reneleonhardt I sadly didn't have more time to investigate this issue, but 7.1.x introduced no other significant change that could cause such a performance degradation. We also dabbled with custom thread pool usage in the past and saw the same issue, so ...

Sadly I cannot outline another solution besides overriding the new defaults or making the usage of the new thread pool configurable.

I also won't be able to supply a PR for this anymore as we will sadly move away from graphql-kotlin in the near feature (tech stack consolidation effort, especially now that spring-boot-graphql is out and Netflix DGS will be deprecated soon).