Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

[BUG] Cosmos library takes 100% CPU on Windows #6112

Closed dvanackere-lpg closed 4 years ago

dvanackere-lpg commented 4 years ago

Describe the bug: Due to a bug in netty (https://github.com/netty/netty/issues/9710, already fixed, but no netty release containing the fix is available yet), the RntbdRequestTimer spins in a busy loop (repeated Sleep(0) calls) when running on a Windows machine.

To Reproduce: Run a Spring application with the cosmos library on Windows.

Code Snippet: RntbdRequestTimer creates its timer with this.timer = new HashedWheelTimer(FIVE_MILLISECONDS, TimeUnit.NANOSECONDS)
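To make the Sleep(0) loop concrete, here is a minimal sketch of the sleep-time arithmetic as I understand it from the netty issue linked above. It is an assumption about netty's internals, not code copied from netty or from this SDK: the class and method names are hypothetical, and the "round down to a multiple of 10 ms on Windows" step and the "never sleep 0 ms" fix are my reading of that issue.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not netty or Cosmos SDK code.
public class TickSleepSketch {

    // Milliseconds until the next tick, rounded up so a partial millisecond counts as one.
    static long millisUntilNextTick(long remainingNanos) {
        return (remainingNanos + 999_999) / 1_000_000;
    }

    // ASSUMPTION: on Windows the sleep is rounded down to a multiple of 10 ms.
    static long roundDownToTenMillis(long sleepMillis) {
        return sleepMillis / 10 * 10;
    }

    // ASSUMPTION: the fix keeps the rounded sleep from collapsing to 0 ms.
    static long withMinimumSleep(long sleepMillis) {
        return sleepMillis == 0 ? 1 : sleepMillis;
    }

    public static void main(String[] args) {
        for (long tickMillis : new long[] {5, 10, 100}) {
            long wanted = millisUntilNextTick(TimeUnit.MILLISECONDS.toNanos(tickMillis));
            long rounded = roundDownToTenMillis(wanted);
            System.out.println("tick=" + tickMillis + " ms, rounded sleep=" + rounded
                    + " ms, with minimum=" + withMinimumSleep(rounded) + " ms");
        }
    }
}
```

Under these assumptions a 5 ms tick rounds to a 0 ms sleep, so the worker thread never actually yields, while a full 10 ms or 100 ms tick keeps a non-zero sleep.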

Expected behavior: This could be solved by:

Setup (please complete the following information):

Additional context: I use this library through spring-data-cosmosdb, hosted in an Azure Function.

kushagraThapar commented 4 years ago

@David-Noble-at-work Can you please look into this?

dvanackere-lpg commented 4 years ago

I've recompiled the project to try different values for the timer. After setting the timer to 10 milliseconds, CPU usage dropped to 40% (in Azure). After setting it to 100 milliseconds, CPU usage dropped to 3% (in Azure), which is normal. I don't know whether the problem is that HashedWheelTimer performs poorly (on a Windows machine at least). I also don't know what side effects this timer setting has: my application seems to work normally, but is it safe to keep this setting?

kushagraThapar commented 4 years ago

Hey, we are testing this timer change on our end and are still waiting for results from a Windows machine.

We have tested this on Mac and Linux, and it works fine for us, so you should be safe to keep this setting.

kushagraThapar commented 4 years ago

@David-Noble-at-work, any updates on this?

kushagraThapar commented 4 years ago

@david-lpg I had a chat with David, and we couldn't see any CPU difference, at least on Linux; we are not sure about Windows. So you are good to go with 100 milliseconds on the timer.

In addition to the above, I would like to mention that we exposed these options for our Rntbd (TCP transport client) in the v3.6.0 release. You don't need to build the SDK locally or change the source code; you can use these options to pass these values to the SDK.

Here is the changelog and way to provide these options: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/cosmos/changelog/README.md#360

Here is more information on how to use it: https://github.com/David-Noble-at-work/azure-cosmos-examples#using-system-properties-to-modify-default-direct-tcp-options
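As a rough illustration of the system-property approach described in the links above, the sketch below builds a small JSON options string and sets it as a JVM system property. The property name `azure.cosmos.directTcp.defaultOptions` and the ISO-8601 duration format for `requestTimerResolution` are assumptions on my part and should be checked against the linked README; the class and helper names are hypothetical.

```java
import java.time.Duration;

// Hypothetical illustration only; verify the property name and JSON shape in the linked README.
public class DirectTcpOptionsSketch {

    // ASSUMPTION: system property read by the SDK for default direct TCP options.
    static final String OPTIONS_PROPERTY = "azure.cosmos.directTcp.defaultOptions";

    // Builds a one-field JSON document; Duration.toString() yields ISO-8601, e.g. PT0.1S.
    static String optionsJson(Duration requestTimerResolution) {
        return "{\"requestTimerResolution\":\"" + requestTimerResolution + "\"}";
    }

    public static void main(String[] args) {
        // 100 milliseconds is the value reported to work earlier in this thread.
        String json = optionsJson(Duration.ofMillis(100));
        System.setProperty(OPTIONS_PROPERTY, json);
        System.out.println(OPTIONS_PROPERTY + "=" + System.getProperty(OPTIONS_PROPERTY));
    }
}
```

If the SDK reads these defaults once at startup, the property would need to be set before the Cosmos client is created, for example with a `-D` flag on the JVM command line instead of in code.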

kushagraThapar commented 4 years ago

@david-lpg please close this if this fixes your issue.

dvanackere-lpg commented 4 years ago

Sorry for the delay, I had to work on another project lately. Today I tried using the config file to modify the requestTimerResolution parameter. This seems to work, even though I don't really know what side effects this parameter could have on my application. I'll use it for now. But the real fix would be to use netty 4.1.44.Final (currently 4.1.42.Final). Thanks for your help.

kushagraThapar commented 4 years ago

Thanks @david-lpg. We will update the netty versions in the next release.