dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

Orleans Client 1.5.2 issue with .NET PerformanceCounter constructor #4637

Closed jeoffman closed 3 years ago

jeoffman commented 6 years ago

We (@amccool) have a Windows Service that is an Orleans Client. The service startup initializes the Orleans Client, and on some VMs this is causing a timeout (from the Service Control Manager) and silent failure - no errors in logs (Orleans/text), no Windows Event other than the usual System one: "A timeout was reached (30000 milliseconds) while waiting for the XXX service to connect".

The program never has this issue when run from the command line; it only fails like this as a service. The service runs as "SYSTEM", but changing the "Log On" account has no effect. The problem does not occur on every VM, nor on my development machine.

I tracked the error down to the class RuntimeStatisticsGroup and its InitCpuMemoryCounters method (Orleans 1.5.2 sources). This line: cpuCounterPF = new PerformanceCounter("Processor", "% Processor Time", "_Total", true); sometimes does not return within 30 seconds. It happens pretty much every other time I start the service, about 50% of the time.

I found this related issue on MSDN.

Sure enough, if I hack the Orleans source to use the empty/default constructor, the service does not time out on that line, although I assume we would still eventually run into some kind of timeout once "NextValue" gets called. I haven't mangled my copy of the Orleans Client to test that yet. My hope is that I can get the service started, and whatever failure happens after that would at least get logged.

Is there some way to configure the ClientConfiguration to disable the Performance Counter setup? Windows Performance Counters are notoriously fragile and hard to work with...

benjaminpetit commented 6 years ago

Sorry for the late response.

I don't think you can deactivate this in 1.5. In 2.0, you can.

Are the performance counters created on the machine when the timeout occurs? You don't see the line

"Timeout occurred during initialization of CPU & Memory perf counters"

in your logs?

Normally, countersAvailable will be set to false if the init method failed, and Orleans will not try to run/read from performance counters.
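
The fallback described above (give up on counter initialization after a timeout and carry on without counters) can be sketched roughly as follows. This is not the Orleans source: `TimeoutInit` and `TryCreate` are hypothetical names used only for illustration, and the 10-second timeout in the usage comment is an arbitrary assumption.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, NOT part of Orleans: an illustration of
// "try to initialize, but stop waiting after a timeout".
public static class TimeoutInit
{
    // Runs the factory on a thread-pool thread and waits at most `timeout`.
    // Returns the created object, or null if the factory threw or did not
    // finish in time.
    public static T TryCreate<T>(Func<T> factory, TimeSpan timeout) where T : class
    {
        Task<T> task = Task.Run<T>(factory);
        try
        {
            // Wait returns false on timeout. The factory keeps running in the
            // background in that case; it is abandoned, not cancelled.
            return task.Wait(timeout) ? task.Result : null;
        }
        catch (AggregateException)
        {
            // The factory threw; treat this the same as "not available".
            return null;
        }
    }
}

// Usage sketch (Windows only, requires System.Diagnostics):
//   PerformanceCounter cpu = TimeoutInit.TryCreate(
//       () => new PerformanceCounter("Processor", "% Processor Time", "_Total", true),
//       TimeSpan.FromSeconds(10));
//   bool countersAvailable = cpu != null;
```

With a pattern like this, a slow PerformanceCounter constructor delays startup by at most the chosen timeout instead of blocking the calling thread for as long as the constructor takes. Any counter that finishes being created after the timeout is simply never used.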

jeoffman commented 6 years ago

There are no errors - the PerformanceCounters all seem to work AFAICT - it just takes several minutes to get them going.

benjaminpetit commented 6 years ago

I don't know if we have plans to add a flag in 1.5.x to disable .NET PerformanceCounter.

In 2.0 if you don't register the performance counter thing you should be fine. Do you plan to upgrade to 2.0 soon?

jeoffman commented 6 years ago

@benjaminpetit - Thank you for the reply. I don't think we will be able to move to dotnet core/Orleans 2.0 any time soon. Upgrading the silo+grains wouldn't be so hard, but we have a lot of external clients that are not dotnet core-ready.

I have already hacked together a flag in the ClientConfig to turn off PerformanceCounters. Something like: <PerformanceCounters UsePerformanceCounters="false" /> Are you open to a pull request? We are on 1.5.2.

forked and updated here: https://github.com/jeoffman/orleans/tree/UserPerformanceCounters-v1.5.4

benjaminpetit commented 6 years ago

"I don't think we will be able to move to dotnet core/Orleans 2.0 any time soon. Upgrading the silo+grains wouldn't be so hard, but we have a lot of external clients that are not dotnet core-ready."

You don't have to use dotnet core; we also support the full framework (4.6.1; if I remember correctly, it's the same version as in 1.5.x).

"Are you open to a pull request?"

Always! But before releasing a new version, we have to run an exhaustive set of tests and decide as a team whether or not to publish new packages. It takes time, especially with the vacation period... But contributions are always welcome, and we can see whether we can release 1.5.5 later on.

benjaminpetit commented 3 years ago

Closing this issue, since I don't think we will do a 1.5.x release at this point.