ZiggyCreatures / FusionCache

FusionCache is an easy to use, fast and robust hybrid cache with advanced resiliency features.
MIT License
1.87k stars 96 forks source link

[BUG] High CPU usage according to Code Optimizations on Azure #308

Closed gercobrandwijk closed 1 month ago

gercobrandwijk commented 2 months ago

I am using FusionCache in my .NET 8 application which runs on Azure App Services which is using Redis as the distributed cache as well as for the backplane. The automatic Azure Code Optimizations Profiler analyzer which I have enabled there, gives the following 'Performance issues':

  1. Activity.NotifyError is causing high CPU usage ntdll!RtlDispatchException ntdll!KiUserExceptionDispatch kernelbase!RaiseException coreclr!RaiseTheExceptionInternalOnly coreclr!IL_Throw System.Diagnostics.DiagnosticSource.il!System.Diagnostics.Activity.NotifyError(class System.Exception) ziggycreatures.fusioncache!ZiggyCreatures.Caching.Fusion.Internals.Backplane.BackplaneAccessor+<HandleIncomingMessageAsync>d__15.MoveNext() System.Private.CoreLib.il!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&) ziggycreatures.fusioncache!ZiggyCreatures.Caching.Fusion.Internals.Backplane.BackplaneAccessor+<>c__DisplayClass14_0+<<HandleIncomingMessage>b__0>d.MoveNext() ziggycreatures.fusioncache!ZiggyCreatures.Caching.Fusion.Internals.Backplane.BackplaneAccessor+<>c__DisplayClass14_0.<HandleIncomingMessage>b__0() System.Private.CoreLib.il!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object) System.Private.CoreLib.il!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread) System.Private.CoreLib.il!System.Threading.ThreadPoolWorkQueue.Dispatch() System.Private.CoreLib.il!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() coreclr!CallDescrWorkerInternal

image

  1. Too Much Exception Handling ntdll!RtlpExecuteHandlerForException ntdll!RtlDispatchException ntdll!KiUserExceptionDispatch kernelbase!RaiseException coreclr!RaiseTheExceptionInternalOnly coreclr!IL_Throw System.Diagnostics.DiagnosticSource.il!System.Diagnostics.Activity.NotifyError(class System.Exception) ziggycreatures.fusioncache!ZiggyCreatures.Caching.Fusion.Internals.Backplane.BackplaneAccessor+<HandleIncomingMessageAsync>d__15.MoveNext() System.Private.CoreLib.il!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&) System.Private.CoreLib.il!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&) ziggycreatures.fusioncache!ZiggyCreatures.Caching.Fusion.Internals.Backplane.BackplaneAccessor+<>c__DisplayClass14_0.<HandleIncomingMessage>b__0() System.Private.CoreLib.il!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object) System.Private.CoreLib.il!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread) System.Private.CoreLib.il!System.Threading.ThreadPoolWorkQueue.Dispatch() System.Private.CoreLib.il!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() coreclr!CallDescrWorkerInternal

image

  1. Too Much Exception Handling coreclr!ClrUnwindEx coreclr!ProcessCLRException ntdll!RtlpExecuteHandlerForException ntdll!RtlDispatchException ntdll!KiUserExceptionDispatch kernelbase!RaiseException coreclr!RaiseTheExceptionInternalOnly coreclr!IL_Throw system.diagnostics.diagnosticsource.il!System.Diagnostics.Activity.NotifyError(class System.Exception) system.private.corelib.il!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&) ziggycreatures.fusioncache!ZiggyCreatures.Caching.Fusion.Internals.Backplane.BackplaneAccessor+<>c__DisplayClass14_0+<<HandleIncomingMessage>b__0>d.MoveNext() system.private.corelib.il!System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start(!!0&) ziggycreatures.fusioncache!ZiggyCreatures.Caching.Fusion.Internals.Backplane.BackplaneAccessor+<>c__DisplayClass14_0.<HandleIncomingMessage>b__0() system.private.corelib.il!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object) system.private.corelib.il!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread) system.private.corelib.il!System.Threading.ThreadPoolWorkQueue.Dispatch() system.private.corelib.il!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() coreclr!CallDescrWorkerInternal

image

What is happening here? Is this something to be concerned about? Can I do something to fix/resolve this?

Versions

I've encountered this issue on:

jodydonetti commented 1 month ago

Hi @gercobrandwijk , this is in fact strange, and it seems to be related to the OpenTelemetry usage. Are you using it? Now I'm on vacation ✈️, but when I'll get back I can take a look at it more. Will update, meanwhile if you have more info (setup, config, etc) that may help.Thanks!

gercobrandwijk commented 1 month ago

No, I am not using OpenTelemetry (not configured anything and ZiggyCreatures.FusionCache.OpenTelemetry is not installed).

I am using Serilog, so since FusionCache has support for ILogger, probably it is doing some logging automatically?

The config I use is as follows: image

jodydonetti commented 1 month ago

Hi @gercobrandwijk , I'm back from my vacations.

After looking better at your original issue, I'm wondering what is the difference between point 2 and 3, they have a different "prefix" in the detailed error but they are basically the same, and the reason is highlighted as:

image

and this means that there are exceptions being thrown in the BackplaneAccessor, which in turns means that your current backplane implementation (I suppose Redis, am I right?) is having some issues. Meaning, the Redis instance itself. Maybe is not sized correctly? Maybe you are encountering the famous conveyor-belt problem with some bigger-than-normal commands being sent to it?

Did you find something in the logs, like a higer-than-normal number of errors related to the backplane?

For point number 1 instead: as you can see "Activity.NotifyError" is what is called underneath in points 2 and 3 (and in fact the % are the same):

image

To me that is a sign that the same problem is being reported multiple times, like imagine Foo() calling Bar() calling Baz(), and Baz() uses 3% of cpu-time. Of course also Foo() and Bar() would take the same time, too.

What is happening here?

As said I think a possible cause may be the Redis instance having some issues, maybe.

Is this something to be concerned about? Can I do something to fix/resolve this?

I would say no, but would also suggest looking at the logs: are there a lot of warnings/errors there?

Hope this helps, let me know.

gercobrandwijk commented 1 month ago

Most likely they are indeed the same thing, so we can focus now just on the Activity.NotifyError indeed. We are indeed using Redis, both for distributed cache as for the backplane.

As mentioned we are using Serilog (which is linked to Application Insights), where we use the default configuration, which means that the default FusionCache log levels are used, which should then show traces I would suggest? Just queried traces of ApplicationInsights, and not finding anything related to FusionCache. Can you help me a bit more with the exact configuration of this?

jodydonetti commented 1 month ago

As mentioned we are using Serilog (which is linked to Application Insights), where we use the default configuration, which means that the default FusionCache log levels are used, which should then show traces I would suggest?

One quick thing to be sure we are on the same page though.

One thing is the config for Serilog/.NET that is the so called min log level, meaning the minimum log level from which a logger/sink would write to the logs and for which "category" (I'm talking about this).

Another thing is the FusionCache log level config, meaning which log levels to use for certain things like factory errors, etc (I'm talking about this).

Having said that, FusionCache normally uses Error/Warning for most "problems", so if your min log level is set to at least Warning and you haven't changed the log levels used in the FusionCache config then yeah, you should see some of them in the logs.

Just queried traces of ApplicationInsights, and not finding anything related to FusionCache.

You mean anything at all? I'm asking because maybe the problem is that you have in fact not enabled logging for FusionCache, meaning for the category "ZiggyCreatures.Caching.Fusion", like here (just an example from the docs):

image

Finally just to be clear: all of this is not specific to FusionCache, but is a general .NET thing.

Let me know!

gercobrandwijk commented 1 month ago

Thanks for the information, I will investigate this a bit more. For now we can close this one.