ZiggyCreatures / FusionCache

FusionCache is an easy to use, fast and robust hybrid cache with advanced resiliency features.
MIT License
1.71k stars 90 forks source link

[BUG] Avoid logging exception on soft timeouts(?) Or separate the two timeout exceptions #267

Closed arealmaas closed 2 months ago

arealmaas commented 3 months ago

Describe the bug

A clear and concise description of what the bug is.

Our case is that we get several "false positives" in the sense that soft timeout triggers an exception although everything works as expected and the client is not affected.

It might be by design, but it ends up as an exception in our Application Insights which it not what we want.

What happens is that an item is expired, the fail safe is activated, the expired value is used, but the soft timeout triggers which leads to the exception.

In production this will most likely not be a problem as the eager refresh (set to 0.8) combined with high traffic will trigger and start an update of the cache in the background.

The issue is that right now it is hard to separate between a soft timeout and a hard timeout (which should most definitely trigger an exception). Is this by design, and is there a way for us to separate between the two timeouts or ignore the soft timeout completely?

To Reproduce

We have the following settings set in our FusionCache:


Duration = TimeSpan.FromMinutes(20),

FactorySoftTimeout = TimeSpan.FromSeconds(1),
FactoryHardTimeout = TimeSpan.FromSeconds(10),

DistributedCacheSoftTimeout = TimeSpan.FromSeconds(1),
DistributedCacheHardTimeout = TimeSpan.FromSeconds(2),
AllowBackgroundDistributedCacheOperations = true,

FailSafeMaxDuration = TimeSpan.FromHours(2),
FailSafeThrottleDuration = TimeSpan.FromSeconds(30),
IsFailSafeEnabled = true,

JitterMaxDuration = TimeSpan.FromSeconds(2),

EagerRefreshThreshold = 0.8f

Let the soft timeout trigger and the fail safe to return an expired value.

The exception we get is:

[
  {
    "timestamp [UTC]": "01/07/2024, 12:08:44.734",
    "problemId": "ZiggyCreatures.Caching.Fusion.SyntheticTimeoutException at ZiggyCreatures.Caching.Fusion.Internals.RunUtils+<RunAsyncFuncWithTimeoutAsync>d__1`1.MoveNext",
    "handledAt": "",
    "type": "ZiggyCreatures.Caching.Fusion.SyntheticTimeoutException",
    "message": "",
    "assembly": "ZiggyCreatures.FusionCache, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null",
    "method": "ZiggyCreatures.Caching.Fusion.Internals.RunUtils+<RunAsyncFuncWithTimeoutAsync>d__1`1.MoveNext",
    "outerType": "ZiggyCreatures.Caching.Fusion.SyntheticTimeoutException",
    "outerMessage": "The operation has timed out.",
    "outerAssembly": "ZiggyCreatures.FusionCache, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null",
    "outerMethod": "ZiggyCreatures.Caching.Fusion.Internals.RunUtils+<RunAsyncFuncWithTimeoutAsync>d__1`1.MoveNext",
    "severityLevel": 2,
    "details": [
      {
        "severityLevel": "Warning",
        "outerId": "0",
        "message": "The operation has timed out.",
        "type": "ZiggyCreatures.Caching.Fusion.SyntheticTimeoutException",
        "id": "49417901",
        "parsedStack": [
          {
            "assembly": "ZiggyCreatures.FusionCache, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null",
            "method": "ZiggyCreatures.Caching.Fusion.Internals.RunUtils+<RunAsyncFuncWithTimeoutAsync>d__1`1.MoveNext",
            "level": 0,
            "line": 79,
            "fileName": "/_/src/ZiggyCreatures.FusionCache/Internals/RunUtils.cs"
          },
          {
            "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
            "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
            "level": 1,
            "line": 0
          },
          {
            "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
            "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
            "level": 2,
            "line": 0
          },
          {
            "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
            "method": "System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult",
            "level": 3,
            "line": 0
          },
          {
            "assembly": "ZiggyCreatures.FusionCache, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null",
            "method": "ZiggyCreatures.Caching.Fusion.FusionCache+<GetOrSetEntryInternalAsync>d__64`1.MoveNext",
            "level": 4,
            "line": 160,
            "fileName": "/_/src/ZiggyCreatures.FusionCache/FusionCache_Async.cs"
          }
        ]
      }
    ],
    "itemType": "exception",
    "customDimensions": {
      "AspNetCoreEnvironment": "test",
      "_MS.ProcessedByMetricExtractors": "(Name:'Exceptions', Ver:'1.1')",
      "SourceContext": "ZiggyCreatures.Caching.Fusion.FusionCache",
      "RequestId": "0HN4PNSSIAHP2:00000013",
      "RequestPath": "/api/v1/serviceowner/dialogs",
      "ConnectionId": "0HN4PNSSIAHP2",
      "MessageTemplate": "FUSION [N={CacheName} I={CacheInstanceId}] (O={CacheOperationId} K={CacheKey}): a synthetic timeout occurred while calling the factory",
      "CacheKey": "OrgResourceReference",
      "CacheOperationId": "0HN4PNTHT3SQ5",
      "CacheInstanceId": "a3116906debe420d92eb7514c20e938e",
      "CacheName": "ResourceRegistry"
    }
  }
]

Expected behavior

An exception shouldn't be logged where there is expected behaviour. Either that or separate the two timeouts.

Versions

I've encountered this issue on:

jodydonetti commented 3 months ago

Hi @arealmaas , I hope you're finding FusionCache useful!

See below for the answer to the point you've highlighted, and keep in mind that "synthetic timeouts" are the thing that happens when a soft/hard timeout is hit.

Short Answer

You can configure the LogLevels used for synthetic timeouts differently for other general errors, via the FusionCacheOptions.

You can do that granularly, for different kinds of synthetic timeouts, for:

All of them have a default level of LogLevel.Warning: just set them to something super low, like LogLevel.Trace to have them disappear (or, to better say it, to only show when you really want to see everything).

You can also set different LogLevels for other generic errors, again differently for different parts, for:

Longer Answer

I'd like to give you visibility in why things are the way they are. If all is good, you may decide (like in your case) to ignore seldom synthetic timeouts, since they would be a useless background noise.

Depending on the scenario and how you've configured/tuned your timeouts though, you may instead expect them to be very rare: in that case you may prefer to know when they happen, because it may be an indication that something is going south in your datacenter or because there's a sudden boost in your service access pattern or something else.

By allowing a granular configuration for both the different parts (factory, distributed cache, backplane) and for the 2 different kinds of problems (synthetic timeouts as a result of soft/hard timeouts settings, or other general errors) FusionCache allows you to achieve that easily.

The defaults have been choosen as Warning so that by default users can know about it, and make an informed decision.

Hope this helps!

PS: you can see all the available options, both per-cache FusionCacheOptions and per-entry FusionCacheEntryOptions, with a description and more here.

arealmaas commented 2 months ago

Thank you for that detailed response @jodydonetti !

The problem in our case is that it ends up as an exception in our application insights which is then sent as an alert to us in different channels. We could for instance solve it by excluding exceptions with severity level warning. As I believe warning would be a suitable log level here.

However, is it possible to separate between a hard timeout and a soft timeout?

jodydonetti commented 2 months ago

Hi @arealmaas glad it helped!

For your question: no, it's not possible to configure it differently between soft (when there's a stale value) and hard (when there's not) timeouts.

But if a hard timeout kicks in, your application code will receive a full-fledged exception, which I suppose would be caught in your own code.

arealmaas commented 2 months ago

Ah ok. Thank you, a full-fledged exception would suffice. Thank you for the quick and detailed response.