Deadlock risk due to GetAwaiter().GetResult()

configcat / .net-sdk

ConfigCat SDK for .NET. ConfigCat is a hosted feature flag service: https://configcat.com. Manage feature toggles across frontend, backend, mobile, desktop apps. Alternative to LaunchDarkly. Management app + feature flag SDKs.

https://configcat.com/docs/sdk-reference/csharp

Other

29 stars 8 forks source link

Deadlock risk due to GetAwaiter().GetResult() #26

Closed jochenz closed 2 years ago

jochenz commented 3 years ago

See https://github.com/configcat/.net-sdk/blob/85d3a05fad5e79d2f350a4aa27721fa25a410522/src/ConfigCatClient/ConfigCatClient.cs#L147

... and it also seems completely unnecessary in the auto polling mode.

In auto polling mode, with in memory cache, you would expect the cache to be a lazy instance of a typed config (see issue #22). This would make the client in essence synchronous (with non-blocking IO in the background while polling).

Now, due to the general pattern (everything async), the synchronous GetValue method can't follow the async-all-the-way pattern and presents a risk to any applications using the client.

jochenz commented 3 years ago

My suggestion would be to make a true distinction between the types of caches / services: an IAsyncCache and an ISyncCache. And only support async-over-sync, but never sync-over-async, just throw a NotSupportedException in the latter case.

Or just remove the sync API, but that would be a shame, since the polling modes use a cache which always can be consulted syncrhonuously.

Anyway the current fake async implementation of InMemoryCache which gets surfaced as a sync method after all in GetValue, seems a bit strange.

endret commented 3 years ago

Hi @jochenz,

Thank you for the reporting. We are planning to eliminate all non-async version methods from IConfigCatClient and it will resolve your issue.

Thanks, Endre

jochenz commented 3 years ago

Hmm, that is actually a bit strange. Since (at least in the polling modes) the entire config is always cached and is available immediately for (potentially lock-free) reading from memory. The only exception is during application startup and given the quick availability of the config, a synchronuous API which supports the 99.999% use case would make perfect sense to me.

jochenz commented 3 years ago

There's a second, much more important reason, why we'd not like to lose the sync API. And that's the viral aspect of async/await: if you use it, it has to be async all the way.

One of the applications we work on, is quite a big legacy application which has roughly estimated about 80% synchronous code (no async / await). Feature flags are an important tool for us to safely perform changes to that legacy application. Losing the sync GetValue, would be very bad news, since any change where we'd need a feature flag could explode in our face.

Suppose we need to change something in a method deep down in the application. That method is synchronous and potentially has multiple usages in the application. If I need to make that method async to use a feature flag (because GetValueAsync is the only option left), I have to go async all the way from that point in the stack. Which means all consumers of that method have to become async and that all the consumers of that consumer, and so on...

A small change quickly explodes in your face if you have to make something async.

So please, do not remove GetValue, it has a big impact for us if we'd lose it.

jochenz commented 3 years ago

I've done a quick benchmark on this, to get an idea what we are talking about from the efficiency side (performance / memory). The code is replicating the async service & cache as currently used in the client and also provides an alternative lock-free sync version, which would be a better match for the polling scenarios (both autopolling and manual).

The results are:

pure performance per operation is not heavily impacted (async is 5x slower, but fast anyway)
async uses a lot more memory allocation --> many more GC collections, which impact overall application performance

See a quick screenshot of the results here (1000 GetValue / GetValueAsync calls per operation), absolute numbers are not very important, just the comparison is:

Here's the source code:

Cache Benchmarks.zip

Performance is one thing, but, as mentioned in the previous comment, the loss of GetValue would complicate our work quite a bit. We would very much prefer to have a lock-free, efficient version of GetValue instead, which should be perfectly possible for the cached scenarios (auto & manual polling).

endret commented 2 years ago

Hi @jochenz,

I re-thought your issue, and as you can see, our Sync methods are just wrap-around Async versions. Another solution to ensure better avoiding dead-lock situations is to run the logic in a separated thread. For example, Microsoft also uses this technique in AspNet Core here: UserManagerExtensions because the core functions have only asynchronous version. The internal tool AsyncHelper is nothing more than running the async method with the custom taskfactory and uses .GetAwaiter().GetResult() to get the result. The correct solution is that using the synchronous version in both place cache and httpclient.

We will take into account that you need synchronous API!

Thanks, Endre

jochenz commented 2 years ago

Hi Endre,

I would rather split the client into two different "configuations":

one for the polling setups (auto & manual), which has no async code at all on the hot evaluation path (Client -> Service -> Cache & Evaluator)
one for the lazy loading setup which could be async all the way

Both setups also only perform deserialization immediately after fetching and not on every evalutation (see issue #22).

Really, since the (deserialized) project config is so easily cacheable, there is no need for an async evaluation path. Only fetching it from the web should be using non-blocking, async IO. Adding a picture to visualize this (remark the opposite direction left and right of the lock-free cache (which could be used for the lazy loading version as well, as far as I can tell, but I didn't change the lazy polling version, since I did not take time to dive into the details of the lazy loading setup).

Config Cat Client

jochenz commented 2 years ago

Simplify the drawing a bit and make the Timer part of the PollingService ;-)

z4kn4fein commented 2 years ago

Hi @jochenz, we did some improvements on this topic, we added synchronous branches to the cache and used the synchronous variant of HttpClient where it was possible. With this, we were able to get rid of .GetAwaiter().GetResult() calls on Tasks. You can check what we did in the first place here. (We had to do some additional minor releases since then)