MichaCo / CacheManager

CacheManager is an open source caching abstraction layer for .NET written in C#. It supports various cache providers and implements many advanced features.
http://cachemanager.michaco.net
Apache License 2.0
2.34k stars 457 forks source link

WIP #184 add async AddAsync and RemoveAsync to ICache #249

Closed andreymir closed 2 years ago

andreymir commented 6 years ago

@MichaCo

Attempted to add AddAsync and RemoveAsync methods to ICache. Not sure what to do with .net 4.5 and lower because it does not support Task.CompletedTask and Task.FromResult. Any advice? Could we add async support only to .netstandard?

MichaCo commented 6 years ago

Task.FromResult should be available in .NET 4.5. Completed Task is just a shortcut QOL feature which returns a cached completed task. We can add our own I guess.

One more thing, I'm not sure if we really have to implement two complete code paths for sync and async. Maybe with ValueTask we can have only one? Would need some performance testing though as ValueTask can produce more allocation in some cases

andreymir commented 6 years ago

Hi @MichaCo,

Task.FromResult should be available in .NET 4.5. Completed Task is just a shortcut QOL feature which returns a cached completed task. We can add our own I guess.

Updated defined to include net4.5. So only net 4.0 does not have async support. Is it OK?

Added GetCacheItemAsync to ICache and added benchmark tests for async methods: GetSingleBenchmark and AddSingleBenchmark.

On my machine I get following results:

AddSingleBenchmark


BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
  Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0

Runtime=Clr  LaunchCount=1  TargetCount=10  
WarmupCount=4  
Method Mean Error StdDev Median Op/s Scaled ScaledSD Rank Gen 0 Allocated
Dictionary 245.1 ns 3.064 ns 1.823 ns 245.3 ns 4,079,270.3 1.00 0.00 1 0.0439 184 B
DictionaryAsync 522.9 ns 7.752 ns 4.613 ns 521.9 ns 1,912,250.9 2.13 0.02 2 0.0725 304 B
Runtime 1,468.7 ns 62.331 ns 41.228 ns 1,463.9 ns 680,871.8 5.99 0.16 5 0.7534 3168 B
RuntimeAsync 1,856.5 ns 39.374 ns 26.044 ns 1,853.4 ns 538,644.1 7.57 0.11 6 0.7820 3288 B
MsMemory 801.9 ns 34.341 ns 22.714 ns 790.3 ns 1,247,059.1 3.27 0.09 3 0.1068 448 B
MsMemoryAsync 1,301.3 ns 18.890 ns 12.494 ns 1,297.4 ns 768,482.7 5.31 0.06 4 0.3414 1432 B
Redis 386,316.3 ns 4,980.507 ns 3,294.297 ns 386,087.5 ns 2,588.6 1,575.97 16.85 9 - 1372 B
RedisAsync 431,304.8 ns 6,662.711 ns 4,406.971 ns 430,890.5 ns 2,318.5 1,759.50 21.03 10 0.4883 3772 B
Memcached 354,375.7 ns 4,301.594 ns 2,249.820 ns 354,996.2 ns 2,821.9 1,445.66 13.26 7 2.9297 13971 B
MemcachedAsync 362,075.7 ns 5,459.093 ns 3,610.852 ns 361,915.5 ns 2,761.9 1,477.08 17.38 8 2.9297 14088 B

GetSingleBenchmark


BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
  Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0

Runtime=Clr  LaunchCount=1  TargetCount=10  
WarmupCount=4  
Method Mean Error StdDev Median Op/s Scaled ScaledSD Rank Gen 0 Allocated
Dictionary 79.11 ns 0.5664 ns 0.3746 ns 78.97 ns 12,640,504.2 1.00 0.00 1 - 0 B
DictionaryAsync 200.59 ns 1.8190 ns 1.0825 ns 200.24 ns 4,985,332.5 2.54 0.02 2 0.0191 80 B
Runtime 299.25 ns 1.5175 ns 0.9030 ns 299.42 ns 3,341,697.1 3.78 0.02 4 0.0496 208 B
RuntimeAsync 435.37 ns 5.0081 ns 2.2236 ns 436.33 ns 2,296,887.3 5.50 0.04 6 0.0682 288 B
MsMemory 228.52 ns 3.2292 ns 2.1359 ns 227.86 ns 4,375,908.9 2.89 0.03 3 - 0 B
MsMemoryAsync 360.95 ns 1.8701 ns 1.2370 ns 360.61 ns 2,770,464.4 4.56 0.03 5 0.0191 80 B
Redis 252,553.58 ns 5,189.8029 ns 3,432.7334 ns 252,332.28 ns 3,959.6 3,192.47 43.57 7 - 2011 B
RedisAsync 252,382.93 ns 4,476.1477 ns 2,960.6947 ns 253,102.04 ns 3,962.2 3,190.31 38.26 7 0.4883 2096 B
Memcached 270,045.65 ns 5,271.5009 ns 3,486.7716 ns 270,649.05 ns 3,703.1 3,413.58 44.51 9 3.4180 14393 B
MemcachedAsync 264,300.18 ns 3,740.8318 ns 2,474.3287 ns 265,055.46 ns 3,783.6 3,340.95 33.22 8 3.4180 14473 B

For distributed cached I don't see much difference between sync and async version, but for Dictionary it is 2 times slower. It is probably because of async/await overhead.

MichaCo commented 6 years ago

Yes its find to skip .NET40, I'll probably drop support for that soon anyways.

Thanks for doing some performance testing, some interesting results actually. I think we have to work on the allocations when adding items. The async Add almost triples the bytes allocated across all the different implementations.

Haven't looked at your code changes yet. Will review it soon

andreymir commented 6 years ago

I will try to replace Task to ValueTask and will run the benchmark again. Will post my results here.

andreymir commented 6 years ago

@MichaCo

I created a separate branch and changed all Task to ValueTask. See benchmarks with ValueTask below:

AddSingleBenchmark


BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
  Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0

Runtime=Clr  LaunchCount=1  TargetCount=10  
WarmupCount=4  
Method Mean Error StdDev Median Op/s Scaled ScaledSD Rank Gen 0 Allocated
Dictionary 247.1 ns 3.239 ns 2.142 ns 246.7 ns 4,046,226.5 1.00 0.00 1 0.0439 184 B
DictionaryAsync 537.3 ns 2.536 ns 1.509 ns 536.8 ns 1,861,273.2 2.17 0.02 2 0.0439 184 B
Runtime 1,493.3 ns 27.160 ns 16.163 ns 1,491.5 ns 669,641.8 6.04 0.08 5 0.7534 3168 B
RuntimeAsync 1,856.1 ns 9.370 ns 4.901 ns 1,857.3 ns 538,770.9 7.51 0.06 6 0.7515 3168 B
MsMemory 781.5 ns 11.407 ns 7.545 ns 778.2 ns 1,279,641.6 3.16 0.04 3 0.1068 448 B
MsMemoryAsync 1,314.3 ns 6.401 ns 3.809 ns 1,313.2 ns 760,852.3 5.32 0.05 4 0.3111 1312 B
Redis 367,380.1 ns 5,861.867 ns 3,877.262 ns 367,944.5 ns 2,722.0 1,486.60 19.23 7 - 1372 B
RedisAsync 434,222.9 ns 11,983.929 ns 7,926.628 ns 431,761.0 ns 2,303.0 1,757.08 33.66 8 0.4883 3916 B
Memcached 382,094.3 ns 42,926.427 ns 28,393.175 ns 371,250.6 ns 2,617.2 1,546.14 109.73 7 2.9297 13974 B
MemcachedAsync 388,666.3 ns 5,296.766 ns 3,503.483 ns 388,939.6 ns 2,572.9 1,572.74 18.62 7 2.9297 13972 B

GetSingleBenchmark


BenchmarkDotNet=v0.10.8, OS=Windows 10.0.17134
Processor=Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), ProcessorCount=8
Frequency=10000000 Hz, Resolution=100.0000 ns, Timer=UNKNOWN
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0
  Job-VUBPRE : Clr 4.0.30319.42000, 64bit RyuJIT-v4.7.3132.0

Runtime=Clr  LaunchCount=1  TargetCount=10  
WarmupCount=4  
Method Mean Error StdDev Median Op/s Scaled ScaledSD Rank Gen 0 Allocated
Dictionary 79.78 ns 0.3568 ns 0.2123 ns 79.68 ns 12,533,907.5 1.00 0.00 1 - 0 B
DictionaryAsync 208.47 ns 1.0048 ns 0.6646 ns 208.89 ns 4,796,959.2 2.61 0.01 2 - 0 B
Runtime 317.83 ns 4.8045 ns 3.1778 ns 318.14 ns 3,146,343.1 3.98 0.04 4 0.0496 208 B
RuntimeAsync 472.30 ns 22.5131 ns 14.8910 ns 466.07 ns 2,117,308.2 5.92 0.18 6 0.0496 208 B
MsMemory 237.90 ns 3.0780 ns 1.6099 ns 237.70 ns 4,203,440.9 2.98 0.02 3 - 0 B
MsMemoryAsync 368.84 ns 2.3832 ns 1.4182 ns 369.32 ns 2,711,211.2 4.62 0.02 5 - 0 B
Redis 249,763.77 ns 2,653.5906 ns 1,579.1098 ns 249,455.05 ns 4,003.8 3,130.54 20.24 7 - 2011 B
RedisAsync 248,816.87 ns 10,174.4389 ns 6,054.6475 ns 247,216.42 ns 4,019.0 3,118.67 71.97 7 - 2011 B
Memcached 270,554.35 ns 17,462.0237 ns 11,550.0480 ns 264,914.12 ns 3,696.1 3,391.12 137.60 8 3.4180 14392 B
MemcachedAsync 274,713.17 ns 10,972.4318 ns 7,257.5846 ns 274,374.21 ns 3,640.2 3,443.25 86.73 8 3.4180 14392 B
MichaCo commented 6 years ago

looks like we have a winner? ;)

andreymir commented 6 years ago

Yes, ValueTask looks better compared to Task. But sync vs async for inmemory cache still significantly slower. I guess we better not change sync implementation and only add new API for async.

MichaCo commented 6 years ago

yup, agree. I might have to do more testing and optimizations to see why exactly it is much slower. But for sure, the statemachine stuff produces some overhead

andreymir commented 6 years ago

@MichaCo, I added async methods to ICacheManager. Should I also include this into this pull request? The PR is quite large already so could it be better to create a new PR when this merged?

MichaCo commented 6 years ago

feel free to create another one, yup

Thanks for the great work btw! Hope I get some time soon to merge and review it properly.

cdonnellytx commented 5 years ago

Any updates to this? We have a project that could really benefit from async caching calls...

MichaCo commented 5 years ago

@cdonnellytx This might take a while until all this is totally done/done and released. It is a pretty big change and very little time atm Sorry ;)

p3365 commented 5 years ago

+1 for this change, async is very important for my organisation

sajibcefalo commented 5 years ago

+1 for this change, async will improve the performance a lot

DTTerastar commented 5 years ago

+1 we need this as well

ckarcz commented 4 years ago

any update on merging this in? seems like it’s been open for over a year.

eliaslopezgt commented 4 years ago

+1 for this change. The async/sync code that we need to create to make this work could lead to threadpool starvation, in other words, this is a great library but definitely it needs this :) IMO the additional support for the async code poses minimal risk to the library and the benefits are many.

thiagoloureiro commented 3 years ago

+1 for this change !

macchmie3 commented 2 years ago

+1, any updates on async support?