Closed: galiniliev closed this issue 6 years ago
I think I'm seeing this as well but have been unable to track it down. I wish this client would just remove all sync support.
I think I'm hitting this as well. I have nothing in my dmp file other than a ton of idle waiting threads, and then this:
> mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext) Unknown
StackExchange.Redis.dll!StackExchange.Redis.SocketManager.WriteAllQueues() Unknown
StackExchange.Redis.dll!StackExchange.Redis.SocketManager..cctor.AnonymousMethod__49_0(object context) Unknown
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state) Unknown
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Unknown
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Unknown
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Unknown
mscorlib.dll!System.Threading.ThreadHelper.ThreadStart(object obj) Unknown
@niemyjski there should be 1 per ConnectionMultiplexer (by default). Are you dealing with many connections? The manager can also be shared between many multiplexers, if you want.
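For example, a shared manager can be wired up like this: a minimal sketch assuming the 1.x API (the SocketManager name constructor and ConfigurationOptions.SocketManager), with placeholder server names:

using StackExchange.Redis;

static class SharedSocketManagerExample
{
    private static readonly SocketManager Shared = new SocketManager("shared");

    public static ConnectionMultiplexer Connect(string endpoint)
    {
        var options = ConfigurationOptions.Parse(endpoint); // e.g. "redis-a:6379"
        options.SocketManager = Shared;                     // reuse one reader/writer pair across multiplexers
        return ConnectionMultiplexer.Connect(options);
    }
}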
Hello folks: I believe we are also hitting this issue (we are using version 1.2.0). Memory dumps of machines that have leaked memory show nothing but hundreds of threads stuck on:
35 Id: 235c.2220 Suspend: 0 Teb: 00007ff5`ff4a0000 Unfrozen
Child-SP RetAddr Call Site
00 00000060`63f0dd48 00007ff9`e7f9ccb4 ntdll!ZwWaitForSingleObject + 0xa
01 00000060`63f0dd50 00007ff9`e7f9aec5 mswsock!SockWaitForSingleObject + 0x65
02 00000060`63f0ddd0 00007ff9`e99dffd3 mswsock!WSPSelect + 0x79c
03 (Inline Function) --------`-------- ws2_32!DPROVIDER::WSPSelect + 0x35
04 00000060`63f0df70 00007ff9`8436b931 ws2_32!select + 0x1db
05 00000060`63f0e060 00007ff9`8436a28e search!DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], IntPtr[], IntPtr[], TimeValue ByRef) + 0xe1
06 00000060`63f0e150 00007ff9`8436999e StackExchange_Redis!StackExchange.Redis.SocketManager.ReadImpl() + 0x7fe
07 00000060`63f0e3c0 00007ff9`e13d2d45 StackExchange_Redis!StackExchange.Redis.SocketManager.Read() + 0x5e
08 00000060`63f0e410 00007ff9`e13d2ab9 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) + 0x285
09 00000060`63f0e570 00007ff9`e13d2a97 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) + 0x9
0a 00000060`63f0e5a0 00007ff9`e1ca3a8d mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) + 0x57
0b 00000060`63f0e5f0 00007ff9`e294ab53 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart(System.Object) + 0x5d
0c 00000060`63f0e640 00007ff9`e294aa3e clr!CallDescrWorkerInternal + 0x83
0d 00000060`63f0e680 00007ff9`e294b1d2 clr!CallDescrWorkerWithHandler + 0x4a
0e 00000060`63f0e6c0 00007ff9`e2a98199 clr!MethodDescCallSite::CallTargetWorker + 0x251
0f (Inline Function) --------`-------- clr!MethodDescCallSite::Call + 0xc
10 00000060`63f0e870 00007ff9`e294c481 clr!ThreadNative::KickOffThread_Worker + 0x105
11 00000060`63f0ead0 00007ff9`e294c408 clr!ManagedThreadBase_DispatchInner + 0x2d
12 00000060`63f0eb10 00007ff9`e294c379 clr!ManagedThreadBase_DispatchMiddle + 0x6c
13 00000060`63f0ec10 00007ff9`e294c4bb clr!ManagedThreadBase_DispatchOuter + 0x75
14 00000060`63f0eca0 00007ff9`e2a9807a clr!ManagedThreadBase_FullTransitionWithAD + 0x2f
15 (Inline Function) --------`-------- clr!ManagedThreadBase::KickOff + 0x20
16 00000060`63f0ed00 00007ff9`e2a07de6 clr!ThreadNative::KickOffThread + 0xd2
17 00000060`63f0edd0 00007ff9`e91713d2 clr!Thread::intermediateThreadProc + 0x7d
18 00000060`63f0f910 00007ff9`eb6554e4 kernel32!BaseThreadInitThunk + 0x22
19 00000060`63f0f940 00000000`00000000 ntdll!RtlUserThreadStart + 0x34
and
34 Id: 235c.1fa8 Suspend: 0 Teb: 00007ff5`ff4a2000 Unfrozen
Child-SP RetAddr Call Site
00 00000060`63ecded8 00007ff9`e8ba13ed ntdll!ZwWaitForMultipleObjects + 0xa
01 00000060`63ecdee0 00007ff9`e2b1f7da KERNELBASE!WaitForMultipleObjectsEx + 0xe1
02 00000060`63ece1c0 00007ff9`e2b1f5e6 clr!WaitForMultipleObjectsEx_SO_TOLERANT + 0x62
03 (Inline Function) --------`-------- clr!Thread::DoAppropriateAptStateWait + 0x45
04 00000060`63ece220 00007ff9`e2b1f3fd clr!Thread::DoAppropriateWaitWorker + 0x1d0
05 00000060`63ece320 00007ff9`e2b1f6a9 clr!Thread::DoAppropriateWait + 0x7d
06 00000060`63ece3a0 00007ff9`e2a991f3 clr!CLREventBase::WaitEx + 0xc1
07 (Inline Function) --------`-------- clr!CLREventBase::Wait + 0x1f
08 (Inline Function) --------`-------- clr!Thread::Wait + 0x1f
09 00000060`63ece430 00007ff9`e2a991c1 clr!Thread::Block + 0x27
0a 00000060`63ece460 00007ff9`e2a98f81 clr!SyncBlock::Wait + 0x170
0b (Inline Function) --------`-------- clr!ObjHeader::Wait + 0x25
0c (Inline Function) --------`-------- clr!Object::Wait + 0x25
0d 00000060`63ece5a0 00007ff9`84356843 clr!ObjectNative::WaitTimeout + 0xdd
0e 00000060`63ece720 00007ff9`84356625 StackExchange_Redis!StackExchange.Redis.SocketManager.WriteAllQueues() + 0x83
0f 00000060`63ece790 00007ff9`e13d2d45 StackExchange_Redis!StackExchange.Redis.SocketManager+<>c.<.cctor>b__49_0(System.Object) + 0x35
10 00000060`63ece7d0 00007ff9`e13d2ab9 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) + 0x285
11 00000060`63ece930 00007ff9`e13d2a97 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) + 0x9
12 00000060`63ece960 00007ff9`e1ca3a8d mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) + 0x57
13 00000060`63ece9b0 00007ff9`e294ab53 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart(System.Object) + 0x5d
14 00000060`63ecea00 00007ff9`e294aa3e clr!CallDescrWorkerInternal + 0x83
15 00000060`63ecea40 00007ff9`e294b1d2 clr!CallDescrWorkerWithHandler + 0x4a
16 00000060`63ecea80 00007ff9`e2a98199 clr!MethodDescCallSite::CallTargetWorker + 0x251
17 (Inline Function) --------`-------- clr!MethodDescCallSite::Call + 0xc
18 00000060`63ecec30 00007ff9`e294c481 clr!ThreadNative::KickOffThread_Worker + 0x105
19 00000060`63ecee90 00007ff9`e294c408 clr!ManagedThreadBase_DispatchInner + 0x2d
1a 00000060`63eceed0 00007ff9`e294c379 clr!ManagedThreadBase_DispatchMiddle + 0x6c
1b 00000060`63ecefd0 00007ff9`e294c4bb clr!ManagedThreadBase_DispatchOuter + 0x75
1c 00000060`63ecf060 00007ff9`e2a9807a clr!ManagedThreadBase_FullTransitionWithAD + 0x2f
1d (Inline Function) --------`-------- clr!ManagedThreadBase::KickOff + 0x20
1e 00000060`63ecf0c0 00007ff9`e2a07de6 clr!ThreadNative::KickOffThread + 0xd2
1f 00000060`63ecf190 00007ff9`e91713d2 clr!Thread::intermediateThreadProc + 0x7d
20 00000060`63ecfc50 00007ff9`eb6554e4 kernel32!BaseThreadInitThunk + 0x22
21 00000060`63ecfc80 00000000`00000000 ntdll!RtlUserThreadStart + 0x34
As @galiniliev suggested, this seems to be a problem with mixing sync and async code. In an attempt to avoid the issue, we tried telling the TPL to not schedule the continuation of the StringGetAsync(key) task on the same thread by calling it with .ConfigureAwait(false); unfortunately, we are still hitting the issue even after that change.
Is there any tracing we could turn on to help you get to the bottom of this? Is there anything you'd recommend we do to work around the problem?
Any help would be greatly appreciated. Thanks!
I'm curious: why does everyone here think this is a sync/async issue? The thread repeatedly posted here is a separate thread servicing the network socket. Notice how your async method isn't in the call stack? That's because it doesn't call into synchronous code; it simply adds to the transmit queue and returns. Your await will be fulfilled when a message comes back from redis and we complete the original task, ultimately via TryComplete() on the original task.
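Roughly, the pattern being described is the usual queue-plus-TaskCompletionSource shape. A simplified sketch with hypothetical names, not the library's actual types:

using System.Threading.Tasks;

// The async call enqueues a message and hands the caller a Task; a separate
// socket-reading thread completes it when the redis response arrives.
class PendingMessage
{
    private readonly TaskCompletionSource<string> _tcs = new TaskCompletionSource<string>();

    // What the caller's await observes.
    public Task<string> Task => _tcs.Task;

    // Called from the reader thread once redis replies.
    public void TryComplete(string response) => _tcs.TrySetResult(response);
}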
So far, all the symptoms posted simply point to a response from redis taking time. Does anyone have numbers for latency, ops/sec, etc. to their redis instance?
@GePerez
> we have tried telling the TPL to not schedule the continuation of the StringGetAsync(key) task on the same thread by updating our code to call it with .ConfigureAwait(false)
I'd like to clarify, for this and others: that's not what .ConfigureAwait(false) does. It doesn't affect which thread the continuation runs on; that's your continuation options (which are often just suggestions). When calling .ConfigureAwait(false), what you're saying is "don't capture the synchronization context" (which is more lightweight, since there is less to restore upon continuation).
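To illustrate the distinction, a minimal sketch (the db and key parameters are assumed to come from the caller):

using StackExchange.Redis;
using System.Threading.Tasks;

static class CacheReader
{
    public static async Task<string> GetAsync(IDatabase db, string key)
    {
        // ConfigureAwait(false) only says "don't capture the current SynchronizationContext";
        // the continuation may still run on any thread-pool thread. It does not change when
        // the task completes, nor which SocketManager thread services the socket.
        RedisValue value = await db.StringGetAsync(key).ConfigureAwait(false);
        return value; // implicit RedisValue -> string conversion
    }
}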
Thank you for the quick response @NickCraver. I'm not sure I understand this statement:
> It simply adds to the transmit queue and returns. Your await will be fulfilled when a message returns from redis.
Our code retrieves an item from the cache by calling:
string serializedObject = await ConnectionMultiplexer.GetDatabase().StringGetAsync(key);
If the above does not somehow create a thread with the stack:
34 Id: 235c.1fa8 Suspend: 0 Teb: 00007ff5`ff4a2000 Unfrozen
Child-SP RetAddr Call Site
00 00000060`63ecded8 00007ff9`e8ba13ed ntdll!ZwWaitForMultipleObjects + 0xa
01 00000060`63ecdee0 00007ff9`e2b1f7da KERNELBASE!WaitForMultipleObjectsEx + 0xe1
02 00000060`63ece1c0 00007ff9`e2b1f5e6 clr!WaitForMultipleObjectsEx_SO_TOLERANT + 0x62
03 (Inline Function) --------`-------- clr!Thread::DoAppropriateAptStateWait + 0x45
04 00000060`63ece220 00007ff9`e2b1f3fd clr!Thread::DoAppropriateWaitWorker + 0x1d0
05 00000060`63ece320 00007ff9`e2b1f6a9 clr!Thread::DoAppropriateWait + 0x7d
06 00000060`63ece3a0 00007ff9`e2a991f3 clr!CLREventBase::WaitEx + 0xc1
07 (Inline Function) --------`-------- clr!CLREventBase::Wait + 0x1f
08 (Inline Function) --------`-------- clr!Thread::Wait + 0x1f
09 00000060`63ece430 00007ff9`e2a991c1 clr!Thread::Block + 0x27
0a 00000060`63ece460 00007ff9`e2a98f81 clr!SyncBlock::Wait + 0x170
0b (Inline Function) --------`-------- clr!ObjHeader::Wait + 0x25
0c (Inline Function) --------`-------- clr!Object::Wait + 0x25
0d 00000060`63ece5a0 00007ff9`84356843 clr!ObjectNative::WaitTimeout + 0xdd
0e 00000060`63ece720 00007ff9`84356625 StackExchange_Redis!StackExchange.Redis.SocketManager.WriteAllQueues() + 0x83
0f 00000060`63ece790 00007ff9`e13d2d45 StackExchange_Redis!StackExchange.Redis.SocketManager+<>c.<.cctor>b__49_0(System.Object) + 0x35
What usage of Redis would you recommend we look for in our code to determine the source of all those threads?
With regards to this:
> So far, all the symptoms posted simply point to a response from redis taking time
Do you have any recommendations for how to configure the Redis client side to time out after some time?
Today, the call to StringGetAsync is wrapped in a TimeoutAfter helper as described at https://blogs.msdn.microsoft.com/pfxteam/2011/11/10/crafting-a-task-timeoutafter-method.
Unfortunately that does not seem to work, which is why we started suspecting some sort of deadlock. Thanks a lot for your help.
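Roughly, the helper we use follows the pattern from that post: race the original task against a delay. A minimal sketch (not our exact code); note that it only stops our code from waiting, the underlying redis operation keeps running:

using System;
using System.Threading;
using System.Threading.Tasks;

static class TaskTimeoutExtensions
{
    public static async Task<T> TimeoutAfter<T>(this Task<T> task, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            var completed = await Task.WhenAny(task, Task.Delay(timeout, cts.Token))
                                      .ConfigureAwait(false);
            if (completed != task)
                throw new TimeoutException();

            cts.Cancel();      // stop the delay timer
            return await task; // propagate the result or the original exception
        }
    }
}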
@GePerez perhaps I misunderstood; if you're actually seeing hundreds of those threads, I need to ask different questions to identify the problem.
How are you creating the multiplexer(s)? Are you sharing 1 for the application, or creating many?
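For reference, "sharing 1 for the application" usually means something like this (a minimal sketch; the connection string is a placeholder):

using System;
using StackExchange.Redis;

static class RedisConnection
{
    private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect("my-redis:6379,abortConnect=false"));

    // One multiplexer (and therefore one SocketManager) for the whole process.
    public static ConnectionMultiplexer Multiplexer => LazyConnection.Value;
}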
@NickCraver sorry for the late response; we're seeing this exact behavior. Yes, we are creating a single multiplexer (https://github.com/exceptionless/Exceptionless/blob/master/src/Exceptionless.Insulation/Bootstrapper.cs#L47) and we are cleaning up after ourselves. We've had multiple people review our implementation and I'm not sure what we could be doing to cause this, but it's definitely an issue that I've been unable to track down further than this.
Also, it only seems to happen in production and very rarely in dev, so it's really hard to track down on my side. We are using Azure premium Redis and I haven't seen any metrics (via Redsmin) telling me I'm hitting redis timeouts.
Closing; please see #871 (WriteAllQueues literally doesn't exist in 2.0)
Hi, we are using the 1.1.608 release and occasionally we see calls to StringGetAsync() getting blocked and timing out our WebAPI calls. Looking at the call stacks, we see the following:
StackExchange_Redis_StrongName!StackExchange.Redis.SocketManager.WriteAllQueues()+0x72
0a 000007f97a95674e : 0000000ecbfc64c0 0000000ecbfc62b0 0000000000000000 0000000000000000 : StackExchange_Redis_StrongName!StackExchange.Redis.SocketManager+<>c.<.cctor>b__49_0(System.Object)+0x38
0b 000007f97a9565e7 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x15e
0c 000007f97a9565a2 : 0000000ecbfc65b0 0000000000000000 0000000000000000 0000000000000000 : mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x17
0d 000007f97b32e24d : 000000131a8bf438 000007f97bac5552 0000000000000012 000000131a8bf588 : mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x52
0e 000007f97bac6a53 : 0000000ecbfc65d8 0000000ecbfc62b0 0000000000000000 0000000000000000 : mscorlib_ni!System.Threading.ThreadHelper.ThreadStart(System.Object)+0x5d
Looking at SocketManager, there is a Monitor.Wait(writeQueue), which mixes the sync and async patterns. Our code is fully async. Is anyone else seeing the same problem?
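For context, the shape being referred to (a dedicated writer thread parked on Monitor.Wait until work is queued) is roughly the following. This is a simplified sketch of the general pattern, not the library's actual code:

using System;
using System.Collections.Generic;
using System.Threading;

static class WriterLoopSketch
{
    private static readonly Queue<Action> writeQueue = new Queue<Action>();

    public static void Enqueue(Action work)
    {
        lock (writeQueue)
        {
            writeQueue.Enqueue(work);
            Monitor.Pulse(writeQueue); // wake the writer thread
        }
    }

    // Runs on a dedicated background thread.
    public static void WriterLoop()
    {
        while (true)
        {
            Action work;
            lock (writeQueue)
            {
                // Parks the thread until Enqueue pulses it; a blocking wait like this is
                // what shows up under SocketManager.WriteAllQueues in the dumps above.
                while (writeQueue.Count == 0)
                    Monitor.Wait(writeQueue);
                work = writeQueue.Dequeue();
            }
            work(); // perform the socket write outside the lock
        }
    }
}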