Open RamjotSingh opened 2 years ago
We have been having a few instances of our machines running low on memory, so we were doing a number of investigations. In one of them, something peculiar showed up that I could not make much sense of.
Here is what dotMemory shows for a specific memory dump.
The string problem we have here, we are aware of and are working on fixing (it doesn't concern caching). The concerning part is the 4.04 GB retained by ConnectionMultiplexer.
[image: image] https://user-images.githubusercontent.com/13517857/202769497-fee8a4e5-57c8-41cf-98de-3761d576096a.png
I tried looking deeper into it. It looks like the Redis SDK holds a long Queue of messages.
[image: image] https://user-images.githubusercontent.com/13517857/202768673-d97f9df4-e0ea-4e9e-930e-d67f7c68e2d5.png
I wanted to understand whether this is expected or whether we are hitting a bug somewhere, either in the way we are set up or in the way the SDK behaves. Also, how should we interpret this retained-memory snapshot in general?
We are using the .NET Core 3.1.x Redis caching extension library, which uses StackExchange.Redis 2.0.593.
I'm very surprised by a queue of that length. Does the tooling indicate which field this is rooted by? I'm presuming that this is the "sent, pending result" queue, but that really shouldn't be 2M long! We could perhaps put a max size on this backlog queue. Any idea what was happening more holistically here? High throughput? Disconnects? Clues...
I am honestly not sure and kind of stumped. I do have access to the individual values too (since this is coming from a memory dump). Is there anything in particular I could look for? I had a random hunch that maybe the value of the fetched key was pushed to the LOH, but that isn't the case either (and I confirmed in code that it should never go to the LOH).
Is there any chance you have a timeout error when this happens? The text in that might help us advise quickly.
I looked a bit deeper and don't really see any timeouts. I am assuming that I need to look across a window of days; I do see a few failures.
The general trend is "No connection is available to service this operation".
Sample error:
Exception 'No connection is available to service this operation: HMGET <RedactedKey>; IOCP: (Busy=1,Free=999,Min=400,Max=1000), WORKER: (Busy=183,Free=32584,Min=400,Max=32767), Local-CPU: n/a'. StackTrace 'StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: HMGET <RedactedKey>; IOCP: (Busy=1,Free=999,Min=400,Max=1000), WORKER: (Busy=183,Free=32584,Min=400,Max=32767), Local-CPU: n/a',
@RamjotSingh That 183 busy worker threads indicates thread-pool overload, which would result in a ton of backup happening. Are you doing synchronous calls overall? (I'm about to add a new stat for this.) In any case: I can tell from the info here that this is a very old version of the library - you'll want to upgrade, because a lot of changes have been made and we can't really solve anything for an old version. The debug info available will be far greater as well.
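For reference, a rough sketch of how those Busy/Min/Max numbers can be checked in-process, alongside the synchronous vs. asynchronous call shapes being asked about (illustrative only; names and structure here are not from the original app):

using System;
using System.Threading;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class ThreadPoolCheck
{
    public static void Report()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIocp);
        // "Busy" in the error text is max - available; 32767 - 32584 = 183 above.
        Console.WriteLine($"WORKER busy={maxWorker - freeWorker} free={freeWorker} min={minWorker} max={maxWorker}");
        Console.WriteLine($"IOCP   busy={maxIocp - freeIocp} free={freeIocp} min={minIocp} max={maxIocp}");
    }

    // Synchronous call: blocks a pool thread while the Redis reply is pending.
    public static RedisValue GetSync(IDatabase db, RedisKey key, RedisValue field)
        => db.HashGet(key, field);

    // Asynchronous call: releases the thread while the reply is pending.
    public static Task<RedisValue> GetAsync(IDatabase db, RedisKey key, RedisValue field)
        => db.HashGetAsync(key, field);
}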
We run at a rather high requests-per-second rate, so 183 worker threads isn't that high a value for us (our minimum is set to 400).
Regarding the SDK: yes, as mentioned, we don't pull StackExchange.Redis directly but get it through the .NET Core 3.1.x caching extension.
That being said, if we were to upgrade from 2.0.593 to the latest version, should we anticipate or be on the lookout for any breaking changes?
@RamjotSingh can you quantify "rather high" here? What numbers are we talking on the machine, and how many cores?
To answer your other question: there are no breaking changes known - we try to follow SemVer here.
These are typically 16-core machines taking about 400-500 requests in parallel per machine, but each machine is running a lot of other services (and doing a lot of processing too). We can approximate that a particular machine running this particular process would be at about 100 requests per second.
I am upgrading us to the latest SDK version.
@RamjotSingh Your thread count is extremely high, likely due to either too much load on the CPU in general or synchronous calls that are stalled waiting on a lock or an I/O operation to complete. I'd recommend using your memory dump to see where all those threads are in their stacks. This looks like a pile-up from every standpoint.
I looked at the memory dump and only 13 threads were active (our minimum is set to 400, so .NET won't clean them up). I did not see any other stalling, etc. We are upgrading the package to the latest version because we saw the same thing happen in a number of memory dumps (not always, but multiple cases where GBs' worth of references are kept).
This might be related to #2070. Have you looked at a memory dump to see if you have a large amount of RawResult[] objects taking up live and dead memory?
Hello @NickCraver,
I've encountered the same problem because I was using the StackExchange.Redis.Extensions.Core library, which uses IConnectionMultiplexer and creates a connection pool.
I was now able to reproduce the problem with a very simple app where I directly created 25 instances of IConnectionMultiplexer and stored them in a static list:
var conn = ConnectionMultiplexer.Connect("localhost");
Then I started making requests using each connection:
for (int i = 0; i < 1000; i++)
{
var db = connections[i % DbCounts].GetDatabase();
db.HashSet("test", $"value-{i}", i);
var result = db.HashGet("test", $"value-{i}");
}
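Pieced together, a self-contained version of that repro might look like the following (the list setup and the DbCounts constant are reconstructed from the description above, so details may differ from the actual app):

using System;
using System.Collections.Generic;
using StackExchange.Redis;

public static class Repro
{
    // 25 multiplexer instances held in a static list, as described above.
    const int DbCounts = 25;
    static readonly List<IConnectionMultiplexer> connections = new List<IConnectionMultiplexer>();

    public static void Main()
    {
        for (int c = 0; c < DbCounts; c++)
        {
            connections.Add(ConnectionMultiplexer.Connect("localhost"));
        }
        for (int i = 0; i < 1000; i++)
        {
            var db = connections[i % DbCounts].GetDatabase();
            db.HashSet("test", $"value-{i}", i);
            var result = db.HashGet("test", $"value-{i}");
        }
    }
}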
I collected a dump after executing the above code a few times:
Statistics:
MT Count TotalSize Class Name
7ffe3f730990 400 91,760,000 StackExchange.Redis.RawResult[]
Total 400 objects, 91,760,000 bytes
Here is the gcroot for one of the StackExchange.Redis.RawResult[] objects:
> gcroot 01e1e6771498
HandleTable:
000001dbb5f511f8 (strong handle)
-> 01e236311018 System.Object[]
-> 01dfb6490c78 System.Collections.Generic.List<StackExchange.Redis.IConnectionMultiplexer> (static variable: System.String[].Summaries)
-> 01dff6422598 StackExchange.Redis.IConnectionMultiplexer[]
-> 01df765891b8 StackExchange.Redis.ConnectionMultiplexer
-> 01df76589330 System.Collections.Hashtable
-> 01df76589378 System.Collections.Hashtable+bucket[]
-> 01df765896b8 StackExchange.Redis.ServerEndPoint
-> 01df76589960 StackExchange.Redis.PhysicalBridge
-> 01df76589ef0 StackExchange.Redis.PhysicalConnection
-> 01df76589fb8 Pipelines.Sockets.Unofficial.Arenas.Arena<StackExchange.Redis.RawResult>
-> 01e1e6771498 StackExchange.Redis.RawResult[]
Thread 100dc:
8c5fff640 7ffe9bcfbccf System.Threading.PortableThreadPool+GateThread.GateThreadStart() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.GateThread.cs @ 54]
rbp-98: 00000008c5fff6e8
-> 01e236311018 System.Object[]
-> 01dfb6490c78 System.Collections.Generic.List<StackExchange.Redis.IConnectionMultiplexer> (static variable: System.String[].Summaries)
-> 01dff6422598 StackExchange.Redis.IConnectionMultiplexer[]
-> 01df765891b8 StackExchange.Redis.ConnectionMultiplexer
-> 01df76589330 System.Collections.Hashtable
-> 01df76589378 System.Collections.Hashtable+bucket[]
-> 01df765896b8 StackExchange.Redis.ServerEndPoint
-> 01df76589960 StackExchange.Redis.PhysicalBridge
-> 01df76589ef0 StackExchange.Redis.PhysicalConnection
-> 01df76589fb8 Pipelines.Sockets.Unofficial.Arenas.Arena<StackExchange.Redis.RawResult>
-> 01e1e6771498 StackExchange.Redis.RawResult[]
Found 2 unique roots.
Honestly, I can't tell whether that's the correct way of using multiplexing, but I was using that library to address some client timeouts which occurred otherwise.
Are we saying with 25 multiplexers we have a total of 400 objects totaling 91.7 MB? Depending on what's in your hash, and what's in progress, that doesn't seem excessive (whereas gigabytes would raise eyebrows).
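(For scale: 91,760,000 bytes across 400 arrays works out to roughly 229 KB per RawResult[], i.e. 16 arrays per multiplexer given 25 multiplexers.)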
Generally I wouldn't suggest that many multiplexed connections; only a few is where most use cases top out in terms of usefulness.
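For what it's worth, the more typical shape is one shared, lazily-created multiplexer for the whole process; a minimal sketch (illustrative only, not code from this thread):

using System;
using StackExchange.Redis;

public static class RedisConnection
{
    // One process-wide multiplexer, created on first use and reused everywhere.
    private static readonly Lazy<ConnectionMultiplexer> _lazy =
        new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect("localhost"));

    public static ConnectionMultiplexer Instance => _lazy.Value;
}

// Usage: var db = RedisConnection.Instance.GetDatabase();
//        var value = await db.HashGetAsync("test", "value-1");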