phil000 closed this issue 11 months ago
We are seeing a very similar situation: similar value size, at times a slightly higher frequency, but never more than 5 requests per second. The only SLOWLOG entries are commands like setclientaddr, and there is nothing out of the usual in the metrics. Our problems started about a week ago, and a week of work trying to figure it out hasn't yielded any results.
We're using Standard C2, and nothing in our usage suggests C2 shouldn't be more than enough. We're not using the cache any differently than we have over the last year or so, when everything worked as expected.
We don't see timeouts very often, but gets/sets frequently take seconds, while the exact same calls take only a few milliseconds at other times. We see no patterns.
I've raised a support issue with Microsoft, but as per usual they can't see anything wrong and say 'the issue must be on the client'.
We measure the performance of all gets and sets and log aggregates every hour. Performance has worsened recently; the average GET is now 7-9 ms.
For comparison, we have now (temporarily or permanently) removed Redis from our architecture and migrated to Cosmos DB configured more as a cache, and average GET performance is 2-3ms using point lookups for the same data.
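For anyone wanting to reproduce this kind of hourly aggregation, here is a minimal sketch (Python, with hypothetical sample data, not our production code) that summarises a window of latencies the way we log them, including tail percentiles, which surface the occasional multi-second calls that an average alone hides:

```python
import statistics

def aggregate_latencies(samples_ms):
    """Summarise a window of GET/SET latencies (milliseconds)."""
    ordered = sorted(samples_ms)

    def percentile(p):
        # Nearest-rank percentile on the sorted samples.
        idx = max(0, int(round(p / 100 * len(ordered))) - 1)
        return ordered[idx]

    return {
        "count": len(ordered),
        "avg_ms": statistics.mean(ordered),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
        "max_ms": ordered[-1],
    }

# Hypothetical window: mostly fast calls plus a few multi-second outliers,
# matching the "few milliseconds normally, seconds occasionally" pattern.
window = [2, 3, 2, 4, 3, 2, 5000, 3, 2, 4800]
print(aggregate_latencies(window))
```

With data like this, the p50 stays at a few milliseconds while the average and p99 blow out, which is why hourly averages alone can understate the problem.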
@phil000 Do you have some more sample exceptions? And what is the lifetime of the application? What stands out a bit is conn-sec: 519.67, but this could be random with a sample size of one. Do your connection durations look sane for your expectations, or does it look like we're reconnecting more often than the application is restarting?
The latest one.
RedisCacheProvider: Failed to set item. Timeout performing SETEX (5000ms),
next: GET 2:RoutingService::GeocodingResult::943 alexandra street, te awamutu, te awamutu 3800, new zealand+PROD,
inst: 0,
qu: 0,
qs: 3,
aw: False,
bw: SpinningDown,
rs: ReadAsync,
ws: Idle,
in: 0, in-pipe: 0, out-pipe: 0,
last-in: 2, cur-in: 0,
sync-ops: 1904, async-ops: 1,
serverEndpoint: ,
conn-sec: 11856.85,
aoc: 0, mc: 1/1/0,
mgr: 10 of 10 available,
clientName: ,
IOCP: (Busy=0,Free=1000,Min=1,Max=1000),
WORKER: (Busy=1,Free=32766,Min=2,Max=32767),
POOL: (Threads=4,QueuedItems=0,CompletedItems=830046),
v: 2.6.111.64013
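When comparing many of these exceptions, it helps to extract the comma-separated `key: value` diagnostic fields mechanically rather than by eye. A rough sketch (Python, assuming the message format shown above; compound fields like IOCP/WORKER would need extra handling):

```python
def parse_timeout_fields(message):
    """Extract simple 'key: value' diagnostic fields from a
    StackExchange.Redis timeout message into a dict (values as strings)."""
    fields = {}
    for part in message.split(","):
        key, sep, value = part.strip().partition(":")
        # Keep only single-word keys; skips the prose prefix like
        # "Timeout performing SETEX (5000ms)".
        if sep and key and " " not in key:
            fields[key] = value.strip()
    return fields

msg = ("Timeout performing SETEX (5000ms), qu: 0, qs: 3, aw: False, "
       "rs: ReadAsync, ws: Idle, conn-sec: 11856.85, "
       "mgr: 10 of 10 available, v: 2.6.111.64013")
fields = parse_timeout_fields(msg)
print(fields["qs"], fields["conn-sec"], fields["rs"])
```

Collecting these dicts across services makes it easy to histogram fields like qs and conn-sec instead of eyeballing individual stack traces.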
The most recent conn-sec values (across different services): 11856.85, 11851.53, 1553.37, 11826, 15446, 15441, 4776.17, 4770.91, 8122.49, 8117.49, 8116.62, 4494.85, 861.21, 856.21, 856.03, 850.88
Our application is typically up for multiple days without restarting, so all of these look too short. At least some reconnections must be happening.
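One way to make that comparison concrete: given the process uptime and a set of conn-sec samples, flag any connection much younger than the process. A sketch (the 10% threshold and the ~3-day uptime are assumptions, not measured values):

```python
def find_suspect_reconnects(conn_secs, uptime_sec, ratio=0.1):
    """Return conn-sec samples that are suspiciously young relative to
    process uptime, suggesting the connection was re-established."""
    threshold = uptime_sec * ratio
    return [c for c in conn_secs if c < threshold]

# Samples from the report above; assume the process has been up ~3 days.
samples = [11856.85, 11851.53, 1553.37, 11826, 15446, 15441,
           4776.17, 4770.91, 8122.49, 8117.49, 8116.62,
           4494.85, 861.21, 856.21, 856.03, 850.88]
uptime = 3 * 24 * 3600  # 259200 seconds
print(find_suspect_reconnects(samples, uptime))
```

Against a multi-day uptime, every one of these samples falls under the threshold, which is consistent with frequent reconnections rather than long-lived connections.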
We can no longer investigate this issue as we have removed Redis from our architecture.
We've recently started receiving periodic timeout exceptions, up to about 20-30 a day.
There is no reason for it as far as we can see. Our app server load is low, and cache usage is also low.
We are using Azure Redis Standard C1. Server load is 0-40% and is mainly caused by replication. Memory usage is 100 MB.
The reader state ("rs") is always ReadAsync.
Any idea what could be going wrong?
e.g.
Call stack:

General
- Traffic burst and thread pool configuration
- Large key value
- High CPU on client hosts
- Network bandwidth limitation on client hosts
- Long-running commands