StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

What could cause different peaks in Server Load, Processor Time and CPU usage? #2733

Closed maomaomqiu closed 5 months ago

maomaomqiu commented 5 months ago

Hi all, many thanks for the previous answers. I am now working on a purge task that clears persistent Redis keys in each region. When I trigger the task, I notice that server load, processor time and CPU usage differ between regions.

Region A

Configuration

configuration Premium 26 GB (2 × 13 GB)

Dashboards

Volume

Condition

Region B

Configuration

configuration Premium 26 GB (2 × 13 GB)

Dashboards

Volume

Condition

Purge Task Logic

// List of key patterns to fetch from Redis
string[] patterns;

using (ConnectionMultiplexer connection = await ConnectionMultiplexer.ConnectAsync(config))
{
    // Get a server
    IServer server = connection.GetServers(). // filter the servers, check them, then pick one
    List<Task> tasks = new List<Task>(patterns.Length);

    // Start one purge action per pattern; all of them run concurrently
    foreach (var pattern in patterns)
    {
        tasks.Add(RedisPersistentKeyPurgeAction(connection, server, pattern));
    }
    await Task.WhenAll(tasks).ConfigureAwait(false);
}

private async Task RedisPersistentKeyPurgeAction(ConnectionMultiplexer connection, IServer server, string pattern)
{
    // Buffer of keys whose expiry will be set in one batch
    List<string> batchExpireBuffer = new List<string>(50);

    var db = connection.GetDatabase();
    await using var keys = server.KeysAsync(pattern: pattern).GetAsyncEnumerator();
    bool isLastKey = !await keys.MoveNextAsync();

    while (!isLastKey)
    {
        // After every 100 processed keys, sleep
        await Task.Delay(200);

        // Every 50 keys, or once the last matching key has been reached,
        // set the default time to live on the buffered persistent keys in one batch
        if (/* some condition is satisfied */)
        {
            await BatchSetExpiry(batchExpireBuffer, expiry, db);
            batchExpireBuffer.Clear();
        }
        // If the buffer holds fewer than 50 keys, add the current key
        if (batchExpireBuffer.Count < 50)
        {
            batchExpireBuffer.Add(keys.Current.ToString());
        }
        // Other iterator logic
        ....
    }
}

private Task BatchSetExpiry(List<string> setExpiryList, int expiry, IDatabase db)
{
    IBatch batch = db.CreateBatch();

    foreach (var key in setExpiryList)
    {
        // expiry is the default time to live (12 hours), passed in seconds
        batch.KeyExpireAsync(key, TimeSpan.FromSeconds(expiry));
    }

    // Send the queued commands; the tasks returned by KeyExpireAsync are not awaited here
    batch.Execute();
    // Other logic omitted, e.g. exception handling
    return Task.CompletedTask;
}
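
For reference, the batching and throttling that the comments above describe (flush every 50 keys, pause after every 100 keys, flush the remainder at the end) can be sketched without any Redis dependency. This is only a minimal sketch under assumptions: `ThrottledBatcher`, `RunAsync`, `flushAsync` and the parameter names are hypothetical and not part of StackExchange.Redis or of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of StackExchange.Redis.
public static class ThrottledBatcher
{
    // Feeds keys to flushAsync in groups of batchSize and pauses for pauseMs
    // after every pauseEvery processed keys. Returns the number of keys seen.
    public static async Task<int> RunAsync(
        IAsyncEnumerable<string> keys,
        Func<IReadOnlyList<string>, Task> flushAsync,
        int batchSize = 50,
        int pauseEvery = 100,
        int pauseMs = 200)
    {
        var buffer = new List<string>(batchSize);
        int processed = 0;

        await foreach (var key in keys)
        {
            buffer.Add(key);
            processed++;

            // Flush a full batch; pass a copy so the buffer can be reused
            if (buffer.Count == batchSize)
            {
                await flushAsync(buffer.ToArray());
                buffer.Clear();
            }

            // Throttle once per pauseEvery keys rather than once per key
            if (processed % pauseEvery == 0)
            {
                await Task.Delay(pauseMs);
            }
        }

        // Flush whatever is left after the last key
        if (buffer.Count > 0)
        {
            await flushAsync(buffer.ToArray());
        }

        return processed;
    }
}
```

In the purge task, `flushAsync` would presumably wrap something like `BatchSetExpiry`, and `keys` would be the result of `server.KeysAsync(...)` converted to strings. Whether this changes the server-side peak is not something the sketch can show.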

Do you have any idea what could cause the difference?

NickCraver commented 5 months ago

It seems like this is a server-side question really, not a client one. Is the client doing anything incorrect here? I'm reading your question as "why didn't the first server have the same impact?" - if that's the case, there are many possible reasons, from shard counts to SKU sizes, etc. - we can't really speak to server impact here because that side is widely variable depending on the hosting setup, replication, latency, concurrent load, etc.

If you can repro bad load patterns, it'd be best to engage the hosting team here to pose that question. If I'm missing a client-side question though: please clarify, happy to answer.

maomaomqiu commented 5 months ago

Thanks @NickCraver, I can repro. The bad load pattern is tolerable, since the peak only lasts about 1 minute. I just want to understand the possible root cause, so that I can avoid the peak if a similar operation is needed in the future.

maomaomqiu commented 5 months ago

image

Each peak represents a trigger

PS: I found that the peak has nothing to do with the total number of keys matched.

maomaomqiu commented 5 months ago

Hi @NickCraver, could you suggest how to engage the hosting team? Many thanks in advance!

philon-msft commented 5 months ago

@maomaomqiu Please engage support via 'Support + Troubleshoot' on the cache in the Portal

maomaomqiu commented 5 months ago

Thanks for the reply. You mean the Azure portal? @philon-msft