Azure / Microsoft.Azure.StackExchangeRedis

Azure-specific wrapper for the StackExchange.Redis client library
MIT License

SocketClosed and SocketFailure errors occurring after enabling MSI authentication for Azure Cache for Redis #55

Closed: itshawi closed this issue 1 month ago

itshawi commented 5 months ago

We recently updated our service to connect to Azure Cache for Redis using Managed Service Identity (MSI). Since deploying this change, we have encountered several exceptions that did not occur when we authenticated with a connection string.

Here are examples of the exceptions:

[
    {
        "severityLevel": "Error",
        "outerId": "0",
        "message": "SocketFailure on blablabla.redis.cache.windows.net:6380/Interactive, Flushing/Faulted, last: GET, origin: ReadFromPipe, outstanding: 7, last-read: 0s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.33.41805",
        "type": "StackExchange.Redis.RedisConnectionException",
        "id": "6797119"
    }
]
[
    {
        "severityLevel": "Error",
        "outerId": "0",
        "message": "SocketClosed on blablabla.redis.cache.windows.net:6380/Interactive, Idle/MarkProcessed, last: GET, origin: ReadFromPipe, outstanding: 9, last-read: 0s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.33.41805",
        "type": "StackExchange.Redis.RedisConnectionException",
        "id": "51268408"
    }
]

Here is the snippet showing how our service connects to the cache:

var cred = new DefaultAzureCredential();
var configurationOptions = ConfigurationOptions.Parse(cacheHostName).ConfigureForAzureWithTokenCredentialAsync(cred).GetAwaiter().GetResult();
ConnectionMultiplexer redis = ConnectionMultiplexer.ConnectAsync(configurationOptions).GetAwaiter().GetResult();

It's worth pointing out that these exceptions occur primarily during GET or SET operations on the cache; we do not encounter them when the service connects to the cache.

Any guidance or suggestions on resolving these exceptions would be greatly appreciated.

philon-msft commented 5 months ago

Do the exceptions appear continuously, or on a cadence that aligns with the token lifetime? If they appear when tokens expire, you may want to try upgrading to the latest version of Microsoft.Azure.StackExchangeRedis, 3.1.0, which has improved reauthentication.

itshawi commented 5 months ago

It's been 4 days since we deployed this change, and we are seeing continuous SocketClosed and SocketFailure errors. There was also one window with a spike in RedisTimeoutException. The graph below shows all the Redis-related errors we have seen in production since we deployed the new authentication method.

[Image: graph of Redis-related errors in production since the MSI authentication change]

Please let me know if you want me to share other information :)

philon-msft commented 5 months ago

I see smaller spikes on a 24-hour cadence, just after 06:00. I'd highly recommend upgrading to 3.1.0 to see if that reduces error rates.

itshawi commented 5 months ago

Sounds good. I will follow up once we have picked up the new package.

codin-dev commented 4 months ago

I can confirm that with 3.0.0 there were a lot of errors. With the latest version, 3.1.0, most of them are gone, but I still got one error in the last 24 hours. Note that I am using the StackExchange.Redis.Extensions.Core library, which is why this error appears 5 times in my logs (once for each of the 5 active connections in the RedisConnectionPoolManager):

{
    "Timestamp": "2024-07-01T20:27:10.8384475+00:00",
    "Level": "Error",
    "MessageTemplate": "Redis connection error {FailureType}",
    "Exception": "StackExchange.Redis.RedisConnectionException: SocketClosed on blabla.redis.cache.windows.net:6380/Subscription, Idle/MarkProcessed, last: PING, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 8s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 3 of 4 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.8.0.27420",
    "Properties": {
        "FailureType": "SocketClosed",
        "SourceContext": "StackExchange.Redis.Extensions.Core.Implementations.RedisConnectionPoolManager",
        "ThreadId": 48,
        "Application": "MyAPI"
    }
}

philon-msft commented 4 months ago

@codin-dev Connections will be reset occasionally for various reasons, including maintenance on the Redis cache or the underlying Azure infrastructure. Occasional connection exceptions should be treated as normal unless StackExchange.Redis fails to restore the connection automatically, or the exceptions appear in significant numbers or in concerning patterns.