Azure / azure-signalr

Azure SignalR Service SDK for .NET
https://aka.ms/signalr-service
MIT License
427 stars 101 forks source link

[RCA] Azure SignalR Service Control Plane Outage in North Central US [Resolved] #2053

Closed bjqian closed 1 month ago

bjqian commented 1 month ago

Summary

Azure SignalR service experienced a control plane outage in North Cental US. Control plane write operations, such as creating resources or modifying the unit count manually or by autoscaling were impacted, casuing them either to fail or hang. Read operations were unaffected.

Details

Impacted regions

North Central US

Start Time (UTC)

2024-09-25 13:00

Recover time (UTC)

2024-09-26 01:00

Root cause

A recent delpoyment in North Central US region unintentionally removed the network peering between the control plane and data plane. As a result, all the create/update operations would fail.

Mitigation step

We did a hot fix to restore the network peering. After that, control plane recoverred.