Azure / azure-signalr

Azure SignalR Service SDK for .NET
https://aka.ms/signalr-service
MIT License

Protecting the *.service.signalr.net endpoint #961

Open · Jonno12345 opened this issue 4 years ago

Jonno12345 commented 4 years ago

Is your feature request related to a problem? Please describe.

We are migrating from an ASP.NET SignalR service with a Redis backplane to a dedicated ASP.NET Core SignalR + Azure SignalR backend service. In the process, we have discovered how vulnerable we will be, as we cannot find a way to ensure a single user doesn't swamp all available connections on our units. As part of our testing to ensure the system works as intended, we used an adaptation of Crankier and found that, within minutes, a single person could take all 5,000 connections on our example 5-unit Azure SignalR service.

In an attempt to mitigate this, we investigated using something like Azure Front Door, only to realise it doesn't support websockets and doesn't give us enough rate-limit configuration. Our next step was to try to hide the service behind haproxy, allowing us to limit the maximum number of concurrent connections. However, to do this we would need to route through another domain, and we then receive 401 errors when the server attempts to connect to the Azure SignalR service, because the audience in the JWT bearer token no longer matches: it is set to our haproxy domain instead of *.service.signalr.net.

We are able to protect our ASP.NET Core server to prevent a user from hitting our negotiate method too many times. However, once they have a connection we have no way to handle this, and there's nothing stopping the user from still taking every available connection on our units, just over a longer period of time.
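
For reference, a minimal sketch of that negotiate-side protection using the built-in rate-limiting middleware in ASP.NET Core 7 or later; the hub name, route and numbers here are illustrative rather than our actual configuration:

```csharp
// Minimal sketch only: per-IP rate limiting on the hub route, which is where
// /negotiate is served when using Azure SignalR. Hub name and limits are placeholders.
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSignalR().AddAzureSignalR();

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Partition by remote IP so one client can't exhaust the budget for everyone.
    options.AddPolicy("negotiate", httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 10,                 // e.g. 10 negotiates per IP
                Window = TimeSpan.FromMinutes(1), // per minute
                QueueLimit = 0
            }));
});

var app = builder.Build();

app.UseRateLimiter();

// Only the negotiate call hits our server; the websocket itself goes to the
// service, which is exactly the part this does not protect.
app.MapHub<ChatHub>("/chat").RequireRateLimiting("negotiate");

app.Run();

public class ChatHub : Microsoft.AspNetCore.SignalR.Hub { }
```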

Describe the solution you'd like

Initially I considered being able to override the audience value in the JWT, which could be done if we had public access to the IServiceEndpointGenerator interface (https://github.com/Azure/azure-signalr/blob/937a0fb58dc4bade9086aa1fb9d0eae9fab77804/src/Microsoft.Azure.SignalR/EndpointProvider/IServiceEndpointGenerator.cs#L10) and it was resolvable through DI; however, this would still reveal a public endpoint in the JWT to potential attackers.

We require a way to secure this service, assuming there isn't already one. Useful features would be a way to limit connections per client IP address, the ability to route successfully through a rate-limiting Azure service, or a way to tell the Azure SignalR service to expect a different audience (e.g. custom domain support) so that we could also add layer 4 protection (assuming this isn't already factored in).

Otherwise, any suggestions for how we can prevent a malicious user from attacking our Azure SignalR service, whether that's a conventional DDoS or a user employing a methodology similar to Crankier (or even simply opening thousands of browser tabs) to take all available connections, would be appreciated.

r3wind commented 2 years ago

Using K6 I was able to emulate the negotiate endpoint flow and the websocket connection to the Azure SignalR service, publishing hundreds of thousands of messages within minutes without any mitigation available. Not only does this put significant load on our Blazor applications, it is also a huge risk in terms of denial of wallet.

We already have Azure Front Door in place, which the initial negotiate request goes through; this gives us some level of protection against DDoS and against an individual opening thousands of connections. However, as @Jonno12345 alluded to, once a connection is made there is nothing stopping that individual from flooding the websocket connection indefinitely.
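
For completeness, the only app-side stopgap I can think of is counting messages per connection in the hub itself and aborting connections that go over a budget; by that point the message has already gone through (and been billed by) the service, so this limits the blast radius rather than preventing the problem. A rough sketch, with a made-up hub, method and limits:

```csharp
// Rough sketch only: per-connection message budget enforced inside the hub.
// A real implementation would want a sliding window and smarter bookkeeping.
using System.Collections.Concurrent;
using Microsoft.AspNetCore.SignalR;

public class ChatHub : Hub
{
    private const int MaxMessagesPerWindow = 120;
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

    // connection id -> (window start, message count)
    private static readonly ConcurrentDictionary<string, (DateTime Start, int Count)> Counters = new();

    public async Task SendMessage(string message)
    {
        if (IsOverBudget(Context.ConnectionId))
        {
            // Forcibly close the connection; the client has to negotiate again,
            // which is the part we can rate limit at the app tier.
            Context.Abort();
            return;
        }

        await Clients.All.SendAsync("ReceiveMessage", message);
    }

    public override Task OnDisconnectedAsync(Exception? exception)
    {
        Counters.TryRemove(Context.ConnectionId, out _);
        return base.OnDisconnectedAsync(exception);
    }

    private static bool IsOverBudget(string connectionId)
    {
        var now = DateTime.UtcNow;
        var entry = Counters.AddOrUpdate(
            connectionId,
            _ => (now, 1),
            (_, current) => now - current.Start > Window
                ? (now, 1)                             // start a new window
                : (current.Start, current.Count + 1)); // count within the current window

        return entry.Count > MaxMessagesPerWindow;
    }
}
```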

Are there any plans for this feature to be implemented in future? This is currently a huge blocker for us in deploying Blazor Server applications backed by Azure SignalR Service. It is a huge flaw which I'm surprised doesn't have more of a spotlight.

@Jonno12345 Did you find any other mitigations for this issue?

All of these messages were sent from a single K6 client, not even scratching the surface of what could potentially be sent:

[screenshot of the K6 run output]

1,394,441 messages received and 1,575,046 sent in 10 minutes.

Jonno12345 commented 2 years ago

> @Jonno12345 Did you find any other mitigations for this issue?

To some extent, yes. There are a few new things since I made this issue, and we've done the following:

- Azure AD authentication for the server-side connections, combined with only allowing connections to the service through our VNet (which contains only our server-side application), so that only our deployed web app can make server-side connections to the Azure SignalR Service.
- Negotiation happens in our application, which sits behind our firewall/rate-limiting server.
- After negotiation, the endpoint the client is given also routes through our firewall/rate-limiting server, allowing us to handle abuse as required (a rough configuration sketch is below).
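
The relevant wiring on our side looks roughly like the sketch below; URLs and names are placeholders rather than our exact configuration. The important part is the ClientEndpoint value in the connection string, which (as I understand it) is what the SDK hands back to clients from /negotiate, so the websocket goes through the proxy instead of straight to *.service.signalr.net.

```csharp
// Rough sketch with placeholder values, not a drop-in configuration.
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSignalR().AddAzureSignalR(options =>
{
    // Normally this comes from configuration (Azure:SignalR:ConnectionString)
    // rather than being hard-coded; it's inline here only to show the shape.
    options.ConnectionString =
        "Endpoint=https://<your-instance>.service.signalr.net;" +
        "AuthType=azure.msi;" +                         // Azure AD / managed identity instead of access keys
        "ClientEndpoint=https://<your-proxy-domain>;" + // clients are sent through the rate-limiting proxy
        "Version=1.0;";
});

var app = builder.Build();
app.MapHub<ChatHub>("/chat");
app.Run();

public class ChatHub : Microsoft.AspNetCore.SignalR.Hub { }
```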

Note that, coincidentally, we only implemented all of this in the last couple of months; our transition had been on hold until we found out this was possible. As such, we still haven't fully tested how well it works, but we've certainly made more progress than we previously had.

Unfortunately this may not be so helpful for you, as it appears Azure Front Door still doesn't support websockets:

https://github.com/dotnet/aspnetcore/issues/38196
https://feedback.azure.com/d365community/idea/c8b1d257-8a26-ec11-b6e6-000d3a4f0789

r3wind commented 2 years ago

@Jonno12345

Hello matey,

I sincerely appreciate you responding to my message, it's really good to see that you found a workaround to the problems you were facing.

I understand that you are using haproxy as an ingress into your Azure SignalR Service. I'm curious: how does this help in terms of rate-limiting the number of messages that can be sent through a single websocket connection? From what I can see, haproxy doesn't provide anything out of the box for doing this.

This is the main challenge we have right now. We have investigated other Azure offerings such as Application Gateway (which does support the websocket protocol), but it doesn't provide any functionality for rate-limiting an established websocket connection.

Thanks again for getting back to me.

Cheers, Rich

Jonno12345 commented 2 years ago

> I'm curious: how does this help in terms of rate-limiting the number of messages that can be sent through a single websocket connection?

Hi Rich,

Apologies, this isn't entirely my area. I believe it's OpenResty for this particular implementation, with modules for regulating these, but I don't think it's an out-of-the-box solution; my colleague handles the mitigation side of things, and I just handled the rest of the implementation around that 😁.

I'd hope there's a more consistent approach through Azure in the long term, so I think this topic still stands, but there are at least some steps toward an approach now.

Thanks,

Jonno

dbContext commented 2 years ago

@r3wind

I've linked a repo below which will give you a great starting point (with examples) for piecing together a Lua module on OpenResty (or NGINX with Lua).

https://github.com/openresty/lua-resty-websocket#synopsis
https://github.com/openresty/lua-resty-websocket#restywebsocketclient

You can set up a WebSocket server (first link), process the requests/messages, throttle/filter based on pre-set profiles, NetFlow, basic rate limits or anything in between, and then forward the validated requests/messages on to the origin (the SignalR service).

You can use a basic rate-limiting Lua module such as https://github.com/openresty/lua-resty-limit-traffic, allowing you to set up basic throttling at a per-request and per-message level.

If DDoS floods are a day-to-day issue, I'd take this further and create an edge firewall, allowing you to apply IP blocks once you have identified an abusive IP address via the validation methods mentioned above, so the flood doesn't overwhelm the server's resources. For a non-distributed setup, I'd suggest using sockets to communicate with the OS via ipset, or eBPF at the kernel level.
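
If you'd rather build the throttling piece in .NET than in Lua, the same token-bucket idea is available out of the box in System.Threading.RateLimiting; a tiny illustrative sketch (the numbers are arbitrary):

```csharp
// Illustrative only: a per-connection token bucket you could consult in a proxy's
// receive loop before forwarding a frame on to the SignalR service.
using System.Threading.RateLimiting;

// 50-message burst, refilled at 10 messages per second, no queueing.
var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 50,
    TokensPerPeriod = 10,
    ReplenishmentPeriod = TimeSpan.FromSeconds(1),
    QueueLimit = 0,
    AutoReplenishment = true
});

// Before forwarding each client frame:
using RateLimitLease lease = limiter.AttemptAcquire(permitCount: 1);
if (!lease.IsAcquired)
{
    // Over budget: drop the frame, or close the websocket and force a renegotiation.
    Console.WriteLine("Message dropped: rate limit exceeded.");
}
```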

Best of luck.