dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.41k stars 10k forks source link

SignalR connections are shutdown directly during graceful shutdowns #25069

Open WJRovers opened 4 years ago

WJRovers commented 4 years ago

Context: Hosting an application in a kubernetes environment can sometimes cause the service (with active connections) to be shutdown as autoscaling happens. We have implemented graceful shutdown behaviour for various parts of the application but I see no such feature/possibility for SignalR servers. This causes problems when using the streaming features in SignalR. The Hub interface does offer you a OnDisconnectedAsync but as it turns out this is fired after the connection has been closed (by the server itself).

Things Tried I've stepped into the source code with a debugger to find out most is handled in privates/internal classes. Should a complete new implementation be written of a HubConnectionHandler that can deal with IApplicationLifetime? Or am I missing something like an injectable class that can deal with this. Or is there an existing SignalR lifetime available? Also; Im not sure if the HubFilters that seem to be coming out in a later release would help us with our problem.

Question/Idea: We would like to await stream completion (with a max timeout) before the SignalR connection is cut off. Is it possible to interact with the IApplicationLifetime or await completion before SignalR connections are actually closed?

davidfowl commented 4 years ago

You might be able to achieve something if you handle the IHostApplicationLifetime notification before SignalR does and closes all connections. This logic actually isn't directly in the signalr layer, it's part of the transport logic.

WJRovers commented 4 years ago

I am not sure this is not possible as I could not find any interaction between the SignalR hosting service and the IHostApplicationLifetime. Ive done a little test:

Startup.cs

public void Configure(IApplicationBuilder app, IHostApplicationLifetime applicationLifetime)
{
     applicationLifetime.ApplicationStopping.Register(() =>
     {
          Console.WriteLine("App is stopping - We should wait before SignalR closes connections?");
     }, true);

    // Some more stuff here

ChatHub

public async Task StreamData(ChannelReader<string> streamOfStrings){
    try
    {
        await foreach (var receivedString in _channelReader.ReadAllAsync())
            Console.WriteLine(receivedString)
    catch (OperationCanceledException e)
    {
        Console.WriteLine(e.Message) // "The underlying connection was closed."
    }
}

When you start a Graceful shutdown during a stream of strings being send; the console will show the followiing: Console Output

The underlying connection was closed.
App is stopping

To me it looks like the SignalR service does not respect or interact with any of the IHostApplicationLifetime events. Am I missing something obvious or is this something that is not implemented?

Little extra information When you have a SignalR client with auto reconnect; and you put something like a thread sleep (for testing) in the ApplicationStopping registration. It closes the connection during graceful shutdown; and then allows the agent to reconnect.

davidfowl commented 4 years ago

What server are you running on?

BrennanConroy commented 4 years ago

So one idea would be to add a hook to allow users to run something before SignalR starts closing connections.

ghost commented 4 years ago

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.

BrennanConroy commented 3 years ago

We're thinking of adding an option to delay connection closes. By default there is no delay so today as soon as shutdown is triggered connections start closing. With the option you can specify how long SignalR delays once shutdown is triggered before trying to gracefully close connections.

The server can still aggressively shutdown connections that are still alive, Kestrel has configuration for this delay, not sure about other servers.

cc @davidfowl

ghost commented 3 years ago

Thanks for contacting us. We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

cubed-it commented 3 years ago

I have exactly the same problem as the author. My singalr core application needs graceful termination for streams when scaling and rolling out an update. It is a showstopper somehow... 🙁

So I would like to support the call for such a ApplicationStopping support for signalr core hereby explicitly!

Thanks a lot! 👍

springy76 commented 3 years ago

I have the same problem with Server-Blazor (using SignalR implicitly): All open circuits got ICircuitTracker.OnConnectionDownAsync before my IHostApplicationLifetime.ApplicationStopping handler gets called. And I checked that my code attaching to ApplicationStopping was the first one adding delegates, so should be the first one called, too.

I need an earlier point which allows me to wait (async) and trigger Component.StateHasChanged().

I already have existing code (in a singleton service which monitors all circuits) which - based on a timeout - "kicks" users by first collecting state data and then redirecting them to a simple non-blazor razor page which allows them to reenter, but also allows IIS to apply idle-shutdown of the app pool. This already works like a charm, I just cannot combine this code with a forced shutdown; all clients then just receive a reconnecting-attempt with final "not possible" error - either because the server was shutdown entirely or the freshly started server can't of course re-attach dropped Blazor circuits.

The automatic re-connect of Blazor/SignalR is extremely worse in my scenario which involves hundreds of app-pools of the same application (strict tenant separation) - because in the (seldom) case of an entire IIS-reset hundreds of app-pools are starting at the same time immediately just because of the automatic retry.

cubed-it commented 3 years ago

@springy76 as stated above I'm mainly interested in signalr core graceful termination, but blazor is also such a sore spot... I've been looking for a solution for my employer how to roll out an update using kubernetes without all users being kicked out immediately... however my I ask if the source for the "singleton service which monitors all circuits" is shareable? or at least could you provide some api hints? sounds really interessting and now I want such a thing too. ;) some users like to keep tabs open pretty long..

springy76 commented 3 years ago

@cubed-it It's a relative simple service deriving from CircuitHandler: https://docs.microsoft.com/en-us/aspnet/core/blazor/fundamentals/signalr?view=aspnetcore-5.0&pivots=server#blazor-server-circuit-handler which tracks all opened and closed circuits plus an in-memory weak-referenced service bus which all app components are listening to for kick messages.

BrennanConroy commented 3 years ago

As a workaround you can register to the IHostApplicationLifetime.ApplicationStopping token after any call to MapHub or MapConnections and perform work in there before returning and letting the rest of the events fire.

app.UseEndpoint(endpoints =>
{
    endpoints.MapHub<MyHub>("/hub");
});
// cancellation tokens fire callbacks in LIFO order, so make sure we're after MapHub is called
app.ApplicationServices.GetRequiredService<IHostApplicationLifetime>().ApplicationStopping.Register(() => DoWork());
springy76 commented 3 years ago

@BrennanConroy I'll have to give that a try - calling a hard .Wait() there shouldn't be total catastrophic as only this single thread would be blocked for some seconds? (I hope the final solution will allow awaiting tasks.)

kunom commented 2 years ago

Same issue here. I would like to have SignalR being up until MassTransit is fully closed down.

MassTransit works as a IHostedService, where shutdown happens one-per-one in reverse order of registration. I was a bit surprised that Azure SignalR does not do that but directly shortcuts to IHostApplicationLifetime.

ghost commented 2 years ago

Thanks for contacting us.

We're moving this issue to the .NET 7 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

zackliu commented 2 years ago

We're facing the same issue. And the workaround is not good enough because the customized graceful shutdown is usually async method. What you have to do is "sync over async" to block the next task. We have many WebHosts instance in one process for business needs. So, the "sync over async" will block a lot of threads and easy to cause thread starving.

kunom commented 2 years ago

Not only should shutdown be happening as part of the IHostedService contracts instead of the IApplicationLifetime contract, it also should be possible to shut down individual SignalR hubs in a well-defined order.

This, at least, is our use case, and it took me a lot of time/reverse engineering/hacking/reflection to end up with a shutdown sequence of:

SignalRHubs group1 > MassTransit > SignalRHubs group2

Needless to say, as a side remark, that I don't understand why all the Azure SignalR extensions are marked as internal and thus are also extremely hard to monitor for health checking.

ghost commented 2 years ago

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

springy76 commented 9 months ago

So what's the progress of .NET 8 Planning?