dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.84k stars 4.62k forks source link

ConsoleLifetime doesn't allow full graceful shutdown for ProcessExit #35990

Closed davidmatson closed 3 years ago

davidmatson commented 5 years ago

Describe the bug

When ConsoleLifetime shuts down via the AppDomain.ProcessExit path, it skips a number of steps done in normal graceful program termination.

Related to dotnet/extensions#1363, but perhaps the opposite problem (in this case, it shuts down too soon rather than hanging/too late).

To Reproduce

Program.cs:

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using System;
using System.Threading;

class Program
{
    static int Main()
    {
        var builder = new HostBuilder()
            .ConfigureLogging(l =>
            {
                l.AddConsole();
                l.Services.AddSingleton<ILoggerProvider, CustomLoggerProvider>();
            }).ConfigureServices(s =>
            {
                s.AddSingleton<OtherService>();
            });

        using (IHost host = builder.Build())
        {
            host.Services.GetRequiredService<OtherService>();
            host.Start();
            // Edit: ignore Environment.Exit; instead use CTRL+C or docker stop here, per comments below.
            //Thread thread = new Thread(() => Environment.Exit(456));
            //thread.Start();
            host.WaitForShutdown();
            Console.WriteLine("Ran cleanup code inside using host block.");
        }

        Console.WriteLine("Ran cleanup code outside using host block.");
        Console.WriteLine("Returning exit code from Program.Main.");
        return 123;
    }

    class CustomLoggerProvider : ILoggerProvider
    {
        public ILogger CreateLogger(string categoryName)
        {
            return NullLogger.Instance;
        }

        public void Dispose()
        {
            Console.WriteLine("Ran logger cleanup code.");
        }
    }

    class OtherService : IDisposable
    {
        public void Dispose()
        {
            Console.WriteLine("Ran most service's cleanup code.");
        }
    }
}

App.csproj:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.2</TargetFramework>
    <RuntimeIdentifier>win-x64</RuntimeIdentifier>
    <SelfContained>false</SelfContained>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Extensions.Hosting" Version="2.2.0" />
    <PackageReference Include="Microsoft.Extensions.Logging.Console" Version="2.2.0" />
  </ItemGroup>
</Project>
  1. Run the code above as-is, and terminate with Ctrl+C. See all the cleanup code that runs.
  2. Uncomment the two Thread lines above to let the program self-terminate via Process.Exit.

Expected behavior

The same cleanup code runs when exiting via ProcessExit as it did with Ctrl+C:

Ran cleanup code inside using host block.
Ran most service's cleanup code.
Ran logger cleanup code.
Ran cleanup code outside using host block.
Returning exit code from Program.Main.

X:\path\to\App.exe (process nnn) exited with code 123.

Actual behavior

Only the following cleanup code runs:

Ran cleanup code inside using host block.
Ran most service's cleanup code.

X:\path\to\App.exe (process nnn) exited with code 456.

Note that the return code for a normal ProcessExit case (rather than one simulated by Environment.Exit) would likely be 3221225786 (STATUS_CONTROL_C_EXIT) on Windows for v2.2.0 of this package and 0 for the latest (I think; based on ConsoleLifetime source).

Additional context

Trapping ProcessExit and then deciding how long to wait overall is rather problematic; it could wait too long/block, as in dotnet/extensions#1363, or too short, as here. The main options I can think of are:

  1. Only have ProcessExit trigger the start of graceful shutdown (like Ctrl+C does), and give something else the responsibility to keep the shutdown from completing until Program.Main finishes.
  2. Have some automatic way to know when Program.Main returns. Not sure if this is possible - I tried doing a thread.Join on the main thread, but that doesn't quite work. Polling it for when IsBackground flips to false looks like it could work, but that seems not exactly ideal.
  3. Push the control up to the user for when to have the shutdown handler terminate (have the user pass in an event signaled from Program.Main maybe or something roughly like that; more complicated API to make that work).

Overall, the host itself doesn't own the entirety of Program.Main, so having it decide when to stop running is tricky.

I did find that at least the logger cleanup can likely be done better by changing the order of the disposables in the service container, including by adding a first constructor parameter to Host that gets registered first - currently, the IHostApplicationLifetime is after loggers in the list of things to dispose, and the logic to dispose in reverse order on the service provider/scope means loggers don't get to cleanup.

For application insights, for example, having the logger not get to have Dispose called means it can't call Flush, so logs currently get lost on shutdown.

The full list of things that currently get lost are:

  1. Earlier registered service provider disposables, including logger providers (likely could be fixed with some registration order/constructor chaining/dependency order changes).
  2. Code after the using host block.
  3. Program.Main's exit code.
davidmatson commented 5 years ago

A couple other things that likely currently get lost: full disposing of custom service providers, full disposing of custom hosts.

I ended up working around this with a wrapper around the entry point for now. Here's what it looks like:

class Program
{
    static void Main()
    {
        ProcessLifetimeHost.Run(MainCore);
    }

    static void MainCore(IProcessLifetime processLifetime)
    {
        new HostBuilder()
            // Configure etc
            .UseProcessLifetime(processLifetime)
            .Build()
            .Run();
    }
}

When run with the original set of things hooked up above, all of the cleanup code runs (including code outside the using block and setting the exit code).

The IProcessLifetime interface looks like this:

/// <summary>Defines a process lifetime.</summary>
interface IProcessLifetime
{
    /// <summary>
    /// Gets or sets the action to invoke to initiate graceful shutdown of currently-running application, if any.
    /// </summary>
    Action StopApplication { get; set; }
}

If you're interested in going down this route further, let me know and I can provide the other details as well (UseProcessLifetime extension method and ProcessLifetimeHost.Run method).

analogrelay commented 5 years ago

We'll likely need some corefx support on this one to give us an API to properly handle SIGTERM.

The rough proposal we have right now is to have a new API in corefx to hook the SIGTERM directly (right now we hook ProcessExit which can happen gracefully or via SIGTERM). In the new API, there would be a Cancel argument that can be used to stop the default behavior of the CLR (to shut down). In our lifetime we can hook that new event, stop the termination and signal the IApplicationLifetime to stop.

It means that if the app doesn't gracefully shutdown, SIGTERM won't shut it down, but SIGTERM is intended to be "graceful" so that makes sense. If a user needs to forcibly terminate the app, they can use SIGKILL

davidmatson commented 5 years ago

Unfortunately, I don't think it's possible for corefx to provide a cancellable API on Windows - the underlying signal sent by docker stop on Windows containers changed to CTRL_SHUTDOWN_EVENT, which can't be canceled. The MSDN docs could be a little clearer - they more say what can be done in this case (inline cleanup in the handler) rather than what can't be done (canceling the signal). See also a related moby issue for details.

analogrelay commented 5 years ago

Good to know. I was referring to graceful termination on Linux which sends SIGTERM but it may be more challenging on Windows.

This will be a tricky one, we'll have to investigate some options.

MikhaelSorkin commented 4 years ago

Hello! Any update here? Is it possible to achieve graceful shutdown in windows container?

davidmatson commented 4 years ago

It is possible to achieve graceful shutdown in a Windows container, but it currently requires a bit of custom code to make it work. Specifically, it requires using P/Invoke to get a shutdown notification (see dotnet/extensions#2029 and moby/moby#25982). And then this signal cannot be canceled, so it requires different wiring to make it work with IHostBuilder/Host. The ProcessLifetimeHost snippet above shows how I'm currently doing that from a design standpoint (the implementation is rather complicated).

So yes it's possible, but it requires custom code and is not easy.

MikhaelSorkin commented 4 years ago

@davidmatson Thank you for answer. Do you plan to publish your implementation? As a nuget package or as a source code?

davidmatson commented 4 years ago

My hope with this bug was to get a solution into the official product, so I wasn't planning on publishing my implementation. I'd have to check, but I could probably paste it here as a temporary workaround if there's interest.

MikhaelSorkin commented 4 years ago

@davidmatson Well, i'm very interested in this workaround :)

davidmatson commented 4 years ago

Here's a zip of the source I'm using as a workaround. The usage is as described above. I'm using it with v2.2.0 of the hosting extensions package (it may work with a later version but I haven't tested that).

ProcessLifetimeWorkaround.zip

Disclaimer: as-is; just a workaround; not an official solution; not offering support for this code; etc.

Tragetaschen commented 4 years ago

This has cost me the better part of the day :-(

Whenever I ran my application on the command line on Linux, CTRL+C would gracefully exit and dispose my Autofac container. When running as systemd service, that would just start the disposing work, but then stop in the middle of it. Since that service would normally only be stopped when my embedded instrument with a volatile system log would be shut down, I had to unwrap a lot of layers before this became clear.

For the time being as a workaround, I have internalized the code for UseSystemd, replaced the ProcessExit with Console.CancelKeyPress and added KillSignal=SIGINT to my systemd unit.

Is there a planned path forward already? I think this should be more than "Backlog"…

analogrelay commented 4 years ago

Triage: The right approach here is twofold:

  1. Get an event that runs before ProcessExit to capture SIGTERM (either by a full signal handler API or a SIGTERM) and gives the option to cancel the default behavior (terminating the process)

  2. Change ConsoleLifetime to hook that event and use it to signal a graceful shutdown.

analogrelay commented 4 years ago

The above approach would also fix https://github.com/dotnet/extensions/issues/1363

davidmatson commented 4 years ago

@anurse - the equivalent of SIGTERM on Windows/Docker (CTRL_SHUTDOWN_EVENT) can't be canceled. Thoughts?

tmds commented 4 years ago

Sharing my view of https://github.com/dotnet/extensions/issues/1363#issuecomment-546861225.

ConsoleLifetime shouldn't be used by default. Though there should be an API to Run a Host from Main that includes it.

That avoids the issue because the Host no longer assumes it gets disposed on ProcessExit. And, not all hosts should be tied to these events, like a host that runs shortly as part of a larger application.

davidfowl commented 4 years ago

The console host should be used by default when you call CreateDefaultBuilder, I'd be ok removing it when you new up the host.

That avoids the issue because the Host no longer assumes it gets disposed on ProcessExit.

It doesn't we still need to fix it.

tmds commented 4 years ago

It doesn't we still need to fix it.

Why not? The issue is caused by ConsoleLifetime right?

davidfowl commented 4 years ago

Why not? The issue is caused by ConsoleLifetime right?

Because when you call CreateDefaultBuilder the problem still exists.

Dotnet-GitSync-Bot commented 4 years ago

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

maryamariyan commented 4 years ago

Moving to 6.0 as it involves API work.

davidfowl commented 3 years ago

@davidmatson I know this issue is old but how are you exiting the process here? AFAIK there's no way to make this work without runtime changes. Ideally, we want Main to run to completion before exiting for these un-cancellable signals.

cc @jkotas @janvorli

jkotas commented 3 years ago

I know this issue is old but how are you exiting the process here?

The repro steps at the top exit the process using Environment.Exit. Does it still accurately describe the problem that this issue is tracking?

Environment.Exit calls the process exit event and exits the process. I do not see us adding more steps or any implicit waiting to this sequence. If Environment.Exit is not compatible with your app model, avoid calling it.

Tragetaschen commented 3 years ago

@jkotas

For me (https://github.com/dotnet/runtime/issues/35990#issuecomment-566582116), the issue became apparent through .NET's handling of SIGTERM and I moved to SIGINT as a workaround.

Environment.Exit is not part of my code base

davidmatson commented 3 years ago

@jkotas - it's been awhile, but I believe the Environment.Exit was just a deterministic, programmatic way of simulating what happens when someone does Ctrl+C, docker stop, or similar. (I expect there are other ways to trigger this behavior as well, like an unhandled exception from another thread.)

davidmatson commented 3 years ago

@davidfowl

@davidmatson I know this issue is old but how are you exiting the process here? AFAIK there's no way to make this work without runtime changes. Ideally, we want Main to run to completion before exiting for these un-cancellable signals.

Normally, exiting the process by Ctrl+C or docker stop. Yes, when I filed this issue, I expected runtime changes would be required (or a very different API for IHost startup, like in the workaround zip I posted above).

jkotas commented 3 years ago

the Environment.Exit was just a deterministic, programmatic way of simulating what happens when someone does Ctrl+C, docker stop, or similar

Not exactly.

Environment.Exit is low-level API that is not designed to be overridable.

Ctrl-C and similar signals exit the process by default, but that behavior can be overriden. #50527 is about making the APIs for these signals better.

This issue looks like duplicate of #50527 and can be closed. @davidfowl Do you agree?

an unhandled exception from another thread

Unhandled exception is abnormal exit. It does not follow the same path as Environment.Exit

davidmatson commented 3 years ago

Yeah, I'm not all that concerned with Environment.Exit per se as long as all the other cases (Ctrl+C, docker stop, system shutdown notification, and unhandled exceptions from other threads) are covered.

I'm guessing the other issue was filed with this kind of scenario in mind; as long as we're covering the same cases there, closing this one in favor of that one sounds fine to me.

davidmatson commented 3 years ago

@eerhardt - to clarify the status here: with the merged fix, does the repro at the top now work as in the Expected behavior section above (run all the cleanup steps listed and return an exit code of 123 from Main)?

Thanks

davidmatson commented 3 years ago

Also, does it work that way specifically on Windows?

eerhardt commented 3 years ago

does the repro at the top now work as in the Expected behavior section above (run all the cleanup steps listed and return an exit code of 123 from Main)?

No. That is unreasonable expected behavior. See https://docs.microsoft.com/en-us/dotnet/api/system.environment.exit?view=net-5.0#remarks for what to expect with Environment.Exit. Specifically:

Exit terminates an application immediately, even if other threads are running.

When calling Environment.Exit, your "Main" method doesn't complete. Also see @jkotas's comment above.

With the merged fix, Hosting no longer subscribes to ProcessExit. It was only handling ProcessExit in previous versions of .NET because it wanted to handle SIGTERM (for docker stop). In .NET 6, we can now specifically handle SIGTERM without using ProcessExit. If your app calls Environment.Exit and it requires specific clean up to happen, it is your apps responsibility for coordinating the shutdown. Hosting doesn't try to gracefully shutdown the process when Environment.Exit is called, because Environment.Exit isn't really a "graceful" way of shutting down a process. Your Main method doesn't complete, running threads are terminated, finally blocks aren't executed, etc.

In short, Environment.Exit != CTRL+C or docker stop. If your app is using Hosting, and you want to gracefully stop the host, you should be calling IHostApplicationLifetime.StopApplication().

See https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Hosting/docs/HostShutdown.md for more information.

davidmatson commented 3 years ago

@eerhardt - sorry, I should clarify ignoring Environment.Exit to trigger process stop as in jkota's comment above. If, at that point in the code, CTRL+C or docker stop happens, will it now behave as in the Expected behavior section, even on Windows?

Thanks

davidmatson commented 3 years ago

I'm particularly interested in the docker stop case on Windows - at least last time I looked, docker stop used system shutdown, which is a non-cancellable event.

eerhardt commented 3 years ago

If, at that point in the code, CTRL+C or docker stop happens, will it now behave as in the Expected behavior section, even on Windows?

Yes. When CTRL+C or docker stop is issued to the process, the Hosted process will now gracefully exit and return 123. Yes, even on Windows and in Windows containers.

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using System;
using System.Threading;

class Program
{
    static int Main()
    {
        var builder = new HostBuilder()
            .ConfigureLogging(l =>
            {
                l.AddConsole();
                l.Services.AddSingleton<ILoggerProvider, CustomLoggerProvider>();
            }).ConfigureServices(s =>
            {
                s.AddSingleton<OtherService>();
            });

        using (IHost host = builder.Build())
        {
            host.Services.GetRequiredService<OtherService>();
            host.Start();
            //Thread thread = new Thread(() => Environment.Exit(456));
            //thread.Start();
            host.WaitForShutdown();
            Console.WriteLine("Ran cleanup code inside using host block.");
        }

        Console.WriteLine("Ran cleanup code outside using host block.");
        Console.WriteLine("Returning exit code from Program.Main.");
        return 123;
    }

    class CustomLoggerProvider : ILoggerProvider
    {
        public ILogger CreateLogger(string categoryName)
        {
            return NullLogger.Instance;
        }

        public void Dispose()
        {
            Console.WriteLine("Ran logger cleanup code.");
        }
    }

    class OtherService : IDisposable
    {
        public void Dispose()
        {
            Console.WriteLine("Ran most service's cleanup code.");
        }
    }
}
eerhardt@EERHARDT5  C:\DotNetTest\Net6Hosting                                                                                  [12:43]
❯ dotnet publish -c Release -r win-x64
Microsoft (R) Build Engine version 17.0.0-preview-21359-04+d150e93ff for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  Determining projects to restore...
  Restored C:\DotNetTest\Net6Hosting\Net6Hosting.csproj (in 92 ms).
  You are using a preview version of .NET. See: https://aka.ms/dotnet-core-preview
  Net6Hosting -> C:\DotNetTest\Net6Hosting\bin\Release\net6.0\win-x64\Net6Hosting.dll
  Net6Hosting -> C:\DotNetTest\Net6Hosting\bin\Release\net6.0\win-x64\publish\
eerhardt@EERHARDT5  C:\DotNetTest\Net6Hosting                                                                                  [12:48]
❯ docker build -t net6hosting .
Sending build context to Docker daemon  676.9MB
Step 1/4 : FROM mcr.microsoft.com/dotnet/runtime:6.0.0-preview.6-nanoserver-1809
 ---> dd2e42b52e4a
Step 2/4 : COPY bin/Release/net6.0/win-x64/publish/ App/
 ---> 4f3178d5e561
Step 3/4 : WORKDIR /App
 ---> Running in 2b68fe9fe4f3
Removing intermediate container 2b68fe9fe4f3
 ---> 875cb6faef2a
Step 4/4 : ENTRYPOINT ["/App/Net6Hosting"]
 ---> Running in 46e2987db95a
Removing intermediate container 46e2987db95a
 ---> 8d4a9591c61d
Successfully built 8d4a9591c61d
Successfully tagged net6hosting:latest

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
eerhardt@EERHARDT5  C:\DotNetTest\Net6Hosting                                                                                  [12:48]
❯ docker run net6hosting
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: C:\App\
info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
Ran cleanup code inside using host block.
Ran most service's cleanup code.
Ran logger cleanup code.
Ran cleanup code outside using host block.
Returning exit code from Program.Main.
eerhardt@EERHARDT5  C:\DotNetTest\Net6Hosting                                                                                  [12:49]
❯ $LASTEXITCODE
123

Above I ran docker stop 93c982cee9f8 from a separate command prompt after docker run.

davidmatson commented 3 years ago

Excellent. Thanks, @eerhardt!