HangfireIO / Hangfire

An easy way to perform background job processing in .NET and .NET Core applications. No Windows Service or separate process required
https://www.hangfire.io
Other
9.43k stars 1.7k forks source link

BackgroundJobServer Dispose doesn't requeue jobs #2452

Open uler3161 opened 1 month ago

uler3161 commented 1 month ago

I have a thread on the forum at https://discuss.hangfire.io/t/job-shutdown-vs-delete/10952/3, but not getting any responses, so I decided to post here.

I'm using version 1.8.14 of Hangfire. Per the documentation on cancellation tokens here: https://docs.hangfire.io/en/latest/background-methods/using-cancellation-tokens.html, it sounds like if I shutdown a BackgroundJobServer by disposing it, all running jobs should be re-queued. However, it appears they are going into Succeeded status. I don't understand why.

The application is a Windows service. For testing purposes, I'm running from the command line rather than creating a service in Windows. Here is my code:

First, the Windows service class. My understanding is that stoppingToken is set when the service shuts down. In my case, on the console I'm using Ctrl+C to do this.

public sealed class WindowsBackgroundService(
    IServiceProvider serviceProvider,
    ILogger<WindowsBackgroundService> logger) : BackgroundService
{
    private ILogger<WindowsBackgroundService> Logger
    {
        get;
    } = logger;

    private IServiceProvider ServiceProvider
    {
        get;
    } = serviceProvider;

    protected override async Task ExecuteAsync(
        CancellationToken stoppingToken)
    {
        try
        {
            await ServiceProvider.GetRequiredService<IHangfireProcessor>()
                .RunAsync(stoppingToken);
        }
        catch (Exception ex)
        {
            Logger.LogError(ex, "{Message}", ex.Message);
            Environment.Exit(1);
        }
    }

The service class then executes the RunAsync method of the HangfireProcessor class. This is what runs the BackgroundJobServer. The stoppingToken from the service class gets passed into this method and is checked occasionally to determine if the service is shutting down. When set, it should dispose BackgroundJobServer when it exits the RunAsync method.

public class HangfireProcessor(
    ILogger<HangfireProcessor> logger) : IHangfireProcessor
{
    private ILogger<HangfireProcessor> Logger
    {
        get;
    } = logger;

    public async Task RunAsync(CancellationToken token)
    {
        var options = new BackgroundJobServerOptions
        {
            ServerTimeout = TimeSpan.FromHours(24)
        };

        var server = new BackgroundJobServer(options);
        try
        {

            Logger.LogInformation("Hangfire server started.");

            while (!token.IsCancellationRequested)
            {
                await Task.Delay(1000, token);
            }
        }
        catch (OperationCanceledException)
        {
            Logger.LogWarning("Hangfire shutdown requested.");
        }
        catch (Exception ex)
        {
            Logger.LogError("Serious error occurred with Hangfire server.");
            Logger.LogError(ex, ex.Message);
        }
        finally
        {
            Logger.LogWarning("Hangfire server stopped. Disposing");
            server.Dispose();
        }
    }
}

The RecordingTask class is the actual job that's executing. The jobCancellationToken should be the token that Hangfire provides behind the scenes when the job starts. I believe this gets toggled when the BackgroundJobServer gets disposed. When the token is toggled, this should kill the recording process that it started when the job started.

public class RecordingTask(ILogger<RecordingTask> logger, Camera camera, CancellationToken jobCancellationToken)
{
    private Camera? Camera
    {
        get;
    } = camera;

    private CancellationToken JobCancellationToken
    {
        get;
    } = jobCancellationToken;

    private ILogger<RecordingTask> Logger
    {
        get;
    } = logger;

    public void Run()
    {
        try
        {
            if (Camera == null)
            {
                throw new Exception("Camera was null");
            }

            var process = StartRecordingProcess();
            MonitorRecordingProcess(process);
        }
        catch (Exception ex)
        {
            Logger.LogError(ex, "Error running recording task: {message}", ex.Message);
            throw;
        }
    }

    private void MonitorRecordingProcess(
        Process process)
    {
        while (!JobCancellationToken.IsCancellationRequested && !process.HasExited)
        {
            Thread.Sleep(1000);
        }

        if (JobCancellationToken.IsCancellationRequested)
        {
            Logger.LogWarning("Cancellation requested. Killing process.");
            process.Kill(true);
        }

        if (process.ExitCode > 0)
        {
            throw new Exception(
                $"Unexpected end of recording. Hangfire job cancelled? {JobCancellationToken.IsCancellationRequested}. Process result: {process.ExitCode}");
        }
    }

    private Process StartRecordingProcess()
    {
        var startInfo = new ProcessStartInfo
        {
            Arguments = $"--cameraid {Camera!.ID}",
            FileName = "SecurityCameraRecorderProcess.exe"
        };

        var process = Process.Start(startInfo);
        return process
            ?? throw new Exception($"Couldn't start process {startInfo.FileName} with args {startInfo.Arguments}");
    }
}

And finally, this is what starts the job...

BackgroundJob.Enqueue(() => StartRecordingTask(camera, CancellationToken.None))

        [AutomaticRetry(Attempts = int.MaxValue, DelaysInSeconds = new[] { 15 })]
        public void StartRecordingTask(
            Camera camera,
            CancellationToken token)
        {
            ServiceProvider.GetRequiredService<IRecordingTaskFactory>()
                .Create(camera, token)
                .Run();
        }

For some reason, the job ends up in Succeeded status. I've seen some older discussions about async code, but I thought that had to do with async code being used inside the job rather than how I'm using it to start and manage the BackgroundJobServer. I can't spot anything obvious. It's almost like the server doesn't know it's shutting down when the job ends.

One very strange oddity is the logging output. I'm expecting the following:

Hangfire shutdown requested.
Hangfire server stopped. Disposing.
Cancellation requested. Killing process.

I have seen this:

Cancellation requested. Killing process.
Hangfire shutdown requested.
Hangfire server stopped. Disposing.

This:

Hangfire shutdown requested.
Hangfire server stopped. Disposing.
Cancellation requested. Killing process.

And this:

Hangfire shutdown requested.
Hangfire server stopped. Disposing.

Edit: Was wrong in what I was expecting to see, so I fixed it. Apparently sometimes the cancellation message happens in the same order, but sometimes it doesn't, and sometimes it doesn't seem to happen at all.

uler3161 commented 1 month ago

Here is some updated code:

public class RecordingTask(ILogger<RecordingTask> logger, Camera camera, CancellationToken jobCancellationToken, IHostApplicationLifetime hostApplicationLifetime)
{
    private Camera? Camera
    {
        get;
    } = camera;

    private CancellationToken JobCancellationToken
    {
        get;
    } = jobCancellationToken;

    private IHostApplicationLifetime HostApplicationLifetime
    {
        get;
    } = hostApplicationLifetime;

    private ILogger<RecordingTask> Logger
    {
        get;
    } = logger;

    public void Run()
    {
        try
        {
            if (Camera == null)
            {
                throw new Exception("Camera was null");
            }

            var process = StartRecordingProcess();
            MonitorRecordingProcess(process);
        }
        catch (Exception ex)
        {
            Logger.LogError(ex, "Error running recording task: {message}", ex.Message);
            throw;
        }
    }

    private void MonitorRecordingProcess(
        Process process)
    {
        while (!JobCancellationToken.IsCancellationRequested && !process.HasExited)
        {
            Thread.Sleep(1000);
        }

        if (JobCancellationToken.IsCancellationRequested)
        {
            process.Kill(true);
        }

        if (process.ExitCode > 0)
        {
            throw new Exception(
                $"Unexpected end of recording. Hangfire job cancelled? {JobCancellationToken.IsCancellationRequested}. Process result: {process.ExitCode}");
        }

        if (HostApplicationLifetime.ApplicationStopping.IsCancellationRequested)
        {
            throw new Exception("Application shutting down. Throwing exception so Hangfire knows to re-queue");
        }

        Logger.LogWarning("Recording stopped, presumably due to user stopping recording.");
    }

    private Process StartRecordingProcess()
    {
        var startInfo = new ProcessStartInfo
        {
            Arguments = $"--cameraid {Camera!.ID}",
            FileName = "SecurityCameraRecorderProcess.exe"
        };

        var process = Process.Start(startInfo);
        return process
            ?? throw new Exception($"Couldn't start process {startInfo.FileName} with args {startInfo.Arguments}");
    }
}

So, basically I have to check whether the application is shutting down. If it is, I need to throw an exception of some sort in order to convince Hangfire to requeue. Is that really what I have to do?