@YukaAn, what storage are you using? Could you include the full details, e.g. the data column?
@odinserj I'm using SQL Server 2008; the Hangfire version is 1.6.6. Below is a screenshot of a job's state info:
Could you show me your configuration logic and recurring method's signature?
@odinserj Thank you for the quick response!
The configuration logic looks like this:
public void Configuration(IAppBuilder app)
{
    GlobalConfiguration.Configuration.UseSqlServerStorage("HangfireDb");
    app.UseHangfireDashboard();
    app.UseHangfireServer();
}
The recurring method looks like this:
public void CreateRecurringJob(int hour, int minute, int Id, string Name, string occurence)
{
    try
    {
        if (!MinuteCheck(minute) || !HourCheck(hour) || !CronCheck(occurence))
        {
            return;
        }
        string cron = BuildCron(hour, minute, occurence);
        if (IsExistingOrNewMethod(Id, Name))
        {
            ScheduledJobHandler handler = new ScheduledJobHandler();
            RecurringJob.AddOrUpdate(
                JobNameBuilder(Id, Name),
                () => handler.SendRequest(Id, Name),
                cron,
                TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time")
            );
        }
    }
    catch (Exception ex)
    {
        throw new ApiException(ex);
    }
}
@odinserj I have around 70 recurring jobs scheduled each day, and this issue keeps happening a couple of times every day (randomly on different jobs). I'm still waiting for your reply and I appreciate your help. Thanks!
@YukaAn, sorry for the delay. Try to upgrade to the latest version. At least Hangfire.Core 1.6.12 has a fix related to a problem like yours:
• Fixed – Buggy state filters may cause background job to be infinitely retried.
Looks like there's a transient exception that occurs when your job is completed, and only logging could help to investigate the issue in detail. Please see this article to learn how to enable it, and feel free to post your log messages in this thread to conduct a further investigation.
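(For reference, a minimal sketch of enabling that logging, assuming the built-in coloured console provider that ships with Hangfire.Core 1.6+; any LibLog-compatible provider works as well:)
using Hangfire;

// Console logger: state transitions, retries and storage errors will
// show up in the process output while investigating.
GlobalConfiguration.Configuration
    .UseSqlServerStorage("HangfireDb")
    .UseColouredConsoleLogProvider();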
I am having this same issue on Hangfire 1.6.20 with LiteDB storage. I have seen several other reports of this issue but no resolution. Are you still using a workaround?
Same issue here. Hangfire 1.6.20 and Hangfire.SQLite 1.4.2
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    ...
    RecurringJob.AddOrUpdate("debug", () => Hangfire(), Cron.Minutely);
}
public void Hangfire()
{
    Debug.WriteLine($"{DateTime.Now} - Hangfire");
}
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
Hello. I have the same issue.
The job starts once for every worker. If I set the server worker count to 2 then it starts 2 times; if I set it to 50 then it starts 50 times.
I use the latest Hangfire version (1.6.20) and SQLite for storage.
The job is enqueued from the web application.
BackgroundJob.Enqueue(() => StartDatabaseExport(databaseId));
The server is started from another application (a Windows service).
var options = new BackgroundJobServerOptions { WorkerCount = 50 };
new BackgroundJobServer(options);
Any ideas?
I also tried with the LiteDB storage, same problem. I then tried with the in-memory storage and it works as expected. So it seems it's related to the storage.
Hangfire 1.6.17.0, MemoryStorage 1.5.1.0.
Tasks are created as:
IState state = new EnqueuedState(QueueName.PRIORITY);
_jobClient.Create(() => TaskFactory.Build(id), state);
Sometimes jobs run multiple times. Log from within the task:
2018-09-12 11:45:43.7292|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Export calculation unit
2018-09-12 12:16:14.7399|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Export calculation unit
...
2018-09-12 12:27:51.2235|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Done!
...
2018-09-12 12:51:39.4573|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Done!
The job accesses the database, so it could get stuck for some time if all DB connections in the pool are taken by other jobs. That's what happened at the beginning, I assume. Then the task was run again, but there were no reports of a retry or error. And actually after that, the task reported successful completion twice (as well as the intermediate steps). It feels like the job was retried after some waiting period without cancelling the previous execution.
Although your messages should be idempotent, this should definitely be fixed in Hangfire.
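(For reference, a minimal sketch of such an idempotency guard. IExportTracker and its members are hypothetical placeholders for whatever "already done" check fits the data model, e.g. a flag column or a unique-key insert:)
public interface IExportTracker
{
    bool TryMarkStarted(int unitId);   // atomic claim, e.g. a unique-key insert
    void MarkCompleted(int unitId);
}

public class ExportJob
{
    private readonly IExportTracker _tracker;

    public ExportJob(IExportTracker tracker)
    {
        _tracker = tracker;
    }

    public void Run(int unitId)
    {
        // If a duplicate execution arrives, bail out instead of exporting twice.
        if (!_tracker.TryMarkStarted(unitId))
            return;

        // ... actual export work ...

        _tracker.MarkCompleted(unitId);
    }
}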
Is this issue in Hangfire itself, or in the storage providers?
Hi, we have the same problem. We tried to find a solution, and after a long time we probably figured something out. The problem is associated with the storage beyond any doubt. Multiple workers run the same job when you use LiteDB, SQLite and similar storages. Everything is OK with SQL Server. So if it was possible to fix the error in the SQL Server storage, it could be possible in the others. So this is my shy request to the creators.
I think I found the issue in the Sqlite provider: https://github.com/mobydi/Hangfire.Sqlite/issues/2#issuecomment-441019511
Maybe this issue is something similar for the SQL Server provider as well?
Experiencing the same issue. We use SQL Server as storage (if it matters). Any update on when it could be fixed, or at least whether the root cause is known?
Experiencing the same issue. LiteDb as storage. As a temporary solution I set WorkerCount to 1. @odinserj Is the Pro version free of this bug?
In my case, a workaround was to set extended intervals for MemoryStorage:
MemoryStorageOptions storageOpts = new MemoryStorageOptions()
{
    JobExpirationCheckInterval = TimeSpan.FromMinutes(120),
    FetchNextJobTimeout = TimeSpan.FromMinutes(120)
};
GlobalConfiguration.Configuration.UseMemoryStorage(storageOpts);
But that only works as long as the task can be done within the 2-hour time span. In my case I can be sure that at least most of the tasks will be completed.
Hey, it seems as if #1197 is about the same issue. Everybody who runs into this issue might want to check it out.
I'm prototyping with Hangfire and MemoryStorage and seeing my job being executed multiple times. Something as simple as the following:
_jobId = _jobClient.Enqueue<MyJobPerformer>(mjp => mjp.Perform(request));
public async Task Perform(RequestBase request)
{
await Task.Delay(TimeSpan.FromSeconds(5));
await Task.Delay(TimeSpan.FromSeconds(5));
await Task.Delay(TimeSpan.FromSeconds(5));
}
config.UseMemoryStorage(new MemoryStorageOptions { FetchNextJobTimeout = TimeSpan.FromHours(24) });
does not seem to be a valid solution for this problem. The default TimeSpan for FetchNextJobTimeout is 30 minutes. I'm seeing multiple calls to execute my job on concurrent workers within seconds. Does anyone have a solution to this issue?
This is a very big issue. I've seen this with the SQLite and Postgres storage.
I haven't seen this with the in-memory provider, likely because it has distributed locking implemented properly.
@odinserj, there likely needs to be clearer documentation to storage authors about how distributed locks should be implemented to prevent multiple tasks from being executed.
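(To make "atomic" concrete: the fetch must claim a job and mark it fetched in a single operation. Below is a simplified, hypothetical sketch of the queue-table pattern the SQL Server provider is built around; table and column names are approximate, and the real implementation also handles invisibility timeouts and requeueing:)
using System;
using System.Data.SqlClient;

static class AtomicFetchSketch
{
    // Claim at most one queued job in a single UPDATE. The ROWLOCK, UPDLOCK
    // and READPAST hints let concurrent workers skip rows that are already
    // being claimed, so two workers can never fetch the same job.
    private const string FetchSql = @"
        UPDATE TOP (1) HangFire.JobQueue WITH (ROWLOCK, UPDLOCK, READPAST)
        SET FetchedAt = GETUTCDATE()
        OUTPUT INSERTED.JobId
        WHERE FetchedAt IS NULL AND Queue = @queue";

    public static long? TryFetchJob(string connectionString, string queue)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(FetchSql, connection))
        {
            command.Parameters.AddWithValue("@queue", queue);
            connection.Open();
            var result = command.ExecuteScalar(); // null when the queue is empty
            return result == null ? (long?)null : Convert.ToInt64(result);
        }
    }
}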
@pauldotknopf
I'm not sure what's going on. I'm using the memory storage provider and have breakpointed each await Task.Delay shown above. All breakpoints are hit multiple times by several workers. Only 1 job has been enqueued.
Hmm, that seems like an easy repro.
Considering this issue has been open since 2017, someone (not the maintainers) will likely have to debug/fix/contribute a PR.
config.UseMemoryStorage(
new MemoryStorageOptions
{
FetchNextJobTimeout = TimeSpan.FromSeconds(10)
});
public class MyJobPerformer
{
private readonly string _performerId;
public MyJobPerformer()
{
_performerId = Guid.NewGuid().ToString("N");
}
public async Task Perform(RequestBase request)
{
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
await Task.Delay(TimeSpan.FromSeconds(5));
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
await Task.Delay(TimeSpan.FromSeconds(5));
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
await Task.Delay(TimeSpan.FromSeconds(5));
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
}
}
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:20 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:25 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:30 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:32 PM <<< Duplicate job execution! 10 seconds elapsed.
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:35 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:37 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:42 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:47 PM
One job enqueue results in multiple worker executions. It's even worse if FetchNextJobTimeout is reduced further. I'm not sure whether this is a problem with Hangfire, MemoryStorage, or both.
RecurringJob.AddOrUpdate(recurringJobId, () => EmailReceiveService.SendMail(parametes), cronExpression);
Hey @odinserj, any update on this issue?
Hi @odinserj, is there any solution for this case?
The same question.
+1. Having same issue. SQLite storage.
+1. Having the same issue with MSSQL Server storage.
Having the same issue. OMG!!!!!!
I use ASP.NET Core 3.1.
OMG! I am having this same issue. I observe that there are 2 servers, even though I configured only one.
So, putting 2 and 2 together, I felt that the reason the job executed twice was that there are 2 servers. That means I need to get rid of the second server, the one with default options.
I downloaded the HF code and placed breakpoints to analyze the flow. Here are my findings:
For BackgroundJobServerOptions, create a singleton instance of the object and don't pass it to AddHF or UseHF. Example:
services.AddSingleton(new BackgroundJobServerOptions
{
    WorkerCount = 1,
    ServerName = "TaskSvcHangfireServer"
});
After this, I still had 2 servers, but both using my BackgroundJobServerOptions. Halfway there!
After debugging a few times, I found that I had both services.AddHangfireServer(); in ConfigureServices and app.UseHangfireServer(); in Configure. Both of them create servers (via CreateBackgroundJobServerHostedService > BackgroundProcessingServer > BackgroundDispatcher > BackgroundServerProcess.Execute > CreateServer), unlike other libs, where AddHangfireServer should configure and UseHangfireServer should create instances using the configs (I think).
Then I checked the documentation, and it does not say to use both. I removed services.AddHangfireServer() and now I have only 1 server.
So, I tested the execution of a job. This time, it executed only once.
Lessons learnt: don't use AddHangfireServer and UseHangfireServer together, as it seems they do a similar job - create server instances. (IMHO, this should be considered a bug.)
Sigh! I'm seeing multiple "processing" jobs still, with the same job ID.
No idea why.
BTW: My service is hosted in IIS.
If you want this issue fixed, you will have to do it yourself.
Well, I have now placed some checks in my JobWorker to avoid executing multiple job requests with the same ID. This hack helps.
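(For reference: Hangfire ships a built-in filter that performs this kind of check. A minimal sketch, with the caveat that it relies on the storage's distributed lock implementation, which several reports above suspect is exactly what's broken in some community storages:)
using Hangfire;

public class JobWorker
{
    // Built-in filter: takes a distributed lock per job method, so a second
    // concurrent execution waits up to the timeout (in seconds) instead of
    // running in parallel. Only as reliable as the storage's distributed locks.
    [DisableConcurrentExecution(timeoutInSeconds: 600)]
    public void Process(int id)
    {
        // ... actual work ...
    }
}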
@cnayan,
Don't use AddHangfireServer and UseHangfireServer together, as it seems they do similar job - create server instances.
Indeed. The former uses an IHostedService-based implementation (available only in netstandard2.0 and later), while the latter uses the more generic approach. Technically, you can have as many servers running as you want, so it shouldn't be considered a bug. But it is definitely something worth noting.
As of processing the job multiple times, it clearly is your IIS misconfiguration. From the screenshot you can see all "processing" states have different server process IDs associated with them, so it appears the application is stopped and restarted periodically. IIS can do this if the site is not configured as "always running".
Aside from that, jobs are supposed to be reentrant, so if some code is supposed to be executed once, it is up to you to track that. Or maybe introduce checkpoints by splitting your job into multiple jobs executed in sequence. See the IBackgroundJobClient.ContinueJobWith() extension method.
Also consider using cancellation tokens, so the job can be terminated gracefully when the server is stopped.
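(A short sketch of both suggestions, assuming Hangfire 1.7+, where a plain CancellationToken parameter is filled in by the server at perform time; on 1.6 use IJobCancellationToken instead. ExportStep and NotifyStep are hypothetical names:)
using System.Threading;
using Hangfire;

public static class ExportJobs
{
    public static void Schedule(IBackgroundJobClient client, int databaseId)
    {
        // Checkpointing: the second job runs only after the first succeeds.
        var exportId = client.Enqueue(() => ExportStep(databaseId, CancellationToken.None));
        client.ContinueJobWith(exportId, () => NotifyStep(databaseId));
    }

    public static void ExportStep(int databaseId, CancellationToken token)
    {
        for (var batch = 0; batch < 100; batch++)
        {
            // Aborts cleanly when the server shuts down or the job is deleted.
            token.ThrowIfCancellationRequested();
            // ... export one batch ...
        }
    }

    public static void NotifyStep(int databaseId)
    {
        // ... send a notification ...
    }
}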
@pieceofsummer Thank you for the guidance.
My problem with the Hangfire documentation is that there are no clear sections on OWIN vs. ASP.NET Core that I can focus on. You may disagree, but that is how I see and read it. Probably I am spoiled by MSDN docs.
So, your precise advice is of great help to me.
As of processing the job multiple times, it clearly is your IIS misconfiguration. From the screenshot you can see all "processing" states have different server process IDs associated with them, so it appears the application is stopped and restarted periodically. IIS can do this if the site is not configured as "always running".
After reading Making ASP.NET Core application always running on IIS, and understanding the screenshots, I've configured IIS. But no code changes have been done to the ASP.NET Core app.
Aside from that, jobs are supposed to be reentrant, so if some code is supposed to be executed once, it is up to you to track that. Or maybe introduce checkpoints by splitting your job into multiple jobs executed in sequence. See the IBackgroundJobClient.ContinueJobWith() extension method.
I have a simple function to execute, so it cannot be split. This point, probably, is not for me.
Also consider using cancellation tokens, so the job can be terminated gracefully when the server is stopped.
Good point. But I execute a console app via the job, and I am happy that it does not get killed. Still, your point makes sense - abort the job when requested.
Thanks again!
I started implementing HangFire with SQLite storage a couple of days ago and ran into the same problem: enqueued jobs were executed as many times as I had workers initialized (20 by default). I found a solution by replacing the SQLite storage with HangFire.LiteDB storage. In the release notes they specifically mention 'Fix Hangfire Job starts multiple times', so I thought I'd give it a try. It turns out that they indeed solved the problem; my jobs finally get executed only once, so I don't need hacky workarounds anymore.
So, unless you really need SQLite as storage, I'd suggest switching to HangFire.LiteDb.
Example code:
var hangFireDb = @"D:\hangfire.db";
GlobalConfiguration.Configuration.UseLiteDbStorage(hangFireDb);
GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute { Attempts = 3 });
app.UseHangfireDashboard();
app.UseHangfireServer();
Cheers!
Same problem for me. I think this depends on server resets: when I recycle the IIS app pool manually, the same jobs are created again when the app pool starts, and every time the IIS app pool gets restarted, the duplicated jobs increase more and more.
I accidentally resolved this just by changing the signature of the method.
My method was like :
public virtual void Do(string title, T order, PerformContext context)
I was looking to handle deleted jobs in the Dashboard and needed to stop the job (deleted jobs just move to Deleted; they even keep working after deletion), so based on this link https://discuss.hangfire.io/t/deleting-job-with-onstateelection-cancellation-token/2602/5 I changed the signature of the method to:
public virtual void Do(string title, T order, PerformContext context, IJobCancellationToken cancellationToken)
and called ThrowIfCancellationRequested to cancel the deleted job inside the method body, like:
cancellationToken?.ThrowIfCancellationRequested();
If a user deletes a job manually in the Dashboard, the cancellationToken raises the exception and the job is really cancelled. The interesting thing is that, while debugging with a breakpoint in the method, Do was called multiple times when I hosted the app and restarted the server, but all the executions got cancelled (I don't know why; maybe HF sets the cancellation token) except one. In other words, the cancellationToken was raised for all except one.
Now I see there are no multiple jobs after restarting the IIS app pool, just because of ThrowIfCancellationRequested().
What storage are you using? If it's a community-based storage, then it's possible that the FetchNextJob method wasn't implemented in an atomic way, and multiple workers can pick up the same job. Please check the repository of the concrete storage implementation and report the issue there.
I am using Redis storage like:
GlobalConfiguration.Configuration.UseRedisStorage("localhost",
    new Hangfire.Pro.Redis.RedisStorageOptions()
    {
        InvisibilityTimeout = TimeSpan.MaxValue,
        Database = 1,
        Prefix = "hangfire:reclaim:",
    }).UseConsole();
WebApp.Start<MARCO.Reclaim.Core.Startup>(address);
and Startup Configuration method:
appBuilder.UseWebApi(config);
appBuilder.UseHangfireDashboard("", new DashboardOptions());
appBuilder.UseHangfireServer(new BackgroundJobServerOptions
{
ServerName = $"sendbulk",
WorkerCount = 100,
Queues = new[] { "sendbulk" }
});
the signature of the method and its override is like:
public virtual void Do(string title, T order, PerformContext context, IJobCancellationToken cancellationToken)
override in derived class:
[DisplayName("{0}")]
[Queue("sendbulk")]
[AutomaticRetry(Attempts = 5, DelaysInSeconds = new int[] { 60, 60 * 3, 60 * 3 * 3, 60 * 3 * 3 * 3, 60 * 3 * 3 * 3 * 3 })]
public override void Do(string title, SendBulkOrder order, PerformContext context, IJobCancellationToken cancellationToken)
and Startup is called in a single-instance WCF service under an IIS app pool.
Ah, I didn't realize that those jobs are totally different because they have different identifiers. So there's something that creates them, and this something is triggered once the application is restarted, causing duplicates.
Because almost every problem ends up either with "Enqueued jobs stuck" or with "Job starts multiple times", and different problems with different storages were reported into the same issue on GitHub. I'm really sorry you're having so much trouble; please try to run everything with Hangfire.SqlServer, Hangfire.Pro.Redis or Hangfire.InMemory – these storages are supported in this repository, while other storages are supported by the community in their own repositories.
Sorry for my shitty reply. I was having a bad day at work, but I realize that I shouldn't bring my personal problems into these spaces.
I'll just delete the old comment, it wasn't appropriate of me. Sorry again.
no solution yet?
I'm experiencing the same issue. Hangfire 1.7.9 with Hangfire.InMemory. A recurring task configured with Cron.Daily(0, 30). We specify our local timezone when invoking RecurringJob.AddOrUpdate.
This job is a long-running task which runs until 5 am. It was started as expected at 00:30, and then started a second time at 01:00 the same night while the first instance was running.
I have checked that the process didn't restart between 00:30 and 01:00.
Update 16.11.2021: Here's an example where the same job was triggered several times with 30 minute intervals. The instance triggered at 23:30 completed at 02:20, so it appears Hangfire keeps starting the job when it's already running but not completed.
2021-11-14 23:30:06
2021-11-15 00:00:06
2021-11-15 00:30:06
2021-11-15 01:00:06
2021-11-15 01:30:06
2021-11-15 02:00:06
This can happen if you have multiple servers (apps) using the same storage. Configure Hangfire to use different database schema for each application.
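(For SQL Server, that looks roughly like the sketch below; "HangfireAppA" and the connection string name are example values:)
using Hangfire;
using Hangfire.SqlServer;

// Each application gets its own schema in the shared database, so their
// servers never see each other's queues.
GlobalConfiguration.Configuration.UseSqlServerStorage(
    "HangfireDb",
    new SqlServerStorageOptions { SchemaName = "HangfireAppA" });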
We have reproduced this problem on a single server configured to use the default memory storage.
Do you flush the storage each time you deploy an app change?
I have gone through all the open issues here and found that the issue I'm experiencing was supposed to be solved in v1.5.8. But I'm running v1.6.6 and still seeing a similar issue: the same job is processed multiple times, randomly. I also saw issue #842 describing the same thing. Can someone help me fix it?
I'm using Hangfire.SqlServer v1.6.6.