@YukaAn, what storage are you using? Could you include the full details, e.g. the data column?
@odinserj I'm using SQL Server 2008; the Hangfire version is 1.6.6. Below is a screenshot of a job's state info:
Could you show me your configuration logic and recurring method's signature?
@odinserj Thank you for the quick response!
The configuration logic looks like this:
public void Configuration(IAppBuilder app)
{
    GlobalConfiguration.Configuration.UseSqlServerStorage("HangfireDb");
    app.UseHangfireDashboard();
    app.UseHangfireServer();
}
The recurring method looks like this:
public void CreateRecurringJob(int hour, int minute, int Id, string Name, string occurence)
{
    try
    {
        if (!MinuteCheck(minute) || !HourCheck(hour) || !CronCheck(occurence))
        {
            return;
        }
        string cron = BuildCron(hour, minute, occurence);
        if (IsExistingOrNewMethod(Id, Name))
        {
            ScheduledJobHandler handler = new ScheduledJobHandler();
            RecurringJob.AddOrUpdate(
                JobNameBuilder(Id, Name),
                () => handler.SendRequest(Id, Name),
                cron,
                TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time")
            );
        }
    }
    catch (Exception ex)
    {
        throw new ApiException(ex);
    }
}
@odinserj I have around 70 recurring jobs scheduled each day, and this issue keeps happening a couple of times every day (randomly on different jobs). I'm still waiting for your reply and I appreciate your help. Thanks!
@YukaAn, sorry for the delay. Try to upgrade to the latest version. At least Hangfire.Core 1.6.12 has a fix related to a problem like yours:
• Fixed – Buggy state filters may cause background job to be infinitely retried.
Looks like there's a transient exception that occurs when your job is completed, and only logging could help to investigate the issue in detail. Please see this article to learn how to enable it, and feel free to post your log messages in this thread to conduct a further investigation.
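(For reference, a minimal sketch of enabling that logging, assuming the built-in coloured console provider that ships with Hangfire.Core 1.6+; any LibLog-compatible provider works as well:)
using Hangfire;

// Console logger: state transitions, retries and storage errors will
// show up in the process output while investigating.
GlobalConfiguration.Configuration
    .UseSqlServerStorage("HangfireDb")
    .UseColouredConsoleLogProvider();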
I am having this same issue on Hangfire 1.6.20 with LiteDB storage. I have seen several other reports of this issue but no resolution. Are you still using a workaround?
Same issue here. Hangfire 1.6.20 and Hangfire.SQLite 1.4.2
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    ...
    RecurringJob.AddOrUpdate("debug", () => Hangfire(), Cron.Minutely);
}
public void Hangfire()
{
    Debug.WriteLine($"{DateTime.Now} - Hangfire");
}
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:14 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
16/08/2018 10:58:15 AM - Hangfire
Hello. I have the same issue.
The job starts once for every worker. If I set the server worker count to 2 then it starts 2 times; if I set it to 50 then it starts 50 times.
I use the latest Hangfire version (1.6.20) and SQLite for storage.
The job is enqueued from the web application.
BackgroundJob.Enqueue(() => StartDatabaseExport(databaseId));
The server is started from another application (a Windows service).
var options = new BackgroundJobServerOptions { WorkerCount = 50 };
new BackgroundJobServer(options);
Any ideas?
I also tried with the LiteDB storage, same problem. I then tried with the in-memory storage and it works as expected. So it seems it's related to the storage.
Hangfire 1.6.17.0, MemoryStorage 1.5.1.0.
Tasks are created as:
IState state = new EnqueuedState(QueueName.PRIORITY);
_jobClient.Create(() => TaskFactory.Build(id), state);
Sometimes jobs run multiple times. Log from within the task:
2018-09-12 11:45:43.7292|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Export calculation unit
2018-09-12 12:16:14.7399|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Export calculation unit
...
2018-09-12 12:27:51.2235|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Done!
...
2018-09-12 12:51:39.4573|INFO|44fa0359-e13c-4356-bdb9-9690df16eda0|Done!
The job accesses the database, so it could get stuck for some time if all DB connections in the pool are taken by other jobs. That's what happened at the beginning, I assume. Then the task was run again, but there were no reports of a retry or error. And actually after that, the task reported successful completion twice (as well as the intermediate steps). It feels like the job was retried after some waiting period without cancelling the previous execution.
Although your messages should be idempotent, this should definitely be fixed in Hangfire.
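(For reference, a minimal sketch of such an idempotency guard. IExportTracker and its members are hypothetical placeholders for whatever "already done" check fits the data model, e.g. a flag column or a unique-key insert:)
public interface IExportTracker
{
    bool TryMarkStarted(int unitId);   // atomic claim, e.g. a unique-key insert
    void MarkCompleted(int unitId);
}

public class ExportJob
{
    private readonly IExportTracker _tracker;

    public ExportJob(IExportTracker tracker)
    {
        _tracker = tracker;
    }

    public void Run(int unitId)
    {
        // If a duplicate execution arrives, bail out instead of exporting twice.
        if (!_tracker.TryMarkStarted(unitId))
            return;

        // ... actual export work ...

        _tracker.MarkCompleted(unitId);
    }
}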
Is this issue in Hangfire itself, or in the storage providers?
Hi, we have the same problem. We tried to find a solution, and after a long time we probably figured something out. The problem is associated with the storage beyond any doubt. Multiple workers run the same job when you use LiteDB, SQLite and similar storages. Everything is OK with SQL Server. So if it was possible to fix the error in the SQL Server storage, it could be possible in the others. So this is my shy request to the creators.
I think I found the issue in the Sqlite provider: https://github.com/mobydi/Hangfire.Sqlite/issues/2#issuecomment-441019511
Maybe this issue is something similar for the SQL Server provider as well?
Experiencing the same issue. We use SQL Server as storage (if it matters). Any update on when it could be fixed, or at least whether the root cause is known?
Experiencing the same issue. LiteDb as storage. As a temporary solution I set WorkerCount to 1. @odinserj Is the Pro version free of this bug?
In my case, a workaround was to set extended intervals for MemoryStorage:
MemoryStorageOptions storageOpts = new MemoryStorageOptions()
{
    JobExpirationCheckInterval = TimeSpan.FromMinutes(120),
    FetchNextJobTimeout = TimeSpan.FromMinutes(120)
};
GlobalConfiguration.Configuration.UseMemoryStorage(storageOpts);
But that only works as long as the task can be done within the 2-hour time span. In my case I can be sure that at least most of the tasks will be completed.
Hey, it seems as if #1197 is about the same issue. Everybody who runs into this issue might want to check it out.
I'm prototyping with Hangfire and MemoryStorage and seeing my job being executed multiple times. Something as simple as the following:
_jobId = _jobClient.Enqueue<MyJobPerformer>(mjp => mjp.Perform(request));
public async Task Perform(RequestBase request)
{
await Task.Delay(TimeSpan.FromSeconds(5));
await Task.Delay(TimeSpan.FromSeconds(5));
await Task.Delay(TimeSpan.FromSeconds(5));
}
config.UseMemoryStorage(new MemoryStorageOptions { FetchNextJobTimeout = TimeSpan.FromHours(24) });
does not seem to be a valid solution for this problem. The default TimeSpan for FetchNextJobTimeout is 30 minutes. I'm seeing multiple calls to execute my job on concurrent workers within seconds. Does anyone have a solution to this issue?
This is a very big issue. I've seen this with the SQLite and Postgres storage.
I haven't seen this with the in-memory provider, likely because it has distributed locking implemented properly.
@odinserj, there likely needs to be clearer documentation to storage authors about how distributed locks should be implemented to prevent multiple tasks from being executed.
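(To make "atomic" concrete: the fetch must claim a job and mark it fetched in a single operation. Below is a simplified, hypothetical sketch of the queue-table pattern the SQL Server provider is built around; table and column names are approximate, and the real implementation also handles invisibility timeouts and requeueing:)
using System;
using System.Data.SqlClient;

static class AtomicFetchSketch
{
    // Claim at most one queued job in a single UPDATE. The ROWLOCK, UPDLOCK
    // and READPAST hints let concurrent workers skip rows that are already
    // being claimed, so two workers can never fetch the same job.
    private const string FetchSql = @"
        UPDATE TOP (1) HangFire.JobQueue WITH (ROWLOCK, UPDLOCK, READPAST)
        SET FetchedAt = GETUTCDATE()
        OUTPUT INSERTED.JobId
        WHERE FetchedAt IS NULL AND Queue = @queue";

    public static long? TryFetchJob(string connectionString, string queue)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(FetchSql, connection))
        {
            command.Parameters.AddWithValue("@queue", queue);
            connection.Open();
            var result = command.ExecuteScalar(); // null when the queue is empty
            return result == null ? (long?)null : Convert.ToInt64(result);
        }
    }
}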
@pauldotknopf
I'm not sure what's going on. I'm using the memory storage provider and have breakpointed each await Task.Delay shown above. All breakpoints are hit multiple times by several workers. Only 1 job has been enqueued.
Hmm, that seems like an easy repro.
Considering this issue has been open since 2017, someone (not the maintainers) will likely have to debug/fix/contribute a PR.
config.UseMemoryStorage(
new MemoryStorageOptions
{
FetchNextJobTimeout = TimeSpan.FromSeconds(10)
});
public class MyJobPerformer
{
private readonly string _performerId;
public MyJobPerformer()
{
_performerId = Guid.NewGuid().ToString("N");
}
public async Task Perform(RequestBase request)
{
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
await Task.Delay(TimeSpan.FromSeconds(5));
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
await Task.Delay(TimeSpan.FromSeconds(5));
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
await Task.Delay(TimeSpan.FromSeconds(5));
Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
}
}
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:20 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:25 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:30 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:32 PM <<< Duplicate job execution! 10 seconds elapsed.
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:35 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:37 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:42 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:47 PM
One job enqueue results in multiple worker executions. It's even worse if FetchNextJobTimeout is reduced further. I'm not sure whether this is a problem with Hangfire, MemoryStorage, or both.
RecurringJob.AddOrUpdate(recurringJobId, () => EmailReceiveService.SendMail(parametes), cronExpression);
Hey @odinserj, any update on this issue?
Hi @odinserj, is there any solution for this case?
The same question.
+1. Having same issue. SQLite storage.
+1. Having the same issue with MSSQL Server storage.
Having the same issue. OMG!!!!!!
I use ASP.NET Core 3.1.
OMG! I am having this same issue. I observe that there are 2 servers, even though I configured only one.
So, putting 2 and 2 together, I felt that the reason the job executed twice was that there are 2 servers. That means I need to get rid of the second server, the one with default options.
I downloaded the HF code and placed breakpoints to analyze the flow. Here are my findings:
For BackgroundJobServerOptions, create a singleton instance of the object and don't pass it to AddHF or UseHF. Example:
services.AddSingleton(new BackgroundJobServerOptions
{
    WorkerCount = 1,
    ServerName = "TaskSvcHangfireServer"
});
After this, I still had 2 servers, but both using my BackgroundJobServerOptions. Halfway there!
After debugging a few times, I found that I had both services.AddHangfireServer(); in ConfigureServices and app.UseHangfireServer(); in Configure. Both of them create servers (via CreateBackgroundJobServerHostedService > BackgroundProcessingServer > BackgroundDispatcher > BackgroundServerProcess.Execute > CreateServer), unlike other libs, where AddHangfireServer should configure and UseHangfireServer should create instances using the configs (I think).
Then I checked the documentation, and it does not say to use both. I removed services.AddHangfireServer() and now I have only 1 server.
So, I tested the execution of a job. This time, it executed only once.
Lessons learnt: don't use AddHangfireServer and UseHangfireServer together, as it seems they do a similar job - create server instances. (IMHO, this should be considered a bug.)
Sigh! I'm seeing multiple "processing" jobs still, with the same job ID.
No idea why.
BTW: My service is hosted in IIS.
If you want this issue fixed, you will have to do it yourself.
Well, I have now placed some checks in my JobWorker to avoid executing multiple job requests with the same ID. This hack helps.
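(For reference: Hangfire ships a built-in filter that performs this kind of check. A minimal sketch, with the caveat that it relies on the storage's distributed lock implementation, which several reports above suspect is exactly what's broken in some community storages:)
using Hangfire;

public class JobWorker
{
    // Built-in filter: takes a distributed lock per job method, so a second
    // concurrent execution waits up to the timeout (in seconds) instead of
    // running in parallel. Only as reliable as the storage's distributed locks.
    [DisableConcurrentExecution(timeoutInSeconds: 600)]
    public void Process(int id)
    {
        // ... actual work ...
    }
}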
@cnayan,
Don't use AddHangfireServer and UseHangfireServer together, as it seems they do similar job - create server instances.
Indeed. The former uses an IHostedService-based implementation (available only in netstandard2.0 and later), while the latter uses the more generic approach. Technically, you can have as many servers running as you want, so it shouldn't be considered a bug. But it is definitely something worth noting.
As of processing the job multiple times, it clearly is your IIS misconfiguration. From the screenshot you can see all "processing" states have different server process IDs associated with them, so it appears the application is stopped and restarted periodically. IIS can do this if the site is not configured as "always running".
Aside from that, jobs are supposed to be reentrant, so if some code is supposed to be executed once, it is up to you to track that. Or maybe introduce checkpoints by splitting your job into multiple jobs executed in sequence. See the IBackgroundJobClient.ContinueJobWith() extension method.
Also consider using cancellation tokens, so the job can be terminated gracefully when the server is stopped.
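(A short sketch of both suggestions, assuming Hangfire 1.7+, where a plain CancellationToken parameter is filled in by the server at perform time; on 1.6 use IJobCancellationToken instead. ExportStep and NotifyStep are hypothetical names:)
using System.Threading;
using Hangfire;

public static class ExportJobs
{
    public static void Schedule(IBackgroundJobClient client, int databaseId)
    {
        // Checkpointing: the second job runs only after the first succeeds.
        var exportId = client.Enqueue(() => ExportStep(databaseId, CancellationToken.None));
        client.ContinueJobWith(exportId, () => NotifyStep(databaseId));
    }

    public static void ExportStep(int databaseId, CancellationToken token)
    {
        for (var batch = 0; batch < 100; batch++)
        {
            // Aborts cleanly when the server shuts down or the job is deleted.
            token.ThrowIfCancellationRequested();
            // ... export one batch ...
        }
    }

    public static void NotifyStep(int databaseId)
    {
        // ... send a notification ...
    }
}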
@pieceofsummer Thank you for the guidance.
My problem with the Hangfire documentation is that there are no clear sections on OWIN vs. ASP.NET Core that I can focus on. You may disagree, but that is how I see and read it. Probably I am spoiled by MSDN docs.
So, your precise advice is of great help to me.
As of processing the job multiple times, it clearly is your IIS misconfiguration. From the screenshot you can see all "processing" states have different server process IDs associated with them, so it appears the application is stopped and restarted periodically. IIS can do this if the site is not configured as "always running".
After reading Making ASP.NET Core application always running on IIS, and understanding the screenshots, I've configured IIS. But no code changes have been done to the ASP.NET Core app.
Aside from that, jobs are supposed to be reentrant, so if some code is supposed to be executed once, it is up to you to track that. Or maybe introduce checkpoints by splitting your job into multiple jobs executed in sequence. See the IBackgroundJobClient.ContinueJobWith() extension method.
I have a simple function to execute, so it cannot be split. This point, probably, is not for me.
Also consider using cancellation tokens, so the job can be terminated gracefully when the server is stopped.
Good point. But I execute a console app via the job, and I am happy that it does not get killed. Still, your point makes sense - abort the job when requested.
Thanks again!
I started implementing HangFire with SQLite storage a couple of days ago and ran into the same problem: enqueued jobs were executed as many times as I had workers initialized (20 by default). I found a solution by replacing the SQLite storage with HangFire.LiteDB storage. In the release notes they specifically mention 'Fix Hangfire Job starts multiple times', so I thought I'd give it a try. It turns out that they indeed solved the problem; my jobs finally get executed only once, so I don't need hacky workarounds anymore.
So, unless you really need SQLite as storage, I'd suggest switching to HangFire.LiteDb.
Example code:
var hangFireDb = @"D:\hangfire.db";
GlobalConfiguration.Configuration.UseLiteDbStorage(hangFireDb);
GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute { Attempts = 3 });
app.UseHangfireDashboard();
app.UseHangfireServer();
Cheers!
Same problem for me. I think this depends on server resets: when I recycle the IIS app pool manually, the same jobs are created again when the app pool starts, and every time the IIS app pool gets restarted, the duplicated jobs increase more and more.
I accidentally resolved this just by changing the signature of the method.
My method was like :
public virtual void Do(string title, T order, PerformContext context)
I was looking to handle deleted jobs in the Dashboard and needed to stop the job (deleted jobs just move to Deleted; they even keep working after deletion), so based on this link https://discuss.hangfire.io/t/deleting-job-with-onstateelection-cancellation-token/2602/5 I changed the signature of the method to:
public virtual void Do(string title, T order, PerformContext context, IJobCancellationToken cancellationToken)
and called ThrowIfCancellationRequested to cancel the deleted job inside the method body, like:
cancellationToken?.ThrowIfCancellationRequested();
If a user deletes a job manually in the Dashboard, the cancellationToken raises the exception and the job is really cancelled. The interesting thing is that, while debugging with a breakpoint in the method, Do was called multiple times when I hosted the app and restarted the server, but all the executions got cancelled (I don't know why; maybe HF sets the cancellation token) except one. In other words, the cancellationToken was raised for all except one.
Now I see there are no multiple jobs after restarting the IIS app pool, just because of ThrowIfCancellationRequested().
What storage are you using? If it's a community-based storage, then it's possible that the FetchNextJob method wasn't implemented in an atomic way, and multiple workers can pick up the same job. Please check the repository of the concrete storage implementation and report the issue there.
I am using Redis storage like:
GlobalConfiguration.Configuration.UseRedisStorage("localhost",
    new Hangfire.Pro.Redis.RedisStorageOptions()
    {
        InvisibilityTimeout = TimeSpan.MaxValue,
        Database = 1,
        Prefix = "hangfire:reclaim:",
    }).UseConsole();
WebApp.Start<MARCO.Reclaim.Core.Startup>(address);
and Startup Configuration method:
appBuilder.UseWebApi(config);
appBuilder.UseHangfireDashboard("", new DashboardOptions());
appBuilder.UseHangfireServer(new BackgroundJobServerOptions
{
ServerName = $"sendbulk",
WorkerCount = 100,
Queues = new[] { "sendbulk" }
});
the signature of the method and its override is like:
public virtual void Do(string title, T order, PerformContext context, IJobCancellationToken cancellationToken)
override in derived class:
[DisplayName("{0}")]
[Queue("sendbulk")]
[AutomaticRetry(Attempts = 5, DelaysInSeconds = new int[] { 60, 60 * 3, 60 * 3 * 3, 60 * 3 * 3 * 3, 60 * 3 * 3 * 3 * 3 })]
public override void Do(string title, SendBulkOrder order, PerformContext context, IJobCancellationToken cancellationToken)
and Startup is called in a single-instance WCF service under an IIS app pool.
Ah, I didn't realize that those jobs are totally different because they have different identifiers. So there's something that creates them, and this something is triggered once the application is restarted, causing duplicates.
Because almost every problem ends up either with "Enqueued jobs stuck" or with "Job starts multiple times", and different problems with different storages were reported into the same issue on GitHub. I'm really sorry you're having so much trouble; please try to run everything with Hangfire.SqlServer, Hangfire.Pro.Redis or Hangfire.InMemory – these storages are supported in this repository, while other storages are supported by the community in their own repositories.
Sorry for my shitty reply. I was having a bad day at work, but I realize that I shouldn't bring my personal problems into these spaces.
I'll just delete the old comment, it wasn't appropriate of me. Sorry again.
no solution yet?
I'm experiencing the same issue. Hangfire 1.7.9 with Hangfire.InMemory. A recurring task configured with Cron.Daily(0, 30). We specify our local timezone when invoking RecurringJob.AddOrUpdate.
This job is a long-running task which runs until 5 am. It was started as expected at 00:30, and then started a second time at 01:00 the same night while the first instance was running.
I have checked that the process didn't restart between 00:30 and 01:00.
Update 16.11.2021: Here's an example where the same job was triggered several times with 30 minute intervals. The instance triggered at 23:30 completed at 02:20, so it appears Hangfire keeps starting the job when it's already running but not completed.
2021-11-14 23:30:06
2021-11-15 00:00:06
2021-11-15 00:30:06
2021-11-15 01:00:06
2021-11-15 01:30:06
2021-11-15 02:00:06
This can happen if you have multiple servers (apps) using the same storage. Configure Hangfire to use different database schema for each application.
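(For SQL Server, that looks roughly like the sketch below; "HangfireAppA" and the connection string name are example values:)
using Hangfire;
using Hangfire.SqlServer;

// Each application gets its own schema in the shared database, so their
// servers never see each other's queues.
GlobalConfiguration.Configuration.UseSqlServerStorage(
    "HangfireDb",
    new SqlServerStorageOptions { SchemaName = "HangfireAppA" });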
We have reproduced this problem on a single server configured to use the default memory storage.
Do you flush the storage each time you deploy an app change?
I have gone through all the open issues here and found that the issue I'm experiencing was supposed to be solved in v1.5.8. But I'm running v1.6.6 and still seeing a similar issue: the same job is processed multiple times, randomly. I also saw issue #842 describing the same thing. Can someone help me fix it?
I'm using Hangfire.SqlServer v1.6.6.