bladefist opened this issue 4 years ago
Try reclaiming white space / re-indexing the tables in online mode (or schedule offline mode for faster performance).
@houseofcat That's kicking the can down the road. In reality we have millions of finished jobs that we don't need to carry forward for the rest of eternity.
The great purge!
All background jobs in the Succeeded and Deleted states are expired automatically: regular background jobs expire after 24 hours, batched jobs expire after 7 days, and both settings are configurable. A counter on the Dashboard UI, such as Succeeded: 24,049,482, is just a counter and does not mean that all of those succeeded jobs are still in storage. Hangfire was built to avoid such storage leaks.
The reason for such bloat can be heavily fragmented indexes, so you'll need to run index reorganization/rebuild with scripts like this. Another problem can be related to long-running jobs that prevent the transaction log from being truncated, so you will need to use sliding invisibility timeout fetching in the following way:
.UseSqlServerStorage("connection_string", new SqlServerStorageOptions
{
    SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5)
});
In this case there will be no immediate job re-queue after an unexpected process shutdown (such as a process kill via Task Manager or stopping a debug session in Visual Studio), but long-running jobs will be processed more robustly.
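For the fragmented-index case mentioned above, a minimal SQL Server sketch (not the exact script referenced earlier): the table names assume the default [HangFire] schema, and ONLINE = ON requires an edition that supports online index rebuilds.

```sql
-- Hedged sketch: rebuild indexes on the largest Hangfire tables.
-- Adjust the schema name and table list to match your installation.
ALTER INDEX ALL ON [HangFire].[Job] REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON [HangFire].[State] REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON [HangFire].[JobParameter] REBUILD WITH (ONLINE = ON);
ALTER INDEX ALL ON [HangFire].[Counter] REBUILD WITH (ONLINE = ON);
```

Index reorganization (ALTER INDEX ... REORGANIZE) is a lighter-weight alternative when fragmentation is moderate.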
regular background jobs expire after 24 hours
Can this 24 hours be modified through configuration?
@bxjg1987 JobStorage has a JobExpirationTimeout property that is used as the default when marking jobs for expiration. You can also override that default value in an IApplyStateFilter on a per-job basis.
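For the per-job case, a minimal sketch: the attribute name here is made up for illustration, but IApplyStateFilter, JobFilterAttribute, and ApplyStateContext.JobExpirationTimeout are the standard Hangfire APIs.

```csharp
using System;
using Hangfire.Common;
using Hangfire.States;
using Hangfire.Storage;

// Hedged sketch: a filter attribute that shortens the expiration
// timeout to 6 hours for any job (or job class) it is applied to.
public class ShortExpirationAttribute : JobFilterAttribute, IApplyStateFilter
{
    public void OnStateApplied(ApplyStateContext context, IWriteOnlyTransaction transaction)
    {
        // The storage uses this value when marking the job for
        // expiration after it reaches a final state.
        context.JobExpirationTimeout = TimeSpan.FromHours(6);
    }

    public void OnStateUnapplied(ApplyStateContext context, IWriteOnlyTransaction transaction)
    {
        // Nothing to undo for this filter.
    }
}
```

Applied per-job by decorating the job method with [ShortExpiration], or globally via GlobalJobFilters.Filters.Add(new ShortExpirationAttribute()).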
Is there any way to also clean up the counter table? It contains thousands of records, which overflows the row limits on Heroku. I only have this config, which successfully removes the data from the other tables but doesn't work on the counter table.
services.AddHangfire(config =>
{
    config.UsePostgreSqlStorage(connectionString, new PostgreSqlStorageOptions
    {
        JobExpirationCheckInterval = TimeSpan.FromMinutes(15),
    }).WithJobExpirationTimeout(TimeSpan.FromHours(1));
});
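As a stopgap until the storage implementation aggregates and cleans the counter table itself, one hedged workaround is to delete already-expired counter rows manually (schema and column names taken from the default PostgreSQL storage layout seen elsewhere in this thread; run from a scheduled maintenance job, not the hot path):

```sql
-- Hedged sketch: remove counter rows whose expiration has passed.
-- Verify the schema name ("hangfire") matches your Search Path setting.
DELETE FROM hangfire.counter
WHERE expireat IS NOT NULL
  AND expireat < now();
```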
We have been running Hangfire for many years. I have observed that the dashboard page no longer opens, returning a 504 status code. I checked the number of rows in the Hangfire tables:
Counter - 234 million
Job - 63 million
Jobparameter - 126 million
State - 189 million
Hash - 45
Jobqueue - 0
List - 0
Lock - 1
Schema - 1
Server - 3
Set - 5
I see the tables contain succeeded job data. Is it safe to delete it from the database directly? I can see successful jobs which are many months old. I guess they should have been deleted, but they are still there.
Please show me your configuration code related to Hangfire so I can understand why jobs aren't deleted, and tell me what version you are using. Also, I see that the counter aggregator component isn't working, so we need to configure logging to understand what's going on: https://docs.hangfire.io/en/latest/configuration/configuring-logging.html
Here is the Hangfire version (using ASP.NET 5):
<PackageReference Include="Hangfire.AspNetCore" Version="1.7.27" />
<PackageReference Include="Hangfire.PostgreSql.ahydrax" Version="1.7.4" />
services.AddHangfire(config => config.UsePostgreSqlStorage($"{Configuration.Get<AppOptions>().Database.Url};Search Path=hangfire"));
var hangfireOptions = new BackgroundJobServerOptions();
app.UseHangfireServer(hangfireOptions);
Mostly the jobs are enqueued like this:
BackgroundJob.Enqueue<WebhookService>(j => j.InvokeWebhookJob(eventType, data));
The above function looks like this:
[AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
public async Task InvokeWebhookJob(string eventType, string data)
{
    ...
}
select count(*) from hangfire.counter where expireat is not null and expireat < now()
returns
53930
Thank you for the information. Please note that this is a storage-related issue. Since you are using Hangfire.PostgreSql.ahydrax (and not Hangfire.SqlServer), please raise the issue at their repository – https://github.com/ahydrax/Hangfire.PostgreSql.
I checked the Aurora Postgres logs; I do see these queries regularly:
ok thanks
regular background jobs expire after 24 hours
Can this 24 hours be modified through configuration?
@odinserj With v1.8, how do we change this from 24 hours to 6 hours?
Hello,
Our Hangfire SQL database has grown to 30 GB. We cannot reset the database because we have a lot of scheduled jobs that we want to keep. Has anyone come up with scripts to purge old completed/failed jobs?
thanks.
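There is no official purge script in this thread, but a hedged SQL Server sketch along these lines is a common approach: delete finished jobs in small batches so the transaction log stays manageable. The 30-day cutoff is an assumption; in the default schema, related State and JobParameter rows are removed via cascading foreign keys, but verify this (and take a backup) before running against production.

```sql
-- Hedged sketch, not an official Hangfire script: batched purge of
-- old Succeeded/Deleted jobs from the default [HangFire] schema.
DECLARE @cutoff DATETIME = DATEADD(DAY, -30, GETUTCDATE());

WHILE 1 = 1
BEGIN
    DELETE TOP (10000) j
    FROM [HangFire].[Job] j
    WHERE j.StateName IN ('Succeeded', 'Deleted')
      AND j.CreatedAt < @cutoff;

    -- Stop once a batch deletes nothing.
    IF @@ROWCOUNT = 0 BREAK;
END;
```

Scheduled jobs are untouched because only jobs already in a final state match the filter.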