HangfireIO / Hangfire

An easy way to perform background job processing in .NET and .NET Core applications. No Windows Service or separate process required
https://www.hangfire.io

Limit retention time (shorter instead of longer) #981

Open arxae opened 7 years ago

arxae commented 7 years ago

The current project I'm working on has to process a lot of SQL tasks. It works by dumping a task onto the queue every 100 milliseconds; when a task is done, it returns to an internal queue. So the tasks rotate 24/7 in memory.

The problem is that this creates a lot of "waste" entries. After around 2 minutes, a couple of thousand records have been created. With the standard retention of 1 day, that means a couple of million records every day. Records that succeeded shouldn't be kept that long; in fact, it would be fine if they were cleaned up immediately.

I tried using a custom filter:

// Requires: using System; using Hangfire.Common; using Hangfire.States; using Hangfire.Storage;
public class TestExpireAttribute : JobFilterAttribute, IApplyStateFilter
{
    // Called after a state transition is applied; shortens how long the record is kept.
    public void OnStateApplied(ApplyStateContext context, IWriteOnlyTransaction transaction)
    {
        context.JobExpirationTimeout = TimeSpan.FromSeconds(1);
    }

    // Called when a state is unapplied; same shortened retention.
    public void OnStateUnapplied(ApplyStateContext context, IWriteOnlyTransaction transaction)
    {
        context.JobExpirationTimeout = TimeSpan.FromSeconds(1);
    }
}

Note: the 1-second timeout is just for testing.

But this seems to only remove those entries on startup. Is there a way to keep entries for only the past couple of minutes, and remove them a few minutes after they complete?

pieceofsummer commented 7 years ago

Expired records are cleaned up on a regular basis (every 30 minutes by default for most storages), so it probably hasn't had a chance to run a second time yet. You can decrease this interval by adjusting the JobExpirationCheckInterval property on your job storage.
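For SQL Server storage, for instance, that interval can be set through the storage options at configuration time (a sketch; the connection string name is a placeholder):

```csharp
using System;
using Hangfire;
using Hangfire.SqlServer;

// Sketch: shorten the expiration check interval for SQL Server storage.
// "HangfireConnection" is a placeholder connection string name.
GlobalConfiguration.Configuration.UseSqlServerStorage(
    "HangfireConnection",
    new SqlServerStorageOptions
    {
        // Run the expiration manager every 5 minutes instead of the default 30.
        JobExpirationCheckInterval = TimeSpan.FromMinutes(5)
    });
```

Combined with a short JobExpirationTimeout from a filter like the one above, this keeps the succeeded-job tables small.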

You may also want to switch to Redis storage, which automatically deletes records as soon as they expire.

But I still suspect that scheduling tasks every 100 ms, 24/7, will give you crazy database overhead, and you may not even have enough free workers to handle them all in time. It might be better to just spawn a dedicated thread that runs your tasks non-stop (with 100 ms delays between them).
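A minimal sketch of that dedicated-thread alternative, assuming a `RunNextTask` placeholder for the actual SQL work:

```csharp
using System.Threading;

var cancellation = new CancellationTokenSource();

var worker = new Thread(() =>
{
    while (!cancellation.IsCancellationRequested)
    {
        RunNextTask();      // hypothetical placeholder for one SQL task
        Thread.Sleep(100);  // 100 ms pause between tasks
    }
})
{ IsBackground = true };    // don't keep the process alive on shutdown

worker.Start();

void RunNextTask() { /* execute the next query from the internal queue */ }
```

This keeps the constant rotation out of the job storage entirely, so no waste records are created for it.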

arxae commented 7 years ago

Yeah, I had to throttle them myself in the end, otherwise the other database could have been in trouble, since the queries completed faster than I expected.

Now it dumps a batch of tasks to the queue every 60 seconds, and I'm going to see how far I can push that number up. This creates a lot less output, but I still don't need a day's worth of it.

I'm going to fiddle with it until I get the results I want; it's a bit of a prototype at the moment. Just wondering, though: the number next to Succeeded — is that the overall total, including purged jobs?

Also, don't worry, I'll keep your advice in mind. Currently I have a list of tasks that need to be run. They are grabbed from a database, executed and processed, and then rotated back into the to-do queue. The tasks themselves run as Hangfire jobs, and the to-do queue gets dumped to the Hangfire queues every 60 seconds. To lighten the load a bit on the target server, I make sure each task takes at least 5 seconds, padding it with a sleep otherwise. In general the queries don't take that long, so each task lingers for a couple of seconds. Probably not the best solution; if there is a Hangfire feature that does something like this, that would be great.
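The "at least 5 seconds per task" padding described above can be sketched with a Stopwatch (`RunQuery` is a hypothetical placeholder for the actual SQL task):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

var minimum = TimeSpan.FromSeconds(5);
var sw = Stopwatch.StartNew();

RunQuery(); // hypothetical placeholder for the actual SQL task

// Pad with sleep so each job occupies a worker for at least 5 seconds,
// throttling the load on the target server.
var remaining = minimum - sw.Elapsed;
if (remaining > TimeSpan.Zero)
    Thread.Sleep(remaining);

void RunQuery() { /* execute the query against the target server */ }
```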

I know this in itself is not the best use of Hangfire, but there are a bunch of other things that will also run as background tasks. This to-do queue is something that needs to happen anyway, and the parallelism and retry mechanisms of Hangfire make it very simple to use as a base.

Bit of a ramble, but it should provide some context, I suppose. So, too long; didn't read — here's my real list of questions: