SGrondin / bottleneck

Job scheduler and rate limiter, supports Clustering

Cancel or replace tasks by `id` #90

Open · garrettmaring opened this issue 5 years ago

garrettmaring commented 5 years ago

Is there a way to replace a task or to cancel one? The use case here is receiving multiple updates before the previous ones have had a chance to execute. In some cases the previous updates will be stale, so it can be more efficient to remove them and run the newer ones instead.

Here's better-queue's API for inspiration.

SGrondin commented 5 years ago

I like this idea a lot! I'll look into it.

I think the best way to do it is with a limiter option called something like `duplicateJobIds`, defaulting to `"refuse"` (the current behavior). The new `"cancel"` behavior would cancel the old job and queue up the new one instead.
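
To make the proposal concrete, here is a minimal sketch of how such an option might be used. Note that `duplicateJobIds` is only the idea floated in this comment and is not part of Bottleneck's actual API, so the option name and its values are hypothetical.

// Hypothetical sketch only: the duplicateJobIds option discussed above is a
// proposal from this issue, not part of Bottleneck's real API.
const Bottleneck = require("bottleneck");

const limiter = new Bottleneck({
  maxConcurrent: 1,
  duplicateJobIds: "cancel" // hypothetical; "refuse" would keep today's behavior
});

// With the hypothetical "cancel" strategy, the second call would cancel the
// still-pending first job and queue this one under the same id instead.
limiter.schedule({ id: "update-foo" }, async () => { /* apply update v1 */ });
limiter.schedule({ id: "update-foo" }, async () => { /* apply update v2 */ });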

garrettmaring commented 5 years ago

Awesome, I like the idea of a duplication strategy option. Maybe "refuse", "cancelAndQueue", "replace" (not suggesting naming, just concepts).

I can imagine use cases where one would want to cancel the previously scheduled job and queue the new one.

In my case, however, I'd like to maintain the queue order yet replace the job that will be run when that job is dequeued.

SGrondin commented 5 years ago

Update: I'm still thinking about this, but at the moment it is lower priority than the other features being requested. It's not a straightforward feature to implement and it will need a lot of thought.

In the meantime, something like this might be enough for most use cases:

const Bottleneck = require("bottleneck");
const limiter = new Bottleneck();

// Keep the actual work for each job id outside the limiter so it can be
// swapped out at any point before the job starts executing.
const jobs = {};

const addJob = function (id, workToDo, ...args) {
  jobs[id] = workToDo;
  return limiter.schedule({ id }, async () => {
    // Look up the work at execution time, not at scheduling time.
    const result = await jobs[id](...args);
    delete jobs[id];
    return result;
  });
};

const replaceJob = function (id, workToDo) {
  if (jobs[id] != null) {
    jobs[id] = workToDo;
  }
};

// Cancelling a job might be synonymous with replacing its workToDo with an empty function?
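
For illustration, the wrapper might be used like this; `fetchAndApplyUpdate` is a hypothetical function standing in for whatever work the caller actually needs to run.

// Hypothetical usage of the addJob/replaceJob wrapper above.
// fetchAndApplyUpdate is a made-up placeholder for the caller's real work.
addJob("update-foo", async () => fetchAndApplyUpdate("v1"));

// A newer update arrives before the job has started executing: swap the work
// in place while keeping the job's position in the queue.
replaceJob("update-foo", async () => fetchAndApplyUpdate("v2"));

// "Cancelling", per the comment above, amounts to replacing the work with a no-op.
replaceJob("update-foo", async () => {});
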
garrettmaring commented 5 years ago

Sounds good. Understand the prioritization. Keep me in the loop when you're thinking of circling back on this. I'll also scan through the code and see if some thoughts on implementation come to me.

SGrondin commented 5 years ago

Thanks to the latest refactor of the core engine, this feature is now feasible without making a mess.

@garrettmaring While implementing it, I ran into a design question I didn't anticipate and I would love to hear your thoughts on this!

If we go with the approach outlined here, it will work great for jobs in the RECEIVED, QUEUED, and RUNNING states. However, jobs in the EXECUTING or DONE states can no longer be canceled, replaced, etc. This means that queueing a job can still be refused, even with the `"replace"` job duplication strategy, if the existing job is EXECUTING.

Here's another example of the design problem I'm trying to solve:

So what if instead of a global setting, we put the control in the user's hands?

Is this an acceptable solution? It's less "magic". Users have to write more code, but I think it's also easier to debug. It makes the user think twice before enabling `trackDoneStatus: true`.
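
Since that manual API is still only an idea, here is a rough sketch of the "control in the user's hands" approach built from what exists today (`jobStatus()` and the `trackDoneStatus` option), reusing the jobs map and `replaceJob` helper from the earlier snippet; the decision logic is an assumption about how such user code might look.

// Sketch: put the replace/cancel decision in the user's hands via jobStatus().
// Assumes `limiter` was created with trackDoneStatus: true and that the jobs
// map and replaceJob helper from the earlier snippet are in scope.
const tryReplace = function (id, newWork) {
  const status = limiter.jobStatus(id); // "RECEIVED", "QUEUED", "RUNNING", "EXECUTING" or "DONE"
  if (status === "RECEIVED" || status === "QUEUED" || status === "RUNNING") {
    // The job hasn't started executing yet, so its work can still be swapped out.
    replaceJob(id, newWork);
    return true;
  }
  // EXECUTING, DONE, or an id Bottleneck isn't tracking: too late to replace.
  // The caller decides whether to schedule a brand new job under a new id instead.
  return false;
};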

garrettmaring commented 5 years ago

Exciting to hear this can move forward now 👍 I understand the design problem you've explained.

I think the biggest hang-up I have here is that an ID can never be used twice. Is this true for the new design? If I have a job with an id of `"update-foo"`, can we not schedule that job more than once? I feel like I'm missing something here!

If so, what would happen if `limiter.replace(id, newJob)` is called on a job with DONE status?

SGrondin commented 5 years ago

> I think the biggest hang-up I have here is that an ID can never be used twice.

This is not 100% true. Let's say we choose to go with the "duplicate job strategy" approach and we've set the strategy to "replace". Here's what would happen:

Scenario 1:

  1. `my-id` is RECEIVED, QUEUED, or RUNNING.
  2. You schedule another job with id `my-id`.
  3. The existing job is replaced successfully.

Scenario 2:

  1. `my-id` is EXECUTING.
  2. You schedule another job with id `my-id`.
  3. It's impossible to update a job that has already started executing, so we must throw an exception.
     a. Bottleneck can't just schedule it anyway, because if the user then called `limiter.jobStatus('my-id')`, how would Bottleneck know which job you're asking about? It can't tell whether to return 'EXECUTING' (the old job) or 'RECEIVED' (the new job).

> If I have a job with an id of `"update-foo"`, can we not schedule that job more than once?

No, because Bottleneck needs to be able to give a single answer when the user runs `limiter.jobStatus('update-foo')`. Think of it as an object that maps one id to one status.
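
To make the one-id-one-status mapping concrete, here is a small example using the real `jobStatus()` method; the job body and the `trackDoneStatus` setting are just for illustration.

// One id maps to exactly one status at any point in time.
const Bottleneck = require("bottleneck");
const limiter = new Bottleneck({ maxConcurrent: 1, trackDoneStatus: true });

limiter.schedule({ id: "update-foo" }, async () => {
  // Placeholder work.
  await new Promise((resolve) => setTimeout(resolve, 100));
});

// jobStatus() returns a single status string for "update-foo", which moves
// through RECEIVED -> QUEUED -> RUNNING -> EXECUTING (and DONE with trackDoneStatus).
console.log(limiter.jobStatus("update-foo"));

// Scheduling a second job under the same id would make that single answer
// ambiguous, which is why ids have to be unique per job.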

> If so, what would happen if `limiter.replace(id, newJob)` is called on a job with DONE status?

If `trackDoneStatus` is false (the default), then the DONE status doesn't exist and the problem only applies while the job is EXECUTING. If `trackDoneStatus` is true, then you'll never be able to reuse the ID once the job is DONE, for the same reason.

Does that make sense?

That's why I'm considering offering a more manual API.

> If so, what would happen if `limiter.replace(id, newJob)` is called on a job with DONE status?

It would have to throw an exception or simply do nothing. The user would need to check whether the job can still be updated. It's not great, I agree. 😭

abou7mied commented 5 years ago

Any updates on this?

yairopro commented 1 year ago

Maybe accepting an `AbortSignal` as a parameter to `schedule()` would be enough to cover the need for cancellation. As for replacing, the user can already give the scheduled function whatever behavior they want anyway.
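
As of this thread, `schedule()` does not take an `AbortSignal`, but a rough caller-side approximation is possible with a standard `AbortController`; the `scheduleAbortable` helper below is illustrative, not part of Bottleneck's API.

// Caller-side cancellation sketch using a standard AbortController.
// scheduleAbortable is a hypothetical helper, not a Bottleneck method.
const Bottleneck = require("bottleneck");
const limiter = new Bottleneck({ maxConcurrent: 1 });

const scheduleAbortable = function (id, signal, workToDo) {
  return limiter.schedule({ id }, async () => {
    // If the caller aborted while the job was still waiting in the queue, skip the work.
    if (signal.aborted) return undefined;
    // The work itself can also watch the signal to stop mid-flight.
    return workToDo(signal);
  });
};

const controller = new AbortController();
scheduleAbortable("update-foo", controller.signal, async (signal) => {
  // ... do the work, checking signal.aborted at convenient points ...
});
controller.abort(); // the queued job becomes a no-op once it reaches the front

The limiter still counts the (now empty) job against its limits, so this only approximates cancellation rather than removing the job from the queue.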

Thanks for this library @SGrondin !

RichardWright commented 1 year ago

Would really like an abort signal.