FirebaseExtended / firebase-queue

MIT License
786 stars 108 forks source link

Task timeouts don't work. Also, what is the expected behavior of a timed out task? #106

Open mm11715 opened 7 years ago

mm11715 commented 7 years ago

I've tested queue timeouts and noticed the following:

  1. Queue timeouts as specified by the default spec are not respected by the firebase queue. This can be replicated by setting a timeout value in the default spec and adding a handler that uses setTimeout() with a value greater than the timeout value. The task is owned and completed by the setTimeout handler even after it should have timed out.

  2. The expected behavior of timeouts if they were working isn't documented anywhere. If the task times out, is it supposed to be put into an error state or does it return to the start_state?

eyeezzi commented 7 years ago

This question is very important. It seems the queue worker just resets the task to its start_state on timeout, the result is an infinite loop where the same tasks keeps timing out and getting picked up again. I think ideally, a timed out task should be put in its error_state and only be picked up if there are unused retries.

Caerbannog commented 7 years ago

There are four aspects to the issue. The result is that newcomers can very easily be mislead.

  1. @mm11715 I think you made the same mistake as I did. I had code similar to this:

    var queueRef = firebase.database().ref('queue');
    var specs = {
     "default_spec": {
       "start_state": null,
       "in_progress_state": "in_progress",
       "finished_state": null,
       "error_state": "error",
       "timeout": xxx,
       "retries": 0
     }
    };
    queueRef.child('specs').set(specs);
    var options = {
    };
    var queue = new Queue(queueRef, options, ...);

    Notice that I didn't set specId: "default_spec" in the options of the queue, assuming that when the guide mentions "default_spec" it has a special meaning. It has not, so the result is that we define the timeout in a spec that will never be used. Suggestion: update the documentation to remove the "default_spec" name, or update the library to use such a spec if it exists.

  2. Confusing documentation: the expected behavior of timeouts is described here:

    timeout - When a task [...] has not completed [...], the queue will report that task as timed out, and reset that task to be claimable once again.

    I would mention start_state to be clearer.

  3. Confusing documentation: the next configuration being described after timeout is retries, without explicitly saying that it has no relationship with timeouts:

    retries - [...] When a task fails, if there are any remaining attempts, the queue will restart the task by setting the task's _state to its spec's start_state.

  4. If a task is just too long for its timeout, the task will restart, be processed again, and can start a loop that indefinetly blocks all other tasks from being processed! Resetting a timed-out task to the error state would benefit from the retries mechanism as @eyeezzi suggests.

@drtriumph: Would a PR fixing those points be accepted?

renanferrari commented 7 years ago

@eyeezzi @Caerbannog Totally agree with your suggestions. Resetting a timed-out task to its error state instead of its start state seems much more intuitive to me.

The only problem I see is that such a change of behavior in the API would certainly mean a reasonable impact. To minimize it, I'd suggest to add a new optional field to the spec defining the task's state when it times out. Something like "timeout_state":

  "default_spec": {
    "start_state": null,
    "in_progress_state": "in_progress",
    "finished_state": null,
    "error_state": "error",
    "timeout": 300000,
    "timeout_state": "error",
    "retries": 3
  }

If this "timeout_state" is null or equals to the spec's start_state, then the task would behave like it does today. That not only makes the adoption of the new behavior optional, it also expands the API a little bit (we could define custom states for the timed out tasks).

Let me know what you think.

pandaiolo commented 7 years ago

I am also experiencing enldess loop of tasks when they exceed default timeout of 5min. I do not want to change the timeout, 5 minutes is fine. Also, the tasks continue running in the background, which is fine. I just send an alert to manually check those tasks.

What I did to stop the faulty behaviour ATM :

I am still experiencing tasks restarting by themselves, so I might be doing it wrong. If you guys have a better idea :)