ajvb / kala

Modern Job Scheduler
MIT License
2.12k stars 187 forks source link

Race Condition: Failed One-Off Job with Epsilon Panics #200

Closed tooolbox closed 4 years ago

tooolbox commented 4 years ago

When initing a job, if it has no schedule, it's run right away.

    // TODO: Delete from cache after running.
    if j.Schedule == "" {
        // If schedule is empty, its a one-off job.
        go j.Run(cache)
        return nil
    }

    j.lock.Unlock()
    err = j.InitDelayDuration(true)
    j.lock.Lock()
    if err != nil {
        j.lock.Unlock()
        cache.Delete(j.Id)
        j.lock.Lock()
        return err
    }

You can see that the run starts before InitDelayDuration() is called.

If the job has multiple tries, it could also have an Epsilon to space out those tries. However, the Epsilon is parsed in InitDelayDuration().

If the one-off job immediately fails in that separate goroutine, the Runner will call shouldRetry() which attempts to access the nil Epsilon, and panics.

tooolbox commented 4 years ago

Since, in actual fact, an Epsilon is really only applicable to jobs with a Schedule, probably the best solution would be to tweak shouldRetry() so it reads like this:

func (j *JobRunner) shouldRetry() bool {
    // Check number of retries left
    if j.currentRetries == 0 {
        return false
    }

    // Check Epsilon
    if j.job.Epsilon != "" && j.job.Schedule != "" {
        if j.job.epsilonDuration.ToDuration() != 0 {
            timeSinceStart := time.Now().Sub(j.job.NextRunAt)
            timeLeftToRetry := j.job.epsilonDuration.ToDuration() - timeSinceStart
            if timeLeftToRetry < 0 {
                return false
            }
        }
    }

    return true
}

i.e. add && j.job.Schedule != ""