ajvb / kala

Modern Job Scheduler
MIT License
2.12k stars 187 forks source link

Panic in Postgres, Mongo Storage #199

Open tooolbox opened 4 years ago

tooolbox commented 4 years ago

The following test will produce a panic in the Postgres and Mongo storage methods:

func TestPersistEpsilon(t *testing.T) {

        db := NewDatabase(dsn) // placeholder for connecting to the applicable database
    defer db.Close()
    cache := job.NewMemoryJobCache(db)

    mockJob := job.GetMockRecurringJobWithSchedule(time.Now().Add(time.Second*1), "PT1H")
    mockJob.Epsilon = "PT1H"
    mockJob.Command = "asdf"
    mockJob.Retries = 2

    err := mockJob.Init(cache)
    if assert.NoError(t, err) {

        err := db.Save(mockJob)
        if assert.NoError(t, err) {

            retrieved, err := db.GetAll()
            if assert.NoError(t, err) {
                retrieved[0].Run(cache)
            }
        }
    }
}

The panic will look like:

goroutine 33 [running]:
github.com/ajvb/kala/utils/iso8601.(*Duration).ToDuration(0x0, 0x365d6390000dfcb0)
    /Users/me/git/kala/utils/iso8601/iso8601.go:129 +0x26
github.com/ajvb/kala/job.(*JobRunner).shouldRetry(0xc0000dff38, 0x6aaa84c04)
    /Users/me/git/kala/job/runner.go:180 +0x5d
github.com/ajvb/kala/job.(*JobRunner).Run(0xc0000dff38, 0x1598120, 0xc0000a2fc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /Users/me/git/kala/job/runner.go:68 +0x425
github.com/ajvb/kala/job.(*Job).Run(0xc00024e000, 0x1598120, 0xc0000a2fc0)
    /Users/me/git/kala/job/job.go:434 +0x10e
created by github.com/ajvb/kala/job.(*Job).Init
    /Users/me/git/kala/job/job.go:195 +0x535
FAIL    github.com/ajvb/kala/job/storage/mysql  28.677s

The real-world scenario where this occurs is: creating a job with a longish Epsilon, running once and failing, then rebooting the service. Panic will occur on startup when the cache is re-loading the jobs from the db and starts firing them off.

The root cause is not calling InitDelayDuration() on jobs when retrieving them from the database. If this isn't done, any job with an Epsilon will cause this panic.

From reading the source, these storage drivers call InitDelayDuration() correctly:

These do not: