go-co-op / gocron

Easy and fluent Go cron scheduling. This is a fork from https://github.com/jasonlvhit/gocron
MIT License
5.7k stars 308 forks source link

[BUG] A job stops running with out any error #693

Closed vish-c closed 8 months ago

vish-c commented 8 months ago

Describe the bug

Dear gocron community, Firstly thank you for a nice package.

i would like to report a probably strange issue where one of the job registered to the scheduler runs the job function but randomly stops executing the function. This again starts executing after explicitly when i trigger scheduler.Start() function via an api call !

I have a custom interface called 'Job' which is implemented by the functions that would eventually be executed by the scheduler.

type Job interface {
    Run() error
    BeforeJobRunsEventListener(jobID uuid.UUID, jobName string)
    AfterJobRunsEventListener(jobID uuid.UUID, jobName string)
    AfterJobRunsWithErrorEventListener(jobID uuid.UUID, jobName string, err error)
}

// implements Job interface methods
type ConsumerJob struct {
}

// implements Job interface methods
type ProducerJob struct {
}

I have a following global variable called schedulerManager, which i use as singleton variable.

type mySchedulerManager struct {
    scheduler         gocron.Scheduler
    jobRegister       map[string]uuid.UUID
}

var schedulerManager *mySchedulerManager

The following functions are for setting up the jobs and then start the jobs

// addJob adds a given job of type Job to the scheduler
func addJob(job Job, periodInSeconds int, beforeJobRunFunc func(jobID uuid.UUID, jobName string),
    afterJobRunFunc func(jobID uuid.UUID, jobName string),
    afterJobRunsWithErrorFunc func(jobID uuid.UUID, jobName string, err error)) error {

    if schedulerManager == nil {
        return fmt.Errorf("schedulerManager is nil, cannot add a job")
    }

    jobName := reflect.TypeOf(job).String()

    if _, ok := schedulerManager.jobRegister[jobName]; !ok {
        log.Debugf("configuring job %v to run every %v seconds", jobName, periodInSeconds)

        j, err := schedulerManager.scheduler.NewJob(
            gocron.DurationJob(
                time.Duration(periodInSeconds)*time.Second,
            ),
            gocron.NewTask(
                job.Run, // register the job's Run() function as a new task to be executed
            ),
            gocron.WithName(jobName),
            gocron.WithSingletonMode(gocron.LimitModeReschedule),
            gocron.WithEventListeners(
                gocron.BeforeJobRuns(beforeJobRunFunc),
                gocron.AfterJobRuns(afterJobRunFunc),
                gocron.AfterJobRunsWithError(afterJobRunsWithErrorFunc),
            ),
        )

        if err != nil {
            return fmt.Errorf("job %v cannot be added to the scheduler, err=%v", jobName, err.Error())
        }

        log.Debugf("job %v added to the scheduler, with id=%v", jobName, j.ID())

    } else {
        log.Errorf("scheduler with name %v has been already registered and cannot reregister it again", jobName)
    }

    return nil
}

// SetupJobs configure the jobs 
func SetupJobs() error {

    if schedulerManager == nil {
        newScheduler, err := gocron.NewScheduler()
        if err != nil {
            return err
        }
        schedulerManager = &mySchedulerManager{
            scheduler:         newScheduler,
            jobRegister:       make(map[string]uuid.UUID),
        }

        // Create the producer job
        producerJob := &ProducerJob{}
                producerFrequencySeconds := 10

        err = addJob(producerJob, producerFrequencySeconds, producerJob.BeforeJobRunsEventListener, producerJob.AfterJobRunsEventListener, producerJob.AfterJobRunsWithErrorEventListener)
        if err != nil {
            log.WithField("notify_admin", "true").Errorf("%v", err.Error())
            return err
        }

        // Create the consumer job
        consumerJob := &ConsumerJob{}
        consumerFrequencySeconds := 60

        err = addJob(consumerJob, consumerFrequencySeconds, consumerJob.BeforeJobRunsEventListener, consumerJob.AfterJobRunsEventListener, consumerJob.AfterJobRunsWithErrorEventListener)
        if err != nil {
            log.WithField("notify_admin", "true").Errorf("%v", err.Error())
            return err
        }
    } else {
        log.Debugf("jobs have been already setup, jobRegister=%v", schedulerManager.jobRegister)
    }

    return nil
}

// StartScheduler starts all the schedulers that have been configured
func StartScheduler() {
    if schedulerManager != nil {
        schedulerManager.scheduler.Start()
    } else {
        log.Errorf("scheduler manager is nil, cannot start the scheduler")
    }
}
func main() {
    err = schedulemgr.SetupJobs()
    if err != nil {
        msg := fmt.Sprintf("failed to setup scheduler jobs, err=%v", err.Error())
        log.WithField("notify_admin", "true").Errorf(msg)
        panic(msg)
    }
    schedulemgr.StartScheduler()
}

Randomly, the ConsumerJob suddenly stops executing the Run() function. I know this because i have log statements in the even listener functions.

After i observed this behavior, to check if the scheduler has lost the job some how, i have a goroutine which is frequently printing the jobs in the scheduler and the jobs are still known to the scheduler

    // Go routine to just check the jobs registered to the scheduler and log
    go func() {
        log.Debugf("jobmonitor goroutine started")
        for {
            log.Debugf("jobmonitor goroutine checking the scheduler for jobs")
            for _, job := range schedulemgr.GetSchedulerJobs() {
                lastRun, err := job.LastRun()
                if err != nil {
                    log.Errorf("error occurred while getting LastRun() err=%v", err.Error())
                }
                nextRun, err := job.NextRun()
                if err != nil {
                    log.Errorf("error occurred while getting NextRun() err=%v", err.Error())
                }
                log.Debugf("jobmonitor goroutine found job with uuid=%v, name=%v, lastrun=%v, nextrun=%v", job.ID(), job.Name(), lastRun, nextRun)
            }
            time.Sleep(60 * time.Second)
        }
    }()

The ConsumerJob runs every 60 seconds and ProducerJob runs every 10 seconds. The ConsumerJob some times could take more than 1 minute to complete its task ( 2 minutes - 3 minutes ). The output of the above goroutine shows that the Consumer job stops running. The ConsumerJob stops running after 14:00

{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=9ca157b6-a051-4c7e-8d7c-16a61e0653b0, name=*schedulemgr.ConsumerJob, lastrun=2024-03-14 13:57:06.835146203 +0100 CET, nextrun=2024-03-14 13:58:06.835146203 +0100 CET","time":"2024-03-14T13:57:06.852672871+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=f412be28-3a62-46e8-b052-9203fd45df8b, name=*schedulemgr.ProducerJob, lastrun=2024-03-14 13:57:06.835148339 +0100 CET, nextrun=2024-03-14 13:57:26.835148339 +0100 CET","time":"2024-03-14T13:57:06.852719253+01:00"}
{"file":"/app/cmd/scheduler/main.go:67","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine checking the scheduler for jobs","time":"2024-03-14T13:58:06.852919914+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=9ca157b6-a051-4c7e-8d7c-16a61e0653b0, name=*schedulemgr.ConsumerJob, lastrun=2024-03-14 13:58:06.835146203 +0100 CET, nextrun=2024-03-14 13:59:06.835146203 +0100 CET","time":"2024-03-14T13:58:06.853084605+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=f412be28-3a62-46e8-b052-9203fd45df8b, name=*schedulemgr.ProducerJob, lastrun=2024-03-14 13:58:06.835148339 +0100 CET, nextrun=2024-03-14 13:58:26.835148339 +0100 CET","time":"2024-03-14T13:58:06.853125222+01:00"}
{"file":"/app/cmd/scheduler/main.go:67","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine checking the scheduler for jobs","time":"2024-03-14T13:59:06.854173537+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=9ca157b6-a051-4c7e-8d7c-16a61e0653b0, name=*schedulemgr.ConsumerJob, lastrun=2024-03-14 13:59:06.835146203 +0100 CET, nextrun=2024-03-14 14:00:06.835146203 +0100 CET","time":"2024-03-14T13:59:06.854312281+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=f412be28-3a62-46e8-b052-9203fd45df8b, name=*schedulemgr.ProducerJob, lastrun=2024-03-14 13:59:06.835148339 +0100 CET, nextrun=2024-03-14 13:59:26.835148339 +0100 CET","time":"2024-03-14T13:59:06.85435052+01:00"}
{"file":"/app/cmd/scheduler/main.go:67","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine checking the scheduler for jobs","time":"2024-03-14T14:00:06.854391765+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=9ca157b6-a051-4c7e-8d7c-16a61e0653b0, name=*schedulemgr.ConsumerJob, lastrun=2024-03-14 14:00:06.835146203 +0100 CET, nextrun=2024-03-14 14:01:06.835146203 +0100 CET","time":"2024-03-14T14:00:06.85452879+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=f412be28-3a62-46e8-b052-9203fd45df8b, name=*schedulemgr.ProducerJob, lastrun=2024-03-14 14:00:06.835148339 +0100 CET, nextrun=2024-03-14 14:00:26.835148339 +0100 CET","time":"2024-03-14T14:00:06.854566941+01:00"}
{"file":"/app/cmd/scheduler/main.go:67","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine checking the scheduler for jobs","time":"2024-03-14T14:01:06.9122016+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=9ca157b6-a051-4c7e-8d7c-16a61e0653b0, name=*schedulemgr.ConsumerJob, lastrun=2024-03-14 14:00:06.835146203 +0100 CET, nextrun=2024-03-14 14:01:06.835146203 +0100 CET","time":"2024-03-14T14:01:06.912899735+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=f412be28-3a62-46e8-b052-9203fd45df8b, name=*schedulemgr.ProducerJob, lastrun=2024-03-14 14:01:06.835148339 +0100 CET, nextrun=2024-03-14 14:01:26.835148339 +0100 CET","time":"2024-03-14T14:01:06.912932852+01:00"}
{"file":"/app/cmd/scheduler/main.go:67","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine checking the scheduler for jobs","time":"2024-03-14T14:02:06.913988479+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=9ca157b6-a051-4c7e-8d7c-16a61e0653b0, name=*schedulemgr.ConsumerJob, lastrun=2024-03-14 14:00:06.835146203 +0100 CET, nextrun=2024-03-14 14:01:06.835146203 +0100 CET","time":"2024-03-14T14:02:06.91416665+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=f412be28-3a62-46e8-b052-9203fd45df8b, name=*schedulemgr.ProducerJob, lastrun=2024-03-14 14:02:06.835148339 +0100 CET, nextrun=2024-03-14 14:02:26.835148339 +0100 CET","time":"2024-03-14T14:02:06.91419326+01:00"}
{"file":"/app/cmd/scheduler/main.go:67","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine checking the scheduler for jobs","time":"2024-03-14T14:03:06.914994932+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=9ca157b6-a051-4c7e-8d7c-16a61e0653b0, name=*schedulemgr.ConsumerJob, lastrun=2024-03-14 14:00:06.835146203 +0100 CET, nextrun=2024-03-14 14:01:06.835146203 +0100 CET","time":"2024-03-14T14:03:06.915287848+01:00"}
{"file":"/app/cmd/scheduler/main.go:77","func":"main.main.func2","level":"debug","msg":"jobmonitor goroutine found job with uuid=f412be28-3a62-46e8-b052-9203fd45df8b, name=*schedulemgr.ProducerJob, lastrun=2024-03-14 14:03:06.835148339 +0100 CET, nextrun=2024-03-14 14:03:26.835148339 +0100 CET","time":"2024-03-14T14:03:06.915364936+01:00"}

To Reproduce

Unfortunately i do not have the steps to reproduce this issue because this issue is occurring in random manner.

Version

v2.2.4

Expected behavior

The job should not be ignored by the scheduler

Additional context

vish-c commented 8 months ago

This issue is reported to have been fixed with release v2.2.5 and also by the latest release v2.2.6. Currently i am testing it. I see some other issues with the latest release though mentioned here https://github.com/go-co-op/gocron/issues/694

vish-c commented 8 months ago

I do not face this issue any longer after installing v2.2.6, But the issue reported in https://github.com/go-co-op/gocron/issues/694 still exists. I am closing this issue because the original problem reported in this issue seemed to have been solved.