[FEATURE] - Can gocron load scheduled jobs from persistent storage and schedule them?

varsraja commented 1 year ago

Is your feature request related to a problem?

We are running gocron in a containerized environment where jobs are scheduled based on REST API inputs. We would like to use some form of persistence to store the details of received jobs, such as their frequency. When the container, or the instance it runs on, restarts, we would like the gocron scheduler to fetch the existing job state and schedule the next run of each job accordingly.

Describe the solution you'd like

On a restart, the scheduler would load job data from persistent storage and continue the subsequent runs as it would have if the system hadn't rebooted or restarted. The scheduler would update each job's last run time in persistent storage so that, on a fresh startup, it can recalculate the next run time.

Describe alternatives you've considered

A wrapper function responsible for recording in persistent storage when each job last ran. On a fresh restart, it would recalculate each job's start time from its last run time and then add the jobs to the scheduler.

Additional context

varsraja commented 1 year ago

Any update on this?

JohnRoesler commented 1 year ago

@varsraja I'm not opposed to having this feature in gocron. If it were done, I'd like it to be built on an interface that multiple databases (Redis, etc.) could implement.

The implementation in gocron, I would think something like:

- The scheduler starts, loads entries from storage, and loops through them, updating each job's lastRun time, and also its nextRun if the stored nextRun is in the future (I think if it's in the past, it would be set to the zero time, and the scheduler would then pick the right next run time).
- Each time a job runs, the scheduler would write the job's run details to the store.

What details would need to be stored? A unique identifier for the job (use the job Name field that the locker uses), plus the job's last run and next run times.

varsraja commented 1 year ago

@JohnRoesler Appreciate the response. I was also looking for something along the lines you have suggested. A unique job identifier would be required if we need a primary key in the database, along with the last run and next run details. Basically, I guess the gocron Job details would need to be stored.

JohnRoesler commented 1 year ago

Good news: the latest release added UUIDs for jobs, so we now have unique identifiers. The fields on the job struct are private, so we'll want to make a new struct, something like JobStorage, with public fields for the things that need to be stored. Then, when the job runs, we could instantiate a JobStorage struct and save it to the database via the interface that will be defined.

JohnRoesler commented 1 year ago

Hm, or perhaps, since the interface will be within the gocron project, it can just accept a Job and then convert it to a JobStorage object for a SQL implementation 🤔. Some poking around at the implementation will help flesh out some of these details.

varsraja commented 1 year ago

Is it possible to store jobs as a file (a JSON dump) as well, apart from databases? Is there a rough estimate of how many days this could take to implement? I would love to experiment with the initial cut.

JohnRoesler commented 1 year ago

Is it possible to store jobs as a file (a JSON dump) as well, apart from databases?

Certainly. I think the beauty of the interface is that it can be implemented in whatever way you'd like. As for the implementation, please do have a go if you'd like!

tricknife commented 1 year ago

Looking forward to this feature. I'm using tags and UUIDs to manage jobs, which is very inconvenient.

4zore4 commented 10 months ago

I wonder if I could try to contribute to this?

JohnRoesler commented 10 months ago

@4zore4 if you are interested in contributing - let's look at adding it to the v2 branch (as that's the future 😄)

4zore4 commented 10 months ago

Let me take a look.

OK, I will try to add this feature to the v2 version.

JohnRoesler commented 10 months ago

@4zore4 I think having a separate struct for job loading, one that isn't the internalJob or the public Job, would be best. You'll need to consider which fields from the job are important to store/load.

My initial thoughts on what you need and don't need from internalJob (+ store, - don't store, ~ depends):

type internalJob struct {
-   ctx    context.Context
-   cancel context.CancelFunc
+   id     uuid.UUID
+   name   string
+   tags   []string
+   jobSchedule
+   lastRun, nextRun   time.Time
+   function           any
+   parameters         []any
-   timer              clockwork.Timer
+   singletonMode      bool
+   singletonLimitMode LimitMode
~   limitRunsTo        *limitRunsTo // for this to be useful, you'd also have to store the # of runs
                    // when the scheduler is shutting down
~   startTime          time.Time // this isn't useful beyond the initial run
~   startImmediately   bool // this isn't useful beyond the initial run - but if you set
                    // start immediately, would you want your job to also start
                    // immediately when a new scheduler pod started? I don't
                    // think so, you'd want it to continue as close to where it left
                    // off as possible.
    // event listeners
+   afterJobRuns          func(jobID uuid.UUID)
+   beforeJobRuns         func(jobID uuid.UUID)
+   afterJobRunsWithError func(jobID uuid.UUID, err error)
}
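To illustrate the `limitRunsTo` note: persisting only the limit would reset the count on every restart, so the stored state needs the run count too. `limitState`, `remaining` and `recordRun` are hypothetical names in this sketch.

```go
package main

import "fmt"

// limitState is a hypothetical storable form of a run limit: the configured
// limit plus how many runs have already happened before shutdown.
type limitState struct {
	Limit    uint
	RunCount uint
}

// remaining reports how many runs are left after a restart.
func (l limitState) remaining() uint {
	if l.RunCount >= l.Limit {
		return 0
	}
	return l.Limit - l.RunCount
}

// recordRun increments the count; it returns false once the limit is
// reached and the job should not run again.
func (l *limitState) recordRun() bool {
	if l.remaining() == 0 {
		return false
	}
	l.RunCount++
	return true
}

func main() {
	s := limitState{Limit: 5, RunCount: 3} // e.g. loaded from the store at startup
	fmt.Println(s.remaining())
	s.recordRun()
	s.recordRun()
	fmt.Println(s.recordRun(), s.remaining())
}
```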

Another thing we need to make sure is handled: when scheduling the next run, the lastRun may be far enough in the past that the computed next run is also in the past. I don't think v2 handles that yet.

4zore4 commented 10 months ago

Your idea is very good. As you said, whether or not to execute expired tasks is a choice I feel should be given to the user.

I'll try to write a demo at the end of the week, if I'm not lazy.

It is worth mentioning that I have used this library in my company's projects. Thank you very much for your contribution!

kyriakid1s commented 8 months ago

Any update on this?

kyriakid1s commented 8 months ago

Hello, happy new year,

About the startTime field, I believe it's useful to add it to the jobStorage struct, because if a user wants to execute a job in 2 hours (with the OneTimeJob function) and the scheduler shuts down during that period, the job will be lost.

I'm trying to implement this feature in this project. I'm not a very experienced programmer, but, you know, I'm trying.

JohnRoesler commented 8 months ago

Another thought: to make storing it as simple as possible, I think it could be worthwhile to look into converting the job export structure to some sort of string. The export would produce a string, and the import would decode that string back into jobs. Or a slice of strings, so it doesn't get really long when there are many, many jobs.

kyriakid1s commented 8 months ago

Any thoughts about how to save the function? I am thinking about saving only the function name.
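Saving only the name works if names are explicit and stable. This sketch (`taskRegistry`, `Register` and `Lookup` are hypothetical) maps user-chosen names to functions instead of relying on `runtime.FuncForPC` names, which include the full package path and change when code is moved or renamed; a stored name with no matching task surfaces as an error on startup.

```go
package main

import "fmt"

// taskRegistry (hypothetical) maps a stable, user-chosen name to the function
// to run. Only the name is persisted; function pointers can't be stored.
type taskRegistry struct {
	funcs map[string]func()
}

func newTaskRegistry() *taskRegistry {
	return &taskRegistry{funcs: make(map[string]func())}
}

// Register panics on duplicate names so two tasks never share a stored name.
func (r *taskRegistry) Register(name string, fn func()) {
	if _, dup := r.funcs[name]; dup {
		panic("task already registered: " + name)
	}
	r.funcs[name] = fn
}

// Lookup returns the function for a stored name, or an error if this binary
// no longer has a task with that name (e.g. it was renamed or removed).
func (r *taskRegistry) Lookup(name string) (func(), error) {
	fn, ok := r.funcs[name]
	if !ok {
		return nil, fmt.Errorf("no task registered as %q", name)
	}
	return fn, nil
}

func main() {
	reg := newTaskRegistry()
	reg.Register("cleanup-temp-files", func() { fmt.Println("cleaning up") })

	// On startup, a stored job record only carries the name.
	fn, err := reg.Lookup("cleanup-temp-files")
	if err != nil {
		fmt.Println(err)
		return
	}
	fn()
}
```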

4zore4 commented 8 months ago

Sorry, I haven't updated it yet, because the company is busy near the end of the year. But I think your idea is great and consistent with mine, and I have implemented a demo.

import (
    "fmt"
    "log"
    "reflect"
    "runtime"
    "strings"
    "testing"
    "time"

    "github.com/go-co-op/gocron/v2"
    // plus the author's redisJob_test package (import path not shown)
)

func Test_Job(t *testing.T) {
    var redisJob redisJob_test.RedisJob
    methodMap := initFun(redisJob)
    j, _, _, err := newJob(redisJob.TestReflect, methodMap)
    if err != nil {
        t.Fatal(err)
    }
    // each job has a unique id
    fmt.Println(j.ID())

    // block forever so the 10-second job keeps running (demo only;
    // select{} parks the goroutine instead of busy-spinning like for {})
    select {}
}

func newJob(function func(), methodMap map[string]reflect.Value) (gocron.Job, string, map[string]string, error) {
    matched := make(map[string]string)

    // get the function's entry-point address
    funcPtr := reflect.ValueOf(function).Pointer()

    // resolve it to the fully qualified function name
    funcName := runtime.FuncForPC(funcPtr).Name()
    fmt.Println(funcName)

    // remember which registered method name this function corresponds to
    for methodName := range methodMap {
        if strings.HasPrefix(funcName, methodName) {
            matched[funcName] = methodName
        }
        fmt.Println(methodName)
    }

    s, err := gocron.NewScheduler()
    if err != nil {
        return nil, "", nil, err
    }
    j, err := s.NewJob(
        gocron.DurationJob(10*time.Second),
        gocron.NewTask(function),
    )
    if err != nil {
        return nil, "", nil, err
    }
    s.Start()

    return j, funcName, matched, nil
}

// initFun maps the fully qualified name of each method on *RedisJob
// to the corresponding method value bound to redisJob.
func initFun(redisJob redisJob_test.RedisJob) map[string]reflect.Value {
    methodMap := make(map[string]reflect.Value)
    objValue := reflect.ValueOf(&redisJob)

    objType := objValue.Type()

    for i := 0; i < objType.NumMethod(); i++ {
        method := objType.Method(i)

        funcPtr := method.Func.Pointer()
        methodValue := objValue.MethodByName(method.Name)

        // use the runtime package to get the name of the function
        funcName := runtime.FuncForPC(funcPtr).Name()
        methodMap[funcName] = methodValue
        log.Println(funcName)
    }
    return methodMap
}

kyriakid1s commented 8 months ago

What fields does redisJob have?

pcfreak30 commented 7 months ago

I am also very interested in this and may end up implementing it with GORM/MySQL. I need a background task queue that can survive shutdowns and, in the long term, be distributed.

pcfreak30 commented 6 months ago

I have been thinking about this while working on other components in my project, and while the example from @4zore4 uses function pointers, IMHO it will not work at scale.

My thoughts jumped to using a wrapper package around the scheduler, which I already have, to use the locker and elector system and manage all jobs.

You would need to define job names or types and register each one with the task function that handles it. You could then store this, load all tasks on boot, and pick up where you left off. You can't persist the job struct data as it exists in memory (it holds function pointers), especially with multiple nodes running. So some "job manager" abstraction is needed for this.

I'm open to thoughts on how this might be designed, but I'll likely end up with an MVP for my needs, which will at least be MIT-licensed, so others can use it as an example before I put any effort into making it reusable.

@4zore4 @JohnRoesler @varsraja

pcfreak30 commented 2 months ago

I thought I would provide an update. I have made some changes in a fork of gocron and can probably create a PR soon with the change (I added WithIdentifier to set the UUID).

But... I have implemented a cron system abstraction here https://github.com/LumeWeb/portal/blob/e44bd0f59300b2d7ee164cef4714543639a65c48/service/cron.go.

Overall, I think it makes the most sense to create a layer on top rather than try to make the library support it directly.

Kudos!

JohnRoesler commented 2 months ago

@pcfreak30 thanks for sharing that! Yes, I agree with your sense that having it be separate would be the best. Then it can wrap gocron as the core scheduling library without introducing a bunch of complexity that many won't need/use.

JohnRoesler commented 1 month ago

For the purposes of restoring jobs from a data store, I think we'll need a method, perhaps one or more JobOptions, that supports setting attributes of the job such as LastRun and NextRuns, similar to the newly added WithIdentifier that allows setting the job's UUID. Yay/nay?

pcfreak30 commented 1 month ago

See https://github.com/LumeWeb/portal/blob/e0caec59acc68a5be80535add4b1b9f32747e0dd/service/cron.go#L94 for inspiration on what I am doing at the moment.

If you have any thoughts on how your idea, or another one, could make this code better, I'm all ears :).
