Miksus / rocketry

Modern scheduling library for Python
https://rocketry.readthedocs.io
MIT License

ENH Support setting time variability (randomized offsets) #100

Open Murplugg opened 2 years ago

Murplugg commented 2 years ago

Is your feature request related to a problem? Please describe.

Hi! I'm thinking of using this library as part of a scheduler / manager to automate various tasks including web-scraping and running parts of a multi-agent system. I have some tasks that either shouldn't or don't need to be run at exactly defined moments (e.g. web-scrapers and randomized load balancing) and it would be very helpful to be able to set a spread or error bar to a time / trigger condition. I.e. set some acceptable leeway that's non-fixed.

Describe the solution you'd like

Specifically I want to introduce non-fixed acceptable leeway -- randomly sampled error from a range -- to defined start- and end-times in a way that has the errors change each time a condition is evaluated (not set the error once and re-use it). This should be handled under the hood by Rocketry to minimize boilerplate code / code complexity.

In practice what I envision is one or more of the following extensions:

  1. To use an optional language-based syntax that immediately follows a time. I'm not yet sure what it should look like but I'm thinking along the lines of @app.task("daily after 07:00 +/- 10 minutes") and "daily after 07:00 +/- 2 hours". This would apply to the Condition API as well: every("10 seconds +/- 2 seconds").
  2. To pass the spread as an optional argument. This avoids adding complexity to the language parser but we lose granularity on which times to affect -- it becomes more natural for the spread arg to affect all times defined in the string.
  3. Specifically for the Condition API it might be an option to add a spread or error attribute to the members of rocketry.cond (after, between, on, ...).
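
A minimal sketch of how the suffix from option 1 could be split off a condition string before the rest is parsed. This is not Rocketry's parser: the `+/- <number> <unit>` grammar, the supported units and the helper name `split_spread` are all assumptions made for illustration.

```python
import re
from datetime import timedelta
from typing import Tuple

# Hypothetical grammar: "<base condition> +/- <integer> <second|minute|hour>[s]"
_SPREAD_RE = re.compile(
    r"^(?P<base>.*?)\s*\+/-\s*(?P<value>\d+)\s*(?P<unit>second|minute|hour)s?\s*$"
)


def split_spread(condition: str) -> Tuple[str, timedelta]:
    """Split a condition string into (base condition, spread).

    Returns a zero spread when the string has no "+/-" suffix.
    """
    match = _SPREAD_RE.match(condition)
    if match is None:
        return condition.strip(), timedelta(0)
    # timedelta takes plural keyword names: seconds=, minutes=, hours=
    kwargs = {match["unit"] + "s": int(match["value"])}
    return match["base"], timedelta(**kwargs)


if __name__ == "__main__":
    print(split_spread("daily after 07:00 +/- 10 minutes"))
    print(split_spread("10 seconds +/- 2 seconds"))
    print(split_spread("daily after 07:00"))
```

The base part would then go to the existing condition parser unchanged, which keeps the spread handling separate from the rest of the syntax.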

Choice of sampling method used to set the final offset can in theory also be a user choice but I'm not sure where that parameter is best defined -- it adds more complexity to the language syntax. Passed as an argument or maybe set on the Session beforehand might be better options. I see two viable sampling methods: Uniform random and Gaussian.
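
The two sampling methods could look roughly like this with only the standard library. The function name `sample_offset`, the `method` strings, and the choice to treat the spread as three standard deviations (clamped to the range) for the Gaussian case are assumptions, not anything Rocketry defines.

```python
import random
from datetime import timedelta


def sample_offset(spread: timedelta, method: str = "uniform", rng=random) -> timedelta:
    """Draw a random offset in [-spread, +spread].

    `rng` can be the `random` module or a seeded `random.Random` instance.
    """
    limit = spread.total_seconds()
    if method == "uniform":
        seconds = rng.uniform(-limit, limit)
    elif method == "gaussian":
        # Assumption: the spread is treated as 3 sigma, and rare samples
        # outside the range are clamped to its edges.
        seconds = rng.gauss(0.0, limit / 3)
        seconds = max(-limit, min(limit, seconds))
    else:
        raise ValueError(f"unknown sampling method: {method!r}")
    return timedelta(seconds=seconds)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_offset(timedelta(minutes=10), "uniform", rng))
    print(sample_offset(timedelta(minutes=10), "gaussian", rng))
```

Clamping is only one way to bound the Gaussian case; resampling until the value falls inside the range would avoid piling probability onto the edges.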

Then there's a question of the spread size. It makes no sense to say "+/- 1 day" or "+/- 7 hours" in the above examples, so there should perhaps be some constraint handling in place. Exactly which ones is unclear.

Describe alternatives you've considered

I have considered dynamically setting or adjusting times on a per run basis using my own code external to Rocketry. Functionally this will likely be fine but it adds complexity to my code and I believe others might enjoy this feature too. It's unclear to me how I can implement variation on the timing in a way that's handled under the hood by Rocketry. I know the request might seem odd in the face of Rocketry trying to be as precise and timely as possible but I like this library and I see my suggestions here as an extension to fill a niche.

Additional context

I had a look at https://github.com/Miksus/rocketry/issues/89#issuecomment-1234689610 and like the idea presented there but as far as I see that fulfills a different need (i.e. setting a probability of running a task at all, time still being exact). I would love to provide a similarly small example but haven't yet figured out the internals of Rocketry to do so.

Miksus commented 2 years ago

Thanks for the thorough and well-thought-out issue! Especially your proposal for the condition syntax looks awesome: clean and understandable. This was very insightful.

Some Random Thoughts

I'm pretty tired at the moment and probably need to reread what you said later but I personally think Rocketry should support randomized runs. I'm currently doing some fixes to the parametrization so that it supports setting parameters for a manual run. After that I probably have time to investigate this more.

You said this already but to recap for myself, there are at least two related needs:

I think there are three things that must be defined:

  1. Period: the time span in which the randomness takes place, e.g. the next hour. This must be set, as otherwise saying "run this randomly once" could mean it will run after the next 200 years.
  2. Distribution: the default should probably be the uniform distribution, but it could be interesting to make this changeable. Perhaps we could allow passing SciPy's distributions, but it must be implemented in a way that SciPy is not in the dependencies.
  3. When the point is decided. I think it must be decided beforehand. If the condition is stateless and it is checked 10 times per second, the run time will most likely be skewed towards the start of the period.
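
The skew mentioned in point 3 can be illustrated with a small simulation. This is only a sketch under assumed numbers (600 checks per period, a 1% chance per check) and does not use any Rocketry API. It compares a stateless condition that rolls the dice on every check with one whose firing point is drawn once, beforehand.

```python
import random

CHECKS_PER_PERIOD = 600  # assumed: e.g. 10 checks per second over one minute
TRIALS = 2000


def stateless_fire_index(rng, p_per_check):
    """Index of the first check that fires when every check rolls independently.

    Returns None if no check fires within the period.
    """
    for i in range(CHECKS_PER_PERIOD):
        if rng.random() < p_per_check:
            return i
    return None


def predecided_fire_index(rng):
    """Firing point drawn once, uniformly over the whole period."""
    return rng.randrange(CHECKS_PER_PERIOD)


rng = random.Random(42)
stateless = [stateless_fire_index(rng, 0.01) for _ in range(TRIALS)]
stateless = [i for i in stateless if i is not None]  # drop periods that never fired
predecided = [predecided_fire_index(rng) for _ in range(TRIALS)]

mean_stateless = sum(stateless) / len(stateless)
mean_predecided = sum(predecided) / len(predecided)

if __name__ == "__main__":
    print(f"stateless mean check index:  {mean_stateless:.1f}")
    print(f"predecided mean check index: {mean_predecided:.1f}")
```

With independent rolls the firing time follows a roughly geometric distribution, so most runs land early in the period, while the pre-decided point is spread evenly across it.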

As for the implementation, I think this could be done by simply making the time period dynamic. Basically, we could use the TaskExecutable or TaskRunnable classes to check whether the task has finished/run, and then simply pass them a period that changes randomly.

Example

Let's take an example, like this daily at 07:00 +/- 2 hours. This can be:

Note that "at" basically means one full child unit: for daily it means one hour, for hourly it means one minute, etc. So the start is somewhere between 05:00 and 07:00 and the end is somewhere between 08:00 and 10:00. We just need to generate random values between these, create a period out of them (simple) and pass that to TaskExecutable/TaskRunnable. Then we need to do the same again once this period is over. That doesn't sound too bad and actually fits rather nicely with what we already have.

I think the units (i.e. +/- 2 hours) should always be smaller than the period we work with: daily supports +/- 2 hours, +/- 2 minutes, +/- 2 seconds, etc. Possibly at first we could make it support just the resolution that is already stored in the periods (i.e. hour for daily).
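
For the daily at 07:00 +/- 2 hours example, generating one randomized window could be sketched as below. It uses only the standard library. The helper name `jittered_window` and the uniform sampling are assumptions, and passing the result on to TaskExecutable/TaskRunnable is left out because that wiring is not settled here.

```python
import random
from datetime import date, datetime, time, timedelta


def jittered_window(day: date, spread: timedelta = timedelta(hours=2), rng=random):
    """Return a randomized (start, end) for "daily at 07:00 +/- spread".

    "at 07:00" is taken as the full hour 07:00-08:00. Following the ranges
    given above, the start moves earlier by up to `spread` and the end moves
    later by up to `spread`.
    """
    base_start = datetime.combine(day, time(7, 0))
    base_end = base_start + timedelta(hours=1)
    limit = spread.total_seconds()
    start = base_start - timedelta(seconds=rng.uniform(0, limit))
    end = base_end + timedelta(seconds=rng.uniform(0, limit))
    return start, end


if __name__ == "__main__":
    rng = random.Random(7)
    start, end = jittered_window(date(2022, 9, 10), rng=rng)
    print(start.time(), end.time())
```

A new window would be drawn each time the previous one has passed, so every day gets its own start and end.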

I'll revisit this by the weekend at the latest as, as I said, I should probably be sleeping already. Thanks again for the effort you put into writing the issue.