backube / volsync

Asynchronous data replication for Kubernetes volumes
https://volsync.readthedocs.io
GNU Affero General Public License v3.0
587 stars 67 forks source link

Support for randomness / hashed substitution in a trigger schedule #1421

Open adampetrovic opened 3 weeks ago

adampetrovic commented 3 weeks ago

Describe the feature you'd like to have. Currently the trigger schedule directive on ReplicationSource supports a crontab style schedule or a static directive such as daily, yearly etc.

For those that are using a templatized ReplicationSource, many PVCs will inherit the schedule from a single template and therefore have the same trigger time, leading to many backup operations being triggered at the same time.

See example of such a template here: https://github.com/adampetrovic/home-ops/blob/main/kubernetes/templates/volsync/minio.yaml#L34

Some cron implementations use H as a hashed substitution for a value:

'H' is used in the Jenkins continuous integration system to indicate that a "hashed" value is substituted. Thus instead of a fixed number such as '20 ' which means at 20 minutes after the hour every hour, 'H ' indicates that the task is performed every hour at an unspecified but invariant time for each task. This allows spreading out tasks over time, rather than having all of them start at the same time and compete for resources

With this approach, volsync could take the name of the job as the hash value to evaluate to say a minute within the hour.

What is the value to the end user? (why is it a priority?)

Better load spreading of backups

How will we know we have a good solution? (acceptance criteria)

A user is able to defer random execution / jitter of backup times to the volsync operator instead of having to manage it themselves.

Additional context

tesshuflower commented 2 weeks ago

This is an interesting one, potentially something could be done to add a simplified version of this - most likely by:

I don't want to promise this, but sounds like a neat idea if there is time to work on it.

@adampetrovic It looks like the Jenkins implementation may allow for more complex substitutions such as H/10 or H(1-3)?

adampetrovic commented 2 weeks ago

Ideally this is something delegated to the cron library and all volsync needs to worry about is passing through the job name for hashing.

Looking at https://github.com/robfig/cron, the last PR merged was 4 years ago, so I'm not going to blindly assume that route will be a fruitful one.

@adampetrovic It looks like the Jenkins implementation may allow for more complex substitutions such as H/10 or H(1-3)?

Yes it does, again it feels off though to have to think about implementing custom cron parsing in volsync.

If volsync were open to using a fork of the cron library, I could have a go at implementing this.

adampetrovic commented 2 weeks ago

I have implemented basic support (just standard H and steps H/20) in my fork, as a reference.

I have extended the API to add a function ParseWithJobName which allows for a job name to be parsed, which is used for consistent hashing to the same value.

https://github.com/adampetrovic/cron/commit/4c91fead6ecc480e7977a46700d68cf7211780dc

tesshuflower commented 2 weeks ago

I think my initial reaction is I'd rather not point at a fork, but let me think about it a bit.

With a fork:

If we put hashing code into volsync, we could also show the computed schedule in the status, so a user would be able to see what the hash resolved to which might be kind of nice. Disadvantage is it's not so clean and contained in cron parsing code.

bo0tzz commented 2 weeks ago

Rather than forking, what about switching to a different cron library that's more actively maintained? If they are willing to add hash support, of course.

onedr0p commented 2 weeks ago

I couldn't find an existing library that supports setting a jitter in the cron format. I have to agreed that the jenkins cron format would be ideal.

The author of robfig/cron was asking about to adding H support but closed the issue 😢 Judging by the authors Github activity they might not be coming back to support the project.

@tesshuflower Another option could be (if upstream is truly abandoned) that backube could fork robfig/cron and support what's requested here in the library. I get that adding another project to your org might stretch your resources a bit so I totally get why you wouldn't want to; however, if adding this jitter / randomness / hashed substitution helps with both volsync and snapscheduler getting this feature maybe it can be considered?

adampetrovic commented 2 weeks ago

Another option could be (if upstream is truly abandoned) that backube could fork robfig/cron and support what's requested here in the library

I was going to suggest the same thing.

Given it's 4 years since the last commit I can't imagine the maintenance overhead being large. I'd be happy to volunteer my time to help with this, if needed

tesshuflower commented 2 weeks ago

Thanks for the offer @adampetrovic but perhaps we just put the specific hashing code into VolSync itself - this gives us more control and we can still keep it separate from the actual cron implementation.

Steps:

This is something I can help with when I have some time - I'll take a look at your hashing function at some point and can perhaps use that (or I can get you to contribute it directly here if you prefer).

Questions:

adampetrovic commented 2 weeks ago

What range of use-cases would we want to support when it comes to H?

I think the primary set of features that cron already has like:

adampetrovic commented 2 weeks ago

Are you sure you want to do this in volsync? There's a tonne of parsing / repacking unrelated to the Hash feature that would need to be implemented from scratch that already exists in the cron implementation.

For what it's worth, I've now also added support for hashes with a range in my fork.

tesshuflower commented 2 weeks ago

I probably wasn't clear - if we support hashing, we'd just parse the hash values out, compute and put into the status - then scheduling would look at that resolved cron schedule (e.g H * * * * resolved to 20 * * * *) and figure out the next scheduled time using the existing cron parsing code.

adampetrovic commented 2 weeks ago

Yes, I am aware :)

Volsync would need to basically parse all possible permutations of a cron expression into individual fields and sub-fields within those fields as well as worry about the specifics of the Hash expression. Not impossible by any means, just a lot of edge cases to deal with for something that volsync shouldn't be directly concerned with (IMO).

For example:

0,1,H(5-10)/2,30 * * * * * involves parsing each field (split by <space>), then within each field split by , and then the mechanics of the hash implementation, then replacing the H(5-10)/2 into a comma separated list of values coming out of the hash calculation.

adampetrovic commented 2 weeks ago

I also forgot that the parsing function would need to know which field it is parsing (seconds, minutes, hours etc) in order to generate a valid set of values within a range.

tesshuflower commented 2 weeks ago

agreed, it's not super simple, which I think is why we'd want to keep the set of use-cases we support for H to be constrained to something that would be useful to you, at least at first.

adampetrovic commented 2 weeks ago

For me just a simple H would suffice. For others I can see a use case for wanting to restrict the output to a certain range (e.g first 30 minutes in an hour) with H(0-30)