film42 / sidekiq-rs

A port of sidekiq to rust using tokio
MIT License

Some questions about periodic tasks #48

Open mrxsisyphus opened 3 months ago

mrxsisyphus commented 3 months ago

Hello, I have some questions about periodic tasks:

Poll Interval for Periodic Tasks

Currently, the polling interval is hardcoded to 30 seconds: https://github.com/film42/sidekiq-rs/blob/c1822b742083abbb6bcde1d7f1f15e95c486f599/src/processor.rs#L299

Is there any specific reason for this choice? I am considering estimating the interval dynamically each time we update a periodic task from its cron expression. For instance, we could calculate the interval as the difference between the cron's next execution time and the current time, record the minimum such value globally in the processor, and clamp it to specified lower and upper limits (e.g., a minimum of 1 second and a maximum of 60 seconds) to compute the actual polling interval.

Alternatively, we could simply run ZRANGE key 0 0 WITHSCORES after enqueue_periodic_jobs to get the soonest task time, then use the difference between that time and the current time as the sleep duration for the next iteration.

Store the parameters directly as keys in redis sorted set

Currently, periodic tasks are implemented with a Redis sorted set where the member is the serialized task parameters and the score is the timestamp. Would there be any performance impact if a task has large parameters? Would it be more efficient to store an additional Redis hash mapping task identifiers to their parameters? I am not familiar with Sidekiq for Ruby; are there any issues with this approach?

Modify the cron expression of periodic task

Currently, the API only provides the option to delete all periodic tasks and does not provide a way to delete an individual one. How should we handle modifying the cron expression of a single periodic task at runtime? I noticed there is a built-in deletion function for scheduled tasks: https://github.com/film42/sidekiq-rs/blob/c1822b742083abbb6bcde1d7f1f15e95c486f599/src/scheduled.rs#L30-L39

film42 commented 3 months ago

Or simply use ZRANGE key 0 0 WITHSCORES after enqueue_periodic_jobs to get the soonest task time, then use the difference between that time and the current time as the sleep duration for the next iteration.

I hard-coded it to 30 seconds because the smallest cron job interval is 60 seconds, so this checks twice a minute. But I like the idea of checking Redis to see whether something is expected to execute sooner than that. We can do something like min(next_time_from_redis, 30 seconds).
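The min(next_time_from_redis, 30 seconds) idea above can be sketched as a pure function. This is only an illustration, not sidekiq-rs code; the function name and bounds are hypothetical, and `secs_until_next` stands in for a value read back from Redis (e.g. via ZRANGE key 0 0 WITHSCORES):

```rust
use std::time::Duration;

/// Hypothetical helper: given the soonest scheduled run in seconds from now
/// (None if the sorted set is empty), pick the next poll sleep. Falls back to
/// the current 30-second default and never sleeps less than 1 second, so a
/// second-level cron expression doesn't cause a busy loop.
fn clamp_poll_interval(secs_until_next: Option<u64>) -> Duration {
    const DEFAULT: u64 = 30; // the current hardcoded interval
    const MIN: u64 = 1;      // lower bound to avoid busy-looping
    let secs = secs_until_next.unwrap_or(DEFAULT).min(DEFAULT).max(MIN);
    Duration::from_secs(secs)
}

fn main() {
    // A job due in 2 seconds shortens the poll; nothing sooner keeps 30s.
    assert_eq!(clamp_poll_interval(Some(2)), Duration::from_secs(2));
    assert_eq!(clamp_poll_interval(Some(120)), Duration::from_secs(30));
    assert_eq!(clamp_poll_interval(None), Duration::from_secs(30));
    assert_eq!(clamp_poll_interval(Some(0)), Duration::from_secs(1));
    println!("ok");
}
```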

I noticed there is a built-in deletion function in the scheduled task:

The enqueue_jobs loop looks for jobs inserted via MyWorker::perform_in(&redis, Duration::from_secs(10), ()). This inserts the job JSON into a sorted set with the expected execution time as the score. We then ZREM it from the sorted set and enqueue it for immediate processing. This is how open source Sidekiq does it. So it's not a delete function for the user; it just moves a scheduled job into the right queue for processing.
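The ZREM-then-enqueue flow described above can be simulated in memory with a BTreeMap standing in for the sorted set. This is an illustrative sketch, not the crate's actual code, and it simplifies by using the score as the map key (real Redis sorted sets allow duplicate scores):

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for the scheduled-jobs sorted set: score -> job JSON.
/// Pops (the analogue of ZREM) every job whose scheduled time has passed and
/// returns it for immediate enqueueing, mirroring the enqueue_jobs loop.
fn pop_due_jobs(set: &mut BTreeMap<u64, String>, now: u64) -> Vec<String> {
    let due: Vec<u64> = set.range(..=now).map(|(k, _)| *k).collect();
    due.into_iter().filter_map(|k| set.remove(&k)).collect()
}

fn main() {
    let mut set = BTreeMap::new();
    set.insert(100, r#"{"class":"MyWorker"}"#.to_string());
    set.insert(200, r#"{"class":"OtherWorker"}"#.to_string());

    // At t=150 only the first job is due; it is removed and handed off.
    let jobs = pop_due_jobs(&mut set, 150);
    assert_eq!(jobs.len(), 1);
    assert_eq!(set.len(), 1); // the future job stays scheduled
    println!("ok");
}
```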

For periodic jobs it's a little different. We don't need to ZREM, because we want to restart the timer. Instead, we use ZADD CH to reset the score to the next target enqueue time. The CH flag tells the caller whether the value was changed. Whichever process sees that the score was changed (i.e., it won) enqueues a copy of the job for processing.
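The "did the score change?" contract of ZADD CH, which decides which process wins the enqueue, can be modeled with a small in-memory simulation. This mimics the Redis semantics for illustration only; it is not sidekiq-rs code:

```rust
use std::collections::HashMap;

/// Simulates `ZADD key CH score member`: returns true iff the member was
/// newly added or its score changed. In the scheme described above, the
/// process that sees `true` "wins" and enqueues a copy of the periodic job.
fn zadd_ch(set: &mut HashMap<String, u64>, member: &str, score: u64) -> bool {
    match set.insert(member.to_string(), score) {
        None => true,              // member newly added
        Some(old) => old != score, // changed only if the score moved
    }
}

fn main() {
    let mut set = HashMap::new();
    // The first process to push the next run time wins...
    assert!(zadd_ch(&mut set, "periodic-job-json", 1700));
    // ...a second process setting the same score sees no change and loses.
    assert!(!zadd_ch(&mut set, "periodic-job-json", 1700));
    // On a later tick a new target time wins again.
    assert!(zadd_ch(&mut set, "periodic-job-json", 1760));
    println!("ok");
}
```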

https://github.com/film42/sidekiq-rs/blob/c1822b742083abbb6bcde1d7f1f15e95c486f599/src/periodic.rs#L151-L164

I think a positive thing about using ZADD CH is that the periodic job JSON is never deleted from Redis, minimizing the chance that a bad connection or process deletes it and fails to recreate it. But that also means that if we needed to delete a single periodic job, we'd have to scan through each member to find the right one to remove. That's why it's usually easier to delete all periodic jobs at startup and re-add the ones you want: https://github.com/film42/sidekiq-rs/blob/c1822b742083abbb6bcde1d7f1f15e95c486f599/examples/demo.rs#L211-L220

Would there be any performance impact if the task has large parameters? Would it be more efficient to store an additional hash in Redis to keep the task identifiers and their parameters? I am not familiar with Sidekiq for ruby; are there any issues with this approach?

No performance impact other than the longer time to read the larger payload and encode/decode it. Sorted set operations are all very efficient. Both scheduled jobs and periodic jobs use sorted sets.
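For completeness, the hash-plus-identifier layout the question proposes could look like the sketch below. This is a hypothetical alternative layout, not how sidekiq-rs works (which stores the full JSON as the sorted-set member); `PeriodicStore` and its methods are invented names, with in-memory maps standing in for the Redis sorted set and hash:

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical split layout: the "sorted set" holds only small job ids
/// (keeping members tiny regardless of parameter size), while a separate
/// "hash" maps id -> large parameter payload.
struct PeriodicStore {
    schedule: BTreeMap<u64, String>,   // score (next run time) -> job id
    payloads: HashMap<String, String>, // job id -> parameter JSON
}

impl PeriodicStore {
    fn add(&mut self, next_run: u64, id: &str, payload: &str) {
        self.schedule.insert(next_run, id.to_string());
        self.payloads.insert(id.to_string(), payload.to_string());
    }

    /// Look up the payload for the soonest job; the large parameters are
    /// only read when the job is actually due.
    fn next_payload(&self) -> Option<&String> {
        let (_, id) = self.schedule.iter().next()?;
        self.payloads.get(id)
    }
}

fn main() {
    let mut store = PeriodicStore {
        schedule: BTreeMap::new(),
        payloads: HashMap::new(),
    };
    store.add(100, "job-1", r#"{"args":["large payload here"]}"#);
    assert!(store.next_payload().is_some());
    println!("ok");
}
```

The trade-off is an extra round trip (or a Lua script) per operation to keep the two keys consistent, which is why keeping the JSON in the member is simpler when payloads are small.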

mrxsisyphus commented 3 months ago

Thank you for your quick response.

I hard coded it to 30 seconds because the smallest cron job interval is 60 seconds

The cron_clock library supports second-level configuration: https://docs.rs/cron_clock/0.8.0/cron_clock/

// sec min hour day of month month day of week year
let expression = "0 30 9,12,15 1,15 May-Aug Mon,Wed,Fri 2018/2";

For instance, my test configuration of "0/2 *" is correctly parsed to execute every 2 seconds, so I think a fixed 30-second polling interval is not quite appropriate.

I think a positive thing about using ZADD CH is that the periodic job JSON is never deleted from Redis, minimizing the chance that a bad connection or process deletes it and fails to recreate it. But that also means that if we needed to delete a single periodic job, we'd have to scan through each member to find the right one to remove. That's why it's usually easier to delete all periodic jobs at startup and re-add the ones you want

Now I understand the advantage of using ZADD CH and the importance of using destroy_all. What I actually need is to modify the cron expression of a periodic task at runtime rather than at startup (for example, a backend management page may need to change a periodic task's schedule). How do I handle this? Previously, I thought about deleting first and then adding, because modifying the cron expression is equivalent to changing the member key; ZADD CH alone would effectively add a new task while the old one still exists.
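The delete-then-add update described above can be sketched as follows. This is an in-memory illustration with invented names, not sidekiq-rs API; against real Redis the ZREM and ZADD would be pipelined or wrapped in MULTI/EXEC so no process observes the gap:

```rust
use std::collections::HashMap;

/// Hypothetical runtime update: because the member embeds the cron
/// expression, changing the schedule means removing the old member
/// (ZREM) and inserting the new one (ZADD). Plain ZADD alone would
/// leave the stale member behind, as the comment above notes.
fn update_periodic(
    set: &mut HashMap<String, u64>,
    old_member: &str,
    new_member: &str,
    next_run: u64,
) {
    set.remove(old_member);                       // ZREM old definition
    set.insert(new_member.to_string(), next_run); // ZADD new definition
}

fn main() {
    let mut set = HashMap::new();
    set.insert("job@*/5 * * * *".to_string(), 300);

    // Change the job's schedule from every 5 minutes to every 2 minutes.
    update_periodic(&mut set, "job@*/5 * * * *", "job@*/2 * * * *", 120);

    assert!(!set.contains_key("job@*/5 * * * *")); // old schedule gone
    assert_eq!(set.get("job@*/2 * * * *"), Some(&120));
    println!("ok");
}
```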