Closed hatsu38 closed 4 months ago
This is expected, as mentioned in the README:
Idempotency of Task#process: it should be safe to run process multiple times for the same element of the collection. Read more in this Sidekiq best practice. It’s important if the Task errors and you run it again, because the same element that caused the Task to give an error may well be processed again. It especially matters in the situation described above, when the iteration duration exceeds the timeout: if the job is re-enqueued, multiple elements may be processed again. https://github.com/Shopify/maintenance_tasks/blob/v2.7.1/README.md#considerations-when-writing-tasks
But really you should investigate why your Sidekiq processes are forcibly terminated. Typically you want to send SIGTERM and have a grace period longer than the time to process a single element. That way when the element is processed, the job notices the worker is shutting down, and stops iterating, recording the cursor, and re-enqueues itself. When a worker picks up the job, the task is resumed exactly where it stopped.
Summary
I am running MaintenanceTasks with Sidekiq. But Sidekiq sometimes crashes unexpectedly.
This issue aims to ensure that tasks can resume from the cursor position at the time of termination, even when Sidekiq or other processes are abruptly terminated with SIGKILL.
Details
Here is an example of the data stored in the maintenance_tasks_runs table:
Current Behavior
The cursor is used to indicate the position from which a task should be resumed after being interrupted. However, the current implementation relies on the shutdown callback to update the cursor.
If Sidekiq is forcibly terminated or an exception occurs, this callback does not execute, and the cursor remains outdated. Consequently, the task resumes from an old cursor position instead of the latest position.
Requested Change
I thought I could solve this problem by executing
@run.cursor = cursor_position
within theeach_iteration
method.