Automattic / Cron-Control

A fresh take on running WordPress's cron system, allowing parallel processing
https://wordpress.org/plugins/wp-cron-control/
GNU General Public License v2.0
121 stars 21 forks source link

Task: Duplicate event prevention/cleanup #234

Open WPprodigy opened 2 years ago

WPprodigy commented 2 years ago

Problem: There are apparently a variety of situations where you can end up w/ duplicate events, sometimes hundreds to thousands of them. This can pretty much perpetually bog down a site until cleaned up manually.

1) Prevent duplicate recurring events

I actually think this is something WP core overlooked, and should probably revisit in core trac. Single-events are prevented from being registered if there is a similar action/args event already scheduled within +/-10mins. Docs actually say the same about recurring events, but it's not true.

Part of this is already implemented in an upcoming release, seen here: https://github.com/Automattic/Cron-Control/pull/231/commits/bbb53aeb9d83cd753f9c584b2f38c3ce91a3992c#diff-467459221a72437e5497c04a9ac91579c53ef140b9da6bb0b67c043dc1fb0490R41-R61. If an event has the same action/args/schedule, then we're just gonna deviate from core a little and prevent registration. If multiple of the same event on the same schedule is needed for some reason, then you'd just need to add some varying args when registering them.

Perhaps could take this a step further and either re-query with SRTM (send reads to master), or have a stricter unique constraint on the table (though could be tricky w/ varying event statuses).

2) Cleanup existing duplicate recurring events

There are quite a lot of sites in the wild now with duplicate recurring events, some to extreme degrees with hundreds of each. On a large MS, this means cron is going to struggle unnecessarily, and just have an overall heavy load that we can prevent.

So w/ the new contract in place where we prevent duplicate recurring events, we can then go through and cleanup any existing duplicates, as well as ones that may continue to slip through the cracks due to other race conditions. I'm thinking the daily a8c_cron_control_clean_legacy_data internal job would be great for this.

3) Handle DB unique constraint errors.

Mentioned in https://github.com/Automattic/Cron-Control/issues/147, we do need to handle the cases where events run into the unique constraint. Notably, a rare but possible situation is a single event sharing the same action as a recurring event, if the timestamps collide we need to make sure to handle it gracefully.

We'll need to document this "core deviance" in the readme.

WPprodigy commented 2 years ago

Cleaning up duplicate recurring events now on the daily cronjob, so that's good.

Next steps I'm thinking:

1) When we determine an event doesn't exist yet (performantly), do another query that locks the table while it does an additional check and then inserts it's event. This way if there are two truly parallel requests, only one will win and get the write: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html

2) The unique constraint needs to drop the status column, as I'm planning to move events to a running status and then back to pending if they are recurring (and we wouldnt want an event being added in between those). It's possible the unique constraint won't be needed anymore if we truly lock down the potential for duplicates (which would simplify moving events to other statuses like complete/failed/cancelled so we don't have to worry about the unique constraint), but if it is needed still then the index should be swapped to action/instance/timestamp in that order so more queries can take advantage of the index.