berkmancenter / amber_wordpress

Amber plugin for Wordpress
http://amberlink.org
GNU General Public License v3.0

Allow site admins to "speed up" dequeuing process to more than 1/5min #47

Open jerclarke opened 6 years ago

jerclarke commented 6 years ago

Right now Amber has a very sane and safe approach to slowly working down the queue of posts -- one at a time, once every five minutes -- which is a good default, as many sites might struggle with a heavier workload, and in the long run, most sites will be "stable" with the default slow rate of checking URLs.

The problem is that for larger sites -- like ours, which has ~100k posts with a dozen or more links each -- this may never be a stable rate of URL checking. If the volume of posts per day is high, and the volume of links per post is high, then in the long run, there is a perpetually growing queue with no warning to site administrators. This ever-growing queue gets in the way of the long-term rechecking of all URLs to determine if they have since gone offline.
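To put rough numbers on this, here is a minimal sketch of the arithmetic. The helper name is hypothetical, and the dozen-links-per-post figure is just the estimate mentioned above. It treats the five-minute schedule as a best case in which every cron run actually fires.

```php
<?php
// Hypothetical helper (not part of Amber): best-case URLs dequeued per day
// for a given cron interval and number of URLs handled per run.
function amber_example_urls_per_day( int $interval_seconds, int $urls_per_run ): int {
    $runs_per_day = intdiv( 86400, $interval_seconds ); // 86400 seconds in a day
    return $runs_per_day * $urls_per_run;
}

// Default behavior described above: one URL every five minutes.
$default_capacity = amber_example_urls_per_day( 300, 1 ); // 288 URLs per day

// Assuming roughly 12 links per post, the break-even posting rate.
$posts_per_day = intdiv( $default_capacity, 12 ); // 24 posts per day
```

Under these assumptions, any site publishing more than about two dozen link-heavy posts a day adds URLs faster than the default rate can clear them, before any rechecking of old URLs is counted.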

Admittedly, this isn't insurmountable with the current code, as the Amber Dashboard allows us to see the queue size and, if it's growing, click "Snapshot all new links" to clear out the queue, hopefully quickly, in one sitting.

That said, it would be much better if the plugin allowed site administrators to control the rate of dequeuing of URLs, since in many cases the rate can be increased significantly without any performance problems, and this can permanently remove the need for administrators to worry about queue length.

Current behavior: Hardcoded queue management configuration

The Amber WordPress plugin hooks into the WordPress cron system with an "every five minutes" schedule. On each run, the amber_cron_event_hook action calls the Amber::cron_event_hook() method, which executes Amber::dequeue_link() once.

Both of these factors, the 5 minute cron schedule and the single dequeue_link() call per run, are currently hardcoded in the plugin, making it impossible to directly alter them, even with the expertise and time for plugin development.

This rigidity is unnecessary, and IMHO both of these values can easily be made filterable in ways that would add only a few lines of PHP to your plugin, and subsequently would allow users to alter the plugin behavior with only a few lines of PHP on their end.

Ideal behavior: Filter for cron schedule and filter for number of URLs handled per cron run

So my proposal is that you add a single location in the Amber plugin where the cron schedule is determined, and in that location you use WP's apply_filters() function to allow the value to be modified by other plugins (which hook in with add_filter()).
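A minimal sketch of what that single location could look like on the plugin side. The function name, the filter name 'amber_cron_schedule' and the schedule key 'amber_five_minutes' are all illustrative assumptions, not taken from the Amber source. In WordPress, the plugin exposes a value with apply_filters() and site code changes it with add_filter().

```php
<?php
// Stand-in so this sketch also runs outside WordPress; inside WordPress
// the real apply_filters() is already defined and this block is skipped.
if ( ! function_exists( 'apply_filters' ) ) {
    function apply_filters( $hook_name, $value, ...$args ) {
        return $value;
    }
}

// Hypothetical: the one place where the plugin decides its cron recurrence.
// Both the filter name and the default schedule key are assumptions.
function amber_example_get_cron_schedule(): string {
    return (string) apply_filters( 'amber_cron_schedule', 'amber_five_minutes' );
}

// The plugin would then pass the result to wp_schedule_event(), e.g.:
// wp_schedule_event( time(), amber_example_get_cron_schedule(), 'amber_cron_event_hook' );
```

Whatever key the filter returns has to be a schedule registered through the core 'cron_schedules' filter, otherwise WordPress will not accept it as a recurrence.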

Similarly, the Amber::cron_event_hook() method should be modified to have a "number of URLs to dequeue" variable, which is passed through apply_filters(), and which is used to run Amber::dequeue_link() that many times on each run.
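Something along these lines, as a rough sketch only. The filter name 'amber_dequeue_links_per_run' is made up for illustration, and the $dequeue callable stands in for the real Amber::dequeue_link() so the example does not depend on the plugin's internals.

```php
<?php
// Stand-in so this sketch also runs outside WordPress; inside WordPress
// the real apply_filters() is already defined and this block is skipped.
if ( ! function_exists( 'apply_filters' ) ) {
    function apply_filters( $hook_name, $value, ...$args ) {
        return $value;
    }
}

// Hypothetical version of the cron callback. $dequeue stands in for
// Amber::dequeue_link(); the filter name is an assumption.
function amber_example_cron_event_hook( callable $dequeue ): int {
    // Default of 1 keeps the current behavior unless a site filters it.
    $count = (int) apply_filters( 'amber_dequeue_links_per_run', 1 );
    $count = max( 1, $count ); // ignore zero or negative values from a filter

    for ( $i = 0; $i < $count; $i++ ) {
        $dequeue();
    }
    return $count; // number of dequeue calls made in this run
}
```

Keeping the default at 1 means sites that never touch the filter see no change in load.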

Finally, the documentation should be updated to point out that these filters exist for advanced users, and simple code examples should be given of their usage.
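For instance, the documentation could include something like the following for site owners. This is only a sketch: 'cron_schedules' is a core WordPress filter, but 'amber_cron_schedule' and 'amber_dequeue_links_per_run' are hypothetical filter names that Amber would have to provide, and the mysite_ prefix is just a placeholder.

```php
<?php
// Stand-in so this sketch also runs outside WordPress; inside WordPress
// the real add_filter() is already defined and this block is skipped.
if ( ! function_exists( 'add_filter' ) ) {
    function add_filter( $hook_name, $callback, $priority = 10, $accepted_args = 1 ) {
        return true;
    }
}

// Register a one-minute schedule with WordPress cron.
function mysite_add_one_minute_schedule( $schedules ) {
    $schedules['mysite_one_minute'] = array(
        'interval' => 60, // seconds
        'display'  => 'Every minute',
    );
    return $schedules;
}
add_filter( 'cron_schedules', 'mysite_add_one_minute_schedule' );

// Hypothetical Amber filter: run the dequeue cron every minute.
function mysite_amber_cron_schedule( $schedule ) {
    return 'mysite_one_minute';
}
add_filter( 'amber_cron_schedule', 'mysite_amber_cron_schedule' );

// Hypothetical Amber filter: dequeue five URLs per cron run.
function mysite_amber_links_per_run( $count ) {
    return 5;
}
add_filter( 'amber_dequeue_links_per_run', 'mysite_amber_links_per_run' );
```

One caveat worth documenting: WordPress stores the recurrence with the scheduled event, so a changed schedule would probably only take effect once the plugin reschedules its event.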

This would allow major sites like ours to solve the problem for ourselves, without requiring any additional UI that might confuse users, bloat the interface or create additional dev burden.

Performance considerations

For the sake of completeness, I'll briefly outline the reasons someone would use one or the other of these two means of increasing the number of URLs processed.

Increase cron frequency: By filtering the cron schedule from 5 minutes (300s) to 60s

Increase URLs processed per cron run: By filtering the number of times Amber::dequeue_link() gets run, e.g. setting it to run 5 times per run.
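On paper, the two example settings above give the same throughput. A quick sketch of the comparison, using a hypothetical helper and assuming every scheduled cron run actually fires:

```php
<?php
// Hypothetical helper (not part of Amber): best-case URLs processed per day.
function amber_example_daily_throughput( int $interval_seconds, int $urls_per_run ): int {
    return intdiv( 86400, $interval_seconds ) * $urls_per_run;
}

$baseline     = amber_example_daily_throughput( 300, 1 ); // 288: current hardcoded behavior
$faster_cron  = amber_example_daily_throughput( 60, 1 );  // 1440: cron every 60s, one URL per run
$bigger_batch = amber_example_daily_throughput( 300, 5 ); // 1440: cron every 5 min, five URLs per run
```

These are theoretical ceilings. The real difference between the two is in how the work is spread out: a faster schedule does small amounts of work often, while a bigger batch does more work in each run, and actual numbers depend on how reliably WordPress cron fires on a given site.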

So for most sites, altering the cron schedule to run Amber::cron_event_hook() more often will be the best solution. It will work well for large sites with a large corpus of links, which will almost always also get regular traffic -- if only from search engines updating their caches of said corpus -- to match their large URL queue in Amber.

Nice to have: wp-admin setting for cron schedule

If you were going to add a UI setting inside Amber to control the rate of URL dequeuing, I would definitely make it control the cron schedule, and leave the number of URLs dequeued per run to a filter as described above.

A pulldown menu with [Every 10 minutes|Every 5 minutes|Every 1 minute] in Amber Settings > Storage Settings would be extremely useful for us and probably many other sites.

Like I said though, a filter and a little documentation would work just as well for power users, who are most likely to need this option.

Thanks for your attention and for considering this request.