BoldGrid / w3-total-cache

GNU General Public License v2.0
152 stars 85 forks source link

BUG: Object Cache is not invalidating cache for options #605

Closed jpSimkins closed 1 year ago

jpSimkins commented 1 year ago

I originally posted this in the WP support forums but feel this is most likely better to be here.

The issue:

You can jump straight to the comment I posted that shows how to duplicate the issue: https://github.com/W3EDGE/w3-total-cache/issues/605#issuecomment-1357749152

When using object-cache.php, the cron jobs are being cached and not running the proper jobs when they are due. This is also updating the database with the incorrect cron jobs once the crons are processed. Basically, it is saving to the DB the old data for the cron instead of updating it with the new runtimes as is normal behavior. This is causing single run event to constantly be ran (if they are stuck in the cached cron jobs).

I have over 40 cron jobs. Once this caching issue starts happening, I am running the same cron jobs every time the cron runs. This time, it was only 5 cron jobs that kept running. Previously, it was 20. This is random on which cron jobs will be stuck in this caching issue.

Once this happens, the only way to resolve this issue is to delete the object-cache.php file and then run cron again. This will run the same cached jobs the first time but it will properly update the cron options value so all cron jobs after work as expected.

Every time this happens, by removing object-cache.php, this will fix itself. Once I put that file back in place, then the cron will start to break again. The annoying part is that it is not immediate. This time, it was a week before the crons got mangled again. Sometimes it's 5 minutes. This happens on production and on local dev environments.

I am trying to figure out the root issue but wanted to post this here in-case others have this issue.

Please note, I am using WP CLI to run these cron jobs.

I am running wp cron event run --due-now at 5 minute intervals. With these constants:

define('DISABLE_WP_CRON', true);
define('WP_CRON_LOCK_TIMEOUT', 60);

Even though I am using WP CLI to run cron, this issue also persist when I use the URL endpoint to trigger cron manually. I am not sure if WP CLI is playing a factor here and this is why it has been so difficult to debug for me.

There is a direct correlation between the same crons running, the bad cron options values, and the object-cache.php file. Simply removing that file stops this from happening. I don’t need to change any settings, just removing that file fixes the issue. I am unaware of the consequences of keeping that file removed. I assume W3TC will not cache properly with that file removed. So I am still attempting to figure this out. Whether this is the issue or just a side-effect is unknown.

I will also add the other relevant content from the WP Support form below in-case this adds any more insight as not everything was added here. I just added what I believe is the most notable.

My original report:

W3TC is configured to only use Object Cache and Browser Cache.

We have object cache configured to use memcached with a default lifetime of 180 and GC interval of 3600. We do not cache WP Admin pages.

We have advanced-cache.php and object-cache.php in the wp-content directory.

When using W3TC Object cache, this will eventually cause most cron jobs to stop running while constantly running the same cron 5 jobs on every cron event. I can duplicate this locally.

What can we do to prevent W3TC from caching the cron options value? This seems to be the issue and I am still searching into the issue since I can duplicate this locally.

From what I am seeing, this only runs the same cron jobs every time the cron runs. We have over 20 cron jobs that are a week overdue on production. These crons run at different time intervals. From every 5 mins to once a month. I am noticing the even my single event crons are not being ran either.

Cron is invoked using wp-cli to process the cron using: wp cron event run --due-now

We even tried running cron from the URL (not ideal at all nor can we use this in production) and the exact same issue persist.

I have tried disabling object cache, purging the cache and running the cron manually (from the URL) but this doesn’t work either. I then tried to manually purge the cache directory (pointless really) but then it still doesn’t work. My only solution is to actually disable the plugin for cron to run again.

What are my options here? I am considering removing W3TC but that is a last resort and I would prefer to get this issue resolved.

Thanks


I can confirm this happens locally but it seems to take a week for this to start to happen. That is what’s so puzzling about this. I would expect this to happen immediately. I am still trying to dig into this issue but got pulled off on another task real quick. I will update this post when I find anything.

I am also trying to setup a vanilla setup for this since we do have a lot of plugins installed (mostly custom plugins for our system that won’t affect how W3TC works but we do have WPML and WooCommerce installed too). What is weird, is that I can resolve the issue by making sure the object-cache.php file is removed.

Our production system won’t remove files (other than cache) so when we were disabling the object cache, the file still existed. Once I removed the object-cache.php, then cron works as expected.

Thank you for looking into this issue. I can tell that time is a factor for this to break cron. I will update when I find out more information.


Still not able to isolate this issue. Once I remove the object-cache.php file then add it back, I need to wait some time until this happens. I will keep trying to figure this out.

I am running wp cron event run --due-now at 5 minute intervals. With these constants:

define('DISABLE_WP_CRON', true);
define('WP_CRON_LOCK_TIMEOUT', 60);

The hard part is debugging with wp cli.

As for how these crons are invoked, on production, we are using AWS ECS Scheduled tasks that runs the wp cron command on 5 minute intervals. On local development, we manually trigger the wp cron command. This happens on production and local.

On production, we need to use ECS Scheduled tasks as using the URL to trigger cron will cause issues due to autoscaling (it will kill the task running the cron and we had this issue previously).

The cron jobs affected are all cron jobs and it’s rarely the same that keep running when this happens. This is why I am assuming the cron option is being cached. After some time, it’s only the same cron jobs that keep running while all other are overdue. I then remove the object-cache.php file and then disable the object cache and this seems to recover as the cron jobs will run immediately after I do this.

I have WP Crontrol installed and when I check the list, I see almost all cron jobs are a week overdue. This time this happened, the cron jobs that kept running were:

wpseo_ping_search_engines
wc_admin_unsnooze_admin_notes
wp_privacy_delete_old_export_files
olyott-plugin-importexport-cron-queue-process
action_scheduler_run_queue

The previous time this has happened, this was completely different crons that kept running.

On local, I have tried to use the URL to invoke cron and the same issue is happening. Since I removed the object cache file, it takes some time for this to happen again. Once this happens again, I can debug using the URL for the cron process. I am waiting for this to happen again to get more information. Until then, it’s impossible to duplicate the issue.

One more thing that I am noticing now that I am paying attention to the cron system. When this issue happens, it seems that when the crons do run and object cache is enabled, I am seeing single run crons still showing after they ran. Meaning, that crons that ran do not appear to be shown as being updated for the next run time. When checking the database, I see them in the cron options table too. That is interesting. When I disable the object-cache.php file, when they run again, they do get updated properly and the single run crons are removed properly.

Let me know if you need more information.

Thanks again


To add more details about what is happening:

When I run the cron, the same crons are running over and over again on each cron interval. When I remove the object-cache.php file, then run the crons, they work as expected and update the cron options value appropriately (all cron timestamps are updated and single run crons are properly removed).

If I add the object-cache.php file back, then run cron again, it will run the same cron jobs as before I removed the object-cache.php file and then it will update the cron option with the cached values it already ran with. This is adding back into the cron setting, single run crons that have already ran as well as adding back in bad next runtimes. This is also causing all crons to show as overdue in WP Crontrol page.

It would seem (cannot check this in code yet), that the object-cache.php file, for whatever reason, is only using the cached value of the cron options value and not fetching it appropriately. Given my default cache lifetime is 180 I would expect it to expire after that time. I do have the Store transients in database unchecked. I wanted to make note of this for now.

There is a direct correlation between the same crons running, the bad cron options values, and the object-cache.php file. Simply removing that file stops this from happening. I don’t need to change any settings, just removing that file fixes the issue. I am unaware of the consequences of keeping that file removed. I assume W3TC will not cache properly with that file removed. So I am still attempting to figure this out.

I tried to add config to the Non-persistent groups since this is supposed to ignore caching for this group, but that has no affect.

I will update more as I discover more about this issue.

jpSimkins commented 1 year ago

Further investigation shows that this is not an issue with wp-cli. I can see that it is updating the cron value in the options table properly. When I refresh any page (frontend or backend) of the site this is now reverting the cron settings.

Still not sure why this is only a factor when the object-cache.php file is in place but it is something I am seeing. Adding this here for now.

jpSimkins commented 1 year ago

Ok, so I found the issue.

The code below is the cause of the problem. Although this is what is causing the cron options to be reset, this still seems like an issue... this isn't expected behavior. So I still think this is a bug.

All that is doing is removing a scheduled event from happening. For some reason this is causing the cron options to be reverted back to the original settings.

I am going to play with this in a vanilla sandbox to see what I can figure out about this. I suspect that it is using a cached version of the cron and when the schedule is removed, then it is saving back the cached cron value. This should not be possible. I would assume that when the cron options are updated, that the older caches would not be able to be loaded again. This still points me to believe this is an issue with W3TC caching.

I will continue to look into this.

class YoastSEO {

    public function __construct() {
        add_action('init', [$this, 'removeCron'], 11);
    }

    /**
     * This is to stop Yoast from resetting the permalinks breaking the english rewrite rules.
     * @return void
     */
    public function removeCron() {
        if (!\wp_next_scheduled('wpseo_permalink_structure_check')) {
            return;
        }

        \wp_clear_scheduled_hook('wpseo_permalink_structure_check');
    }

}

UPDATE:

So looking into this a bit deeper. wp_clear_scheduled_hook will eventually call get_option( 'cron' ) and when it does, the value returned is the old cache, not the new cache. This is then calling update_option via _set_cron_array which is saving the bad data back into the database. This has been causing all my headaches.

I don't think it really has to do with the code above exactly. If you just use:

$cron = get_option( 'cron' );
update_option( 'cron', $cron );

This should cause it. I'll look into this more on Monday.

Given this seems to be an issue with the cron option not being expired/incremented properly, I still believe this is a bug with W3TC. Unless you have other suggestions.

Please advise. Thanks!

jpSimkins commented 1 year ago

I setup a vanilla WP setup with just W3TC and my test plugin that is:

class TestingCronBug {
    public function __construct() {
        add_action('init', [$this, 'process'], 11);
    }

    /**
     * Proving bad cache is being saved back
     * @return void
     */
    public function process() {
        $cron = get_option('cron'); // this is the value of the expired cache
        update_option('cron', $cron);
    }
}

How to duplicate:

  1. Setup vanilla WP
  2. Install/Activate W3TC
  3. Setup Object-cache using memecached
  4. Navigate the site a few pages to ensure the cache is set
  5. Activate test plugin
  6. Check the DB: SELECT * FROM wp_options WHERE option_name = 'cron' (see initial cron runtimes)
  7. Run cron (I am using wp cli)
  8. Check the DB: SELECT * FROM wp_options WHERE option_name = 'cron' (see updated cron runtimes)
  9. Reload the wp-admin dashboard
  10. Check the DB: SELECT * FROM wp_options WHERE option_name = 'cron' (see the data was rolled back to the old value)

The issue is that get_option( 'cron' ); is loading the old cache result, not the new values. It would seem that when the cron values are being updated, there is no expiring or incrementing of the cache group. Therefore the old value is being returned from cache.

I think this is enough to prove the bug as I can reliably duplicate this.

Proposed Solution:

UPDATED

Remove $this->_do_flush() check as this is the issue.

        if ( $this->_do_flush() ) {
            add_action( 'updated_option', array( $this, 'on_change_option' ), 0, 1 );
            add_action( 'added_option', array( $this, 'on_change_option' ), 0, 1 );
            add_action( 'delete_option', array( $this, 'on_change_option' ), 0, 1 );
        }

This is the only code in W3TC that purges the cache for options and it is wrapped in a $this->_do_flush() check. This check is not necessary for this as we want the cache to be deleted when the option is changed. With this check being used, there is no way to purge the option cache. This check seems useless as it won't purge the options on the flush event unless the option is being updated, which it isn't for most options. You actually need different logic to purge the cache when doing a flush as this won't work as you expect. I would assume a flush cache should be used when I am flushing the cache. Again, this check for the options is pointless and should be removed.

I was able to fix this issue locally using:

class TestingCronBug {
    public function __construct() {
        add_action('init', [$this, 'process'], 11);
        add_action('update_option_cron', [$this, 'fixCronOptionCacheIssue'], 11);
    }

    /**
     * Proving bad cache is being saved back
     * @return void
     */
    public function process() {
        $cron = get_option('cron'); // this is the value of the expired cache
        update_option('cron', $cron);
    }

    public function fixCronOptionCacheIssue() {
        wp_cache_delete('cron'); // hack and not a proper fix, just showing this solves it
    }
}

Concerns:

I am concerned if this is affecting other options in WP (confirmed it is). This is an issue I am seeing with the cron option but this could possibly be doing this to all options. This is why I feel this needs to be looked into by W3TC

jpSimkins commented 1 year ago

The only code I see in W3TC to flush the options cache is:

        if ( $this->_do_flush() ) {
            add_action( 'updated_option', array( $this, 'on_change_option' ), 0, 1 );
            add_action( 'added_option', array( $this, 'on_change_option' ), 0, 1 );
            add_action( 'delete_option', array( $this, 'on_change_option' ), 0, 1 );
        }

Which is wrapped in a _do_flush method. This shows me that when you are adding new, updating, or deleting options, there are no cache changes unless you explicitly perform a flush cache.

I would expect it to be any update to the options should delete the cache for that option. Exactly the same as it does with Posts or Taxonomies. When the data is changed, it should flush the cache for the item respectively.

For my case, this happened to be the cron option but this is happening to all options. This is a big issue with the cache object imo.

Could this be a bug in WordPress? I would actually assume WP to perform these cache deletions.

UPDATE:

Not an issue with WP, this is a logic issue with W3TC.

This seems to be the actual issue. Since this is wrapped in $this->_do_flush() check, this will never be reached. Meaning, the options cache is never purged unless you are performing a flush of the cache. This should not be wrapped in the $this->_do_flush(). This should always be done when the option is updated. The only fix you need is to remove the $this->_do_flush() check and let this properly purge the cache when the option is updated.

jpSimkins commented 1 year ago

For now, to be able to use W3TC, I wrote this to get this to work as expected:

/**
 * Cache Integration for options
 * See bug report: https://github.com/W3EDGE/w3-total-cache/issues/605
 * This is to ensure that option cache is deleted when expected
 */
class Options {

    public function __construct() {
        add_action('updated_option', array($this, 'delete_cache'), 0, 1);
        add_action('added_option', array($this, 'delete_cache'), 0, 1);
        add_action('delete_option', array($this, 'delete_cache'), 0, 1);
    }

    public function delete_cache($option) {
        wp_cache_delete($option);
    }

}
jacobd91 commented 1 year ago

Thank you for taking the time to detail this issue and even propose a solution. We greatly appreciate customer feedback and contributions. As such we are currently working towards our January patch release but intend to investigate this for the February release with priority. Thanks!

jamieburchell commented 1 year ago

@cssjoe Does this mean we should disable the object cache if we are using cron until this fix is released? I haven't noticed issues with cron jobs not running.

cssjoe commented 1 year ago

@jamieburchell This issue refers to cron job settings in the database not being used as the old settings stored in the object cache are being used instead. If you see problems with WP cron events, you can try disabling the object cache and wait for the next patch release. We are still performing more testing before the patch release.

NOOBBase commented 8 months ago

Hi all,

Seeing that this topic is ongoing for quite a while now. Any news on it. We have the same issue on all our websites and are very eager to have this resolved.

cssjoe commented 8 months ago

@NOOBBase We will be publishing the next release early next week. Please review the fixes cited in this discussion and let us know if you have any concerns. Thank you.

NOOBBase commented 8 months ago

@cssjoe, no, no concerns. Looks good to me. :)

NOOBBase commented 8 months ago

@cssjoe I have one concern after some testing.

This issue happens even when I have object and database caching disabled. It seem to happen when we activate the plugin and caching option by default.

Can you confirm, confirm this fix is an overal fix for the cron issue?

NOOBBase commented 8 months ago

@cssjoe The only thing that still concerns me using your plugin now is that the issue was raised more then a year ago. How can a bug live this long? In all my companies this was never really acceptable. Especially when the bug is crucial like this one. Can you explain a bit more why it can happen that it takes this long to fic this?

Because many people pay for this plugin.