Automattic / wp-memcached

Memcached Object Cache for WordPress.
https://wordpress.org/plugins/memcached/
GNU General Public License v2.0

Widgets - data loss #78

Closed shanemarsh28 closed 2 years ago

shanemarsh28 commented 2 years ago

Have you had any reports of data loss concerning widgets? We have had two sites in quick succession report issues where sections of widgets under Appearance->Widgets have been apparently deleted. I can confirm this was not user error. We have deactivated Memcache as a precaution.

We are on the latest version, 4.0.0, and WordPress 5.8.1. We also run HyperDB v1.7.

tomjn commented 2 years ago

@shanemarsh28 unless it ran into the upper limit of the cache buckets, this is still an object cache, so widgets would still be stored in options. There's no mechanism for data loss here that I can see that isn't ephemeral.

If you did run into the upper size limit of the cache buckets, e.g. you had megabytes of widget data and no mitigations, I can foresee that it could not be stored in memcached. My expectation would then be degraded performance, or at a stretch out-of-date data, not data loss, and it would still be stored using the classic method. Object caches supplement, they don't replace, with the exception of transients.
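The expectation above (an oversized value fails to cache and reads fall back to the database) can be sketched in a few lines of Python. This is a toy model, not the plugin's code: `SizeCappedCache`, `OptionStore`, and the 1 MiB cap are hypothetical names and values chosen for illustration, and JSON stands in for PHP serialization.

```python
import json

# Assumed per-item cap, mirroring memcached's 1 MB default item size.
ITEM_LIMIT = 1024 * 1024


class SizeCappedCache:
    """Toy stand-in for a cache that refuses items over a size cap."""

    def __init__(self, item_limit=ITEM_LIMIT):
        self.item_limit = item_limit
        self._items = {}

    def set(self, key, value):
        payload = json.dumps(value).encode("utf-8")
        if len(payload) > self.item_limit:
            # Design choice of this toy: drop the old copy on a failed set.
            # A setup that kept it would serve out-of-date data instead.
            self._items.pop(key, None)
            return False
        self._items[key] = payload
        return True

    def get(self, key):
        payload = self._items.get(key)
        return None if payload is None else json.loads(payload)


class OptionStore:
    """Read-through store: the database dict is always the source of truth."""

    def __init__(self, cache):
        self.cache = cache
        self.database = {}
        self.db_reads = 0

    def update(self, name, value):
        self.database[name] = value      # persisted first, regardless of cache
        self.cache.set(name, value)      # best effort; may be refused

    def get(self, name):
        cached = self.cache.get(name)
        if cached is not None:
            return cached
        self.db_reads += 1               # cache miss: slower, but not lossy
        value = self.database.get(name)
        if value is not None:
            self.cache.set(name, value)
        return value


if __name__ == "__main__":
    store = OptionStore(SizeCappedCache())
    oversized = {"sidebar-1": ["x" * (2 * ITEM_LIMIT)]}
    store.update("sidebars_widgets", oversized)
    print("cached:", store.cache.get("sidebars_widgets") is not None)
    print("intact:", store.get("sidebars_widgets") == oversized)
    print("database reads:", store.db_reads)
```

In this model an oversized value costs a database read on every request, but the stored data is never altered, which matches the "degraded performance, not data loss" expectation.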

shanemarsh28 commented 2 years ago

Hi Tomjn,

Thanks for replying. It is very strange because in both cases the data for a selection of widgets was reduced to an empty array. It was almost as if there was a serialization error on an update command, but I've found no errors: nothing in the access logs, nothing in the PHP logs and no debug trail. The MySQL table did not crash. I will post the same question in HyperDB because beyond that everything else is stock WordPress. I can't see how it can be anything other than code that directly interferes with the cache or the saving process.

To note, we have seen serialisation errors in the memcache process that have required a manual deletion of keys to resolve, cases where network-activated plugins don't deactivate correctly, and various other issues. I didn't deem them to be the same as this because, as you suggest, it was only the cache that was out of date; the database storage was correct.

tomjn commented 2 years ago

I'd note that memcached doesn't serialise classes, PHP does that. You'll need to be more specific, but serialising PHP classes into strings that then produce mangled, broken results when deserialised is more likely an issue with the classes and data being serialised (unless truncation is occurring due to the aforementioned max size issue). If it is indeed a size issue, then that's not something you need to guess about: it can be verified and measured with hard numbers.
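As one way to get those hard numbers, here is a minimal Python sketch that estimates how large a set of options would be once PHP-serialized as a single array, and lists the biggest entries. It assumes the options have already been exported as a name-to-value dict of strings; `serialized_alloptions_size`, `largest_options`, and the 1 MiB limit are hypothetical names and values, not part of the plugin or of WordPress.

```python
# Assumed per-item cap, mirroring memcached's 1 MB default item size.
ITEM_LIMIT = 1024 * 1024


def php_serialize_string(text):
    """Return the PHP serialize() form of a string: s:<byte length>:"<bytes>";"""
    raw = text.encode("utf-8")
    return b's:%d:"%s";' % (len(raw), raw)


def serialized_alloptions_size(options):
    """Byte size of a str -> str mapping serialized as one PHP array.

    Simplification: PHP stores integer-like keys as i:<n>; rather than as
    strings, so names such as "123" would be a few bytes smaller in reality.
    """
    body = b"".join(
        php_serialize_string(name) + php_serialize_string(value)
        for name, value in options.items()
    )
    return len(b"a:%d:{%s}" % (len(options), body))


def largest_options(options, count=10):
    """Return the `count` largest options as (name, value size in bytes)."""
    sizes = [(name, len(value.encode("utf-8"))) for name, value in options.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    # Made-up sample data standing in for an export of autoloaded options.
    sample = {
        "blogname": "Example",
        "sidebars_widgets": "w" * 700_000,
        "widget_text": "t" * 500_000,
    }
    total = serialized_alloptions_size(sample)
    print("serialized size:", total, "bytes")
    print("over limit:", total > ITEM_LIMIT)
    for name, size in largest_options(sample, count=2):
        print(name, size)
```

The byte count is only an estimate of what a cache client would send, since a client may compress or wrap the value, but it is enough to tell whether a site is anywhere near the limit and which options are responsible.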

tomjn commented 2 years ago

Of note, VIP has docs on this, as clients can run into it there: https://docs.wpvip.com/technical-references/code-quality-and-best-practices/working-with-wp_options/#h-identify-and-resolve-problems-with-alloptions

In particular, see https://docs.wpvip.com/technical-references/code-quality-and-best-practices/working-with-wp_options/#h-widgets-are-stored-in-wp_options. Widgets don't scale as a content storage medium unless you use a post type to back the content itself. The VIP commands that generate that output are all open source.

shanemarsh28 commented 2 years ago

Tomjn this is brilliant, I had been trying to find documentation about AllOptions.

After reading the VIP docs, I believe we have had at least one instance where AllOptions has exceeded 1MB. The main site concerned is very large. I will run some tests.

I am intrigued where it says: "Letting your site’s option reach that size will have negative performance implications and can lead to the site being unavailable until the problem is fixed." On one occasion the site in question refused to load and the confusing presentation of the issue was the options table appeared to go corrupt, specifically "html_type". Every page on the front end went into a document download and when we tried to get into WP Admin we just had raw HTML presented in the browser.

Question: Is there a mechanism for the following case:

  1. AllOptions is saved to Memcached but curtailed due to exceeding the 1MB limit
  2. WordPress retrieves the "broken" key data (at a later point in time)
  3. WordPress determines the options table is damaged and attempts to re-save the keys as empty arrays

This scenario would lead to the data loss, and in all cases we needed to recover the options table from a backup. Is it possible?

Shane

PS: Thank you for your help. This is interesting, and it is very important that we understand what's happening.

shanemarsh28 commented 2 years ago

Tomjn, Another question. What would happen if the Memcached key limit was raised from 1MB to 10MB (-I 10m)? I just noticed we did this.

We have quite a few sites where AllOptions will easily exceed 1MB. For the site in question (the one that crashed), AllOptions is about 4.9MB.

tomjn commented 2 years ago

> AllOptions is saved to Memcached but curtailed due to exceeding the 1MB limit

¯\_(ツ)_/¯

> WordPress retrieves the "broken" key data (at a later point in time)

I'm not sure it would be broken. My expectation is that it would not be in memcached, and you would see this in the error logs. "Broken" is a very loaded and ambiguous word; this is a great way to confuse everybody.

> WordPress determines the options table is damaged and attempts to re-save the keys as empty arrays

I don't believe WP determines if a table is damaged? And if it did where would it get the value from? And what would this have to do with memcached? This appears highly speculative.

> Tomjn, Another question. What would happen if the Memcached key limit was raised from 1MB to 10MB (-I 10m)? I just noticed we did this.

Try setting it to 100kb locally and test

> We have quite a few sites where AllOptions will easily exceed 1MB. For the site in question (the one that crashed), AllOptions is about 4.9MB.

That's rather high even ignoring memcached, reduce it.

shanemarsh28 commented 2 years ago

We've just had another site where the options table has crashed and memcached (object-cache.php) is not in use. This issue can't be related so I will close the issue.