WordPress / performance

Performance plugin from the WordPress Performance Group, which is a collection of standalone performance modules.
https://wordpress.org/plugins/performance-lab/
GNU General Public License v2.0
340 stars 91 forks source link

Module Proposal: Optimized Autoloaded Options #526

Open felixarntz opened 1 year ago

felixarntz commented 1 year ago

Overview

This module proposal was created during WordCamp US 2022 as a collaboration between @merkys7 @markjaquith @felixarntz @sabrina-zeidan @rytisder.

About the module

WordPress has basically forever had a mechanism to load many (likely most) of its options in one single database request, by relying on a single flag autoload which could be set to yes or no. This is being done to improve performance, since having individual database queries for every option used on a page would be a lot slower.

However, the decision whether to autoload an option is currently left to plugins and themes (when they call add_option() or update_option()), without making it a requirement: There is a default value of yes, and many (likely most) plugin and theme developers are not aware of what that value does or that it even exists.

This often results in an extremely bloated list of options to autoload, with many of these options unused, especially over time as plugins are tried out and then deleted, replaced with other plugins etc. As of today there is no mechanism in core to clean up any of this.

Purpose

The primary goal of the proposed module is to keep the list of options to autoload at a sizable level, to make the relevant database request faster and to avoid the scenario where the list of those options and values is too large to fit into an object cache solution.

Developers of plugins and themes should no longer be responsible for this as this is a very low level problem that WordPress core can solve by itself better than offloading it to each individual extension. Additionally, if this was based on an API, it would be unlikely or at least take years (if not decades) that plugins and themes would use such an API correctly at scale.

Scope

The module would do the following to achieve the above goal:

By using the autoload column in the database, the implementation would not require any additional storage than WordPress core is already using today, the only exception being a single new option to store the timestamp of when the “autoload cleanup routine” was last started (which is needed to control when it should restart again from scratch).

In addition, since the module would alter the autoload column values in the database, it should come with a backup mechanism that could restore the original autoload column values later if needed.

Something that this proposal does not explicitly include, but could also be thought about further after establishing a first version of the module, would be to also disallow options that are very frequently written to from being autoloaded.

This module would not do the following things:

If this module is approved and can eventually land in core, it would likely make #347 and the current autoloaded options Site Health check unnecessary, fixing the problem in a central place.

Rationale

A bloated list of autoloaded options becomes a severe performance problem for many WordPress sites over time, unless their databases are regularly inspected either manually or using e.g. a cleanup plugin, thus slowing down the alloptions query or even breaking the caching of it entirely, making that slow query impacting performance even more negatively.

Other

We did an additional brief test, to see how many autoloaded options were actually used with vanilla WordPress core.

Those are of course very basic situations, but it shows that already for even core with a default theme the module already improves the situation. Likely the performance win is minimal for such a small list, but real world WordPress sites have a much larger list of autoloaded options.

Proof of concept

@markjaquith wrote a proof of concept plugin (very rough, emits UI in the theme footer) just to see if this was feasible:

https://gist.github.com/markjaquith/f21566334133d2843598c064e76f1142

felixarntz commented 1 year ago

@bethanylang I've added this module proposal under the Object Cache focus for now, since it's the most applicable one, however with some of the efforts there are it might make sense to have another focus around e.g. Database, where this would fit into better. We don't have any process for this, but maybe we can rethink how we deal with focus areas, also given that 2 of them currently have no lead.

felixarntz commented 1 year ago

On a broader note, I think one thing that would be crucial to do around this module proposal is some analysis to see how much impact the autoloaded options database query in fact has on server-side performance. While we're all pretty sure this is a significant problem from various levels of experience, we should validate that.

markjaquith commented 1 year ago

@felixarntz yeah, we can check the impact (maybe as % of front page load time taken by this query) at various byte sizes. 250k, 500k, 1m, 2m, say.

Would be interesting to simulate db connection latency somehow. Maybe there is some way to do an ssh tunnel that is bandwidth constrained.

Absolute numbers will vary a lot, I expect.

pbearne commented 1 year ago

We have 2 proposals for Options both are doing a lot the same and could run together https://github.com/WordPress/performance/issues/347 I am going to try to summarise the differences

Both will need https://core.trac.wordpress.org/ticket/56045 and https://core.trac.wordpress.org/ticket/37930 in core.

526 works at the site level, so it will deliver not need options and miss large options 347 works at the page level, so it has all the options and just the options needed for a given page, including large options

526 adds one options value/row 347 adds an option value/row per page but tries to use a persistent object cache if available

526 still uses the string 'yes' as the row value, so SQL is slow (we might be able to change this to 0/1 to improve the SQL speed) 347 stores the IDs of the rows needed, so the SQL is faster

526 can run a normal plugin 347 needs to hook in via object-cache-php as we bypass the get_alloptions() function

It might be that 526 will go into the core, and 347 will become part of caching plugins But we should look at both together and work out which is the best compromise as I can see 526 winning on small sites and 347 winning on large messy sites (or sites with a persistent object cache)

bethanylang commented 1 year ago

Thanks @pbearne and @felixarntz! I think it's worth adding an initial discussion to next week's chat agenda to discuss both this ticket and #347 in tandem and determine next steps moving forward. Let me know if that makes sense.

@felixarntz Agreed re: focus areas. I'll pick this up with you elsewhere!

markjaquith commented 1 year ago

One downside of this proposal is that when you do the weekly calibration run, any options that aren't used on that load will be ejected from autoload, and then moved back in the first time they're accessed.

But we could mitigate this by storing an option containing the options keys that were dynamically loaded, and an option containing the options keys that were autoloaded because they were previous dynamically loaded, but were also used.

That sounds confusing... but consider this flow.

  1. Week 1 calibration uses A B C. Autoloads: A B C
  2. Week 1 also sees D and E get used and dynamically added to autoloads. Autoloads: A B C D E
  3. Week 2 calibration uses A B C. It adds in D and E. Autoloads: A B C D E
  4. Week 2 doesn't access D at all, but does access E.
  5. Week 3 calibration uses A B C. It adds in E, but ejects D. Autoloads: A B C E
pbearne commented 1 year ago

Will this have to be loaded via object-cache to catch the early get options calls?

markjaquith commented 1 year ago

@pbearne Not necessarily. For sites using the default transient PHP memory object cache, we could just look inside the one-by-one object cache to see what got put in there before we were able to start listening.

Didn't validate that in my proof-of-concept, but I think it's feasible.

We can also add a filter and use that to mark certain WP core options (that are accessed early) as "should definitely always be autoloaded".

pbearne commented 1 year ago

We can also add a filter and use that to mark certain WP core options (that are accessed early) as "should definitely always be autoloaded".

I suspect that is the way to go

eugene-manuilov commented 1 year ago
  • Track which options are actually used on the site and based on that regularly rebuild the list of options to autoload, by dynamically modifying the autoload column in the database.

This feels like a band-aid that is intended to solve symptoms of a problem by adding an additional level of functionality that is needed only because we don't have API in the codebase that will allow developers to properly register their options. We have API that allows developers register settings, meta keys for posts and terms, nav menus, sidebars and widgets, post types and taxonomies, but we don't have one for options yet.

If we allow developers to register their options with a new register_option( $option_name, $args ) function, this will solve the problem eventually. Yes, this will not solve the problem immediately, it will take some time for developers to update their plugins and themes, but we will have a light at the end of the tunnel.

The question will be how to incentivise developers to use the new API. It would be great if we can add a check to the plugin/theme checker that will indicate that options are not registered if they don't use the new register function, and show this information (warnings and errors found in plugins and themes) on plugin/theme pages at wp.org.

An additional benefit of using the new register_option function is that we can also use it to provide information whether an option should be cacheable. As we all know, sometimes options are used to permanently store some data that is not options in its nature (responses from 3rd party API, pre-rendered pieces of the frontend, etc). Such data usually is not supposed to be stored in the object cache, but it is because the current options API don't differentiate option data. The ability to define whether an option should be cacheable will help to solve this problem as well.

markjaquith commented 1 year ago

How would we store the registered options and how would we know when they weren't used anymore?

eugene-manuilov commented 1 year ago

How would we store the registered options...

Registered options will be stored in memory, the same way how other registered entities are stored (an array in the global scope).

... and how would we know when they weren't used anymore?

Once a plugin is deactivated, it stops registering plugin options, hence those options stop appearing in the registered options array and no longer used in preloading.

Here is a very basic sample:

<?php
/*
 * Plugin Name: my fancy plugin
 * ...
 */

// "init_options" is not existing action and is used here just for demo purposes,
// more appropriate action is yet to be defined if we go with this approach.
add_action( 'init_options', function() {
     register_option( 'my_fancy_optionA', array(
        'autoload'              => true,
        'cacheable'             => true,
        'sanitization_callback' => function( $value ) { ... }, // the sanitization_callback is used whenever we add or update the option.
    ) );

    register_option( 'my_fancy_optionB', array(
        'autoload'              => false,
        'cacheable'             => true,
        'sanitization_callback' => function( $value ) { ... },
    ) );

    ...
} );

...
tillkruss commented 1 year ago

Once a plugin is deactivated, it stops registering plugin options, hence those options stop appearing in the registered options array and no longer used in preloading.

It would also allow WordPress to detect how many options (and bytes) a single plugin is registering.

markjaquith commented 1 year ago

Registered options will be stored in memory, the same way how other registered entities are stored (an array in the global scope).

Hmm. But WordPress queries the autoloaded options before plugins are included. We need to know the contents of active_plugins to know which plugins are even active! So wouldn't this result at the very least in two stages of options loading?

felixarntz commented 1 year ago

@eugene-manuilov Note that there is already register_setting(), so I would argue we could cover any additional arguments needed by enhancing that function rather than introducing a new one.

My main concern if we go only with such an API is that it will still not be used enough. No matter how much we advocate for it being used, as long as not using it does not result in fatal errors (which we also cannot do because of backward compatibility), realistically it won't be used widely enough to help all sites. So I think an automated approach without requiring external developer intervention is critical.

I am not at all opposed to providing a cleaner API to register the autoload value alongside an option, as long as it's complementary to the proposal from this issue. In fact, I think having the autoload value as a registered value in register_setting() would be helpful as it would allow WordPress at any point to look up the intended usage for whether to autoload.

Potentially this could work as follows:

Of course, the latter case may in theory still result in the same problem we have today. But I would argue due to the fact that there would no longer be a default value, the problem would be much smaller. At least developers would explicitly have to specify yes, by which we could assume at least a minimal sense of awareness for what that value does - different from today, where yes is just the default and a developer may literally not even know that this parameter exists, let alone what it does.

eugene-manuilov commented 1 year ago

Hmm. But WordPress queries the autoloaded options before plugins are included. We need to know the contents of active_plugins to know which plugins are even active! So wouldn't this result at the very least in two stages of options loading?

Yes, unfortunately, we will need to execute an additional query to load active plugins (and maybe other critical information for WP core, such as home/site url, etc) before loading plugin/theme level options.

Note that there is already register_setting(), so I would argue we could cover any additional arguments needed by enhancing that function rather than introducing a new one.

@felixarntz, yes, you are right, but the register_settings() function signature expects a setting to be associated with settings group, which is not needed if a plugin/theme uses its own settings page with a custom UI. Perhaps we can make the other way around: instead of using the register_settings function with an empty settings group name, we can make the register_settings function use the register_option one internally to register setting/option arguments? 🤔

My main concern if we go only with such an API is that it will still not be used enough. No matter how much we advocate for it being used, as long as not using it does not result in fatal errors (...), realistically it won't be used widely enough to help all sites. So I think an automated approach without requiring external developer intervention is critical.

I think as soon as we start showing plugin/theme checker results on the plugin/theme pages at wp.org (and in the search results in the admin), plugin and theme authors will immediately start considering implementing best practices because they will start loosing audience that will think twice before installing a plugin/theme that has technical concerns.

felixarntz commented 1 year ago

@eugene-manuilov

Note that there is already register_setting(), so I would argue we could cover any additional arguments needed by enhancing that function rather than introducing a new one.

@felixarntz, yes, you are right, but the register_settings() function signature expects a setting to be associated with settings group, which is not needed if a plugin/theme uses its own settings page with a custom UI. Perhaps we can make the other way around: instead of using the register_settings function with an empty settings group name, we can make the register_settings function use the register_option one internally to register setting/option arguments? 🤔

I agree that group name parameter there is unfortunate, but it is simply "legacy bloat" we will have to keep for backward compatibility. I'm not sure that justifies introducing an entirely new function though. I would say what we could do is overload the second parameter so that it becomes the $args instead, and soft-deprecate the 3rd parameter. Then the new encouraged way would be to only provide the option name and $args to register_setting(), and the group name could be come part of $args in that situation (so it only needs to be specified if actually relevant). I think that would clean up this function quite a bit. For BC, we would just have to keep supporting the current/old way, but that's very little overhead to have.

My main concern if we go only with such an API is that it will still not be used enough. No matter how much we advocate for it being used, as long as not using it does not result in fatal errors (...), realistically it won't be used widely enough to help all sites. So I think an automated approach without requiring external developer intervention is critical.

I think as soon as we start showing plugin/theme checker results on the plugin/theme pages at wp.org (and in the search results in the admin), plugin and theme authors will immediately start considering implementing best practices because they will start loosing audience that will think twice before installing a plugin/theme that has technical concerns.

I think we are a long ways off of that, if it ever happens :) It is unclear whether wordpress.org would ever show stats on plugins/themes, as there is quite some controversy around it, so we can't rely on that. And even if we did, I would still argue that, with the scale of having 60k+ plugins and themes, there is no way they would all properly use such APIs. IMO we should see APIs as optional ways to fine tune core behavior and optimize it, but it should not be the only way to get good performance. Core should do whatever it can to help the majority of sites, while having APIs so that developers can manually improve that behavior further for a specific site's use-case.

Mte90 commented 1 year ago

Just to give some other feedback, I created this years ago https://github.com/CodeAtCode/WPDB-Status/.
It is just a php dropin to the wp root that automatically report how much kb and how many options are in autoload. I usually use it for new customer (with websites managed badly) to do an audit and a DB cleanup later.

Part of those autoload often are also transient that aren't cleaned or orphaned, plugin misuses for some stuff as to track licenses or responses from rest api.

I see the various evaluation about how to do that, maybe another can be track by specific hooks (like init) if the get_option is executed and change to autoload. I don't think that is the case to do it for wp_loaded as a lot of plugin execute all together so can create fake positives.
Also register_setting doesn't have a parameter to specify the plugin and there are frameworks like CMB2 to manage that part so maybe it won't detect right the plugin itself.

I am wondering if it is better instead of something inside WP that automatically change the status for autoloaded options a WP cli command, in this way it is managed by people that knows what is happening (also if we are talking about options that can be read in the next request).

tillkruss commented 1 year ago

@felixarntz @markjaquith I've had a conversation with @jdevalk over this at CloudFest. His suggestion was to create different auto-loading types. Ideas of yes|no, we could have: frontend, api, admin, cron, everywhere.

There could have some kind of (opt-in) auto-optimize process where all yes options are set to let's say admin and of an option has to be fetched for front-end requests it's set to all. This could slim down the alloptions key a lot.

jdevalk commented 1 year ago

Happy to help on any of this!

merkys7 commented 1 year ago

I really adore @tillkruss idea

pbearne commented 1 year ago

If you look at the (rough) code I created using a cache key from the requested URL, we could create the buckets to end up with, say six keys.

I think the best option is to polish up the code with some example proposals and try them out and measure how they proform.

tillkruss commented 1 year ago

@pbearne Can you link your code?

pbearne commented 1 year ago

https://github.com/pbearne/wordpress-develop/tree/conditional_options_preload And this issue https://github.com/WordPress/performance/issues/347

spacedmonkey commented 1 year ago

I had a play around with @markjaquith POC code. See.

My thoughts. I really like the idea of setting unused options to not autoload. But is hard to know when to do this. In my tests, I did it on the homepage while logged in. But that autoloaded option could be used in somewhere like the CMS / REST API/RSS feed/sitemap, that would mean a performance regression there. For that reason, I think this optimization should be done upon user request and not automatically done as part of a WordPress upgrade or a schedule event. I could see the following scenario.

User can could run this "Optimize options" many times, as the site ages or more plugins are installed, then the options may get out of hand.

How is how I see this functionality working for users. I could also see a WP CLI command being added, that could be run by developers / hosting companies. Do we think this is a path forward?

Mte90 commented 1 year ago

I think that should be the right way, in this way we are doing a bit of awareness about the topic and probably many plugin developers will improve their code because they will get bug reports for that.

pbearne commented 8 months ago

Worked on my version of the code some more this week Here is a summary of what I am trying to do https://gist.github.com/pbearne/ce21417bf5ecfa10b84afc758092cd9f

This class implements various filters and actions to optimize WordPress's caching mechanism based on whether persistent object caching plugins like Redis or Memcached are present or not. It does so by using wpcache* functions provided by WordPress core. The main functionality can be divided into three parts:

  1. Preloading alloptions: In multisite setups, WordPress stores all site-specific options in one database table called wp_site. This means that when working with multiple sites, retrieving all available options will involve a single query which could potentially slow down page loads significantly especially on larger installations with many active sites. To avoid such delays, we can preload all site-specific options once during the initial request and store them in memory for future requests. This is achieved by registering a filter hook pre_get_alloptions, which intercepts the original call to retrieve all options (get_alloptions) and retrieves the cached values if they're already available, otherwise triggers the actual DB query.
  2. Conditional preloading of individual options: Since some options may not change frequently and don't need to be reloaded every time, we can further improve performance by conditionally loading only those options that have been modified recently or whose names match specific patterns. We achieve this by adding another filter hook preoption{$option}, which checks whether the requested option has already been used within the current request or not. If not, then it adds the name of the requested option to an internal list self::$alloptions_used. When the shutdown event occurs, we check whether there are any new options added to the list since last check and save them back to the cache under a unique key derived from the URL path of the current request.
  3. Statistical analysis: Finally, we also register a shutdown action hook to log statistical information about how many options were actually required during the request and compare it against the total number of options stored in the database (both site-wide and per-site). This helps us identify areas where unnecessary queries might be causing performance bottlenecks and take corrective measures accordingly.

Overall, the code appears to be a useful tool for optimizing the performance of a WordPress site, and can help to reduce the number of database queries that are made when a page is loaded.

joemcgill commented 3 weeks ago

Couple of updates here:

@jdevalk has published this plugin that implements many of the ideas that have been discussed here and in other related proposals related to optimizing autoloaded options. I wonder if there are opportunities introduce some of the optimizations he's included in that plugin to WP Core?

Now that we have updated the autoloaded options API via https://core.trac.wordpress.org/changeset/57920 to allow for dynamic optimization of options using the default $autoload parameter, we could use this to experiment with additional optimizations beyond disabling autoloading for options based on their size.