WordPress / performance

Performance plugin from the WordPress Performance Group, which is a collection of standalone performance modules.
https://wordpress.org/plugins/performance-lab/
GNU General Public License v2.0
340 stars 91 forks source link

Revisit indexes for DB performance #132

Open patrick-leb opened 2 years ago

patrick-leb commented 2 years ago

As discussed with @tillkruss :

I was having some slow queries – specifically one query that was already described 6 years ago here: https://core.trac.wordpress.org/ticket/33885

The last comment by OllieJones https://core.trac.wordpress.org/ticket/33885#comment:86 pointed at hints on how it could be indexed better and he created a plugin that updates the indexes accordingly:

https://wordpress.org/plugins/index-wp-mysql-for-speed/

TL;DR of what the plugin does here: https://www.plumislandmedia.net/index-wp-mysql-for-speed/tables_and_keys/

Applying those indexes has, in some queries, saved me 80%+ in DB time. This seems more apparent on larger sites with a ton of meta data (e.g. woocommerce)

He mentioned to me he plans on doing a PR to core eventually.

I'd like to tag him here in but i don't have his github handle.

tillkruss commented 2 years ago

@bethanylang @felixarntz This has quite the impact and may be low hanging fruit. We don't have a database performance team do we? Maybe in core?

What are managed hosts doing? Can we discuss this next meetup?

felixarntz commented 2 years ago

@tillkruss I agree this looks like a promising performance enhancement when we can find an appropriate solution. If we wanted to explore that in the performance plugin, we could consider a module that optimizes meta queries.

I'm not 100% sure we can actually do that in a plugin (i.e. using the core filters that exist), particularly around an enhanced caching mechanism. Regarding the indexes, we could potentially add a routine that updates the database table accordingly - that we should definitely test on a massive site though before giving it out in the plugin.

Do you think it's worth focusing on the caching aspect of it as well? In that case, I'd say this is suitable for your focus area :)

lkraav commented 2 years ago

@OllieJones deserves special recognition for his work on https://github.com/OllieJones/index-wp-mysql-for-speed

Performance results are unreal.

On a site with 10M wp_postmeta rows, 6M wp_posts rows, WooCommerce and the works.

wp index-mysql enable wp_postmeta

or

# Time: 220127  7:38:39                                                                                                                                                                         
# User@Host: cxl[cxl] @ localhost []                                                                                                                                          
# Thread_id: 917  Schema: cxl  QC_hit: No                                                                                                                                         
# Query_time: 288.264698  Lock_time: 0.003060  Rows_sent: 0  Rows_examined: 0                                                                                                                   
# Rows_affected: 0  Bytes_sent: 50                                                                                                                                                              
SET timestamp=1643287119;                                                                                                                                                                       
ALTER TABLE wp_postmeta ADD UNIQUE KEY meta_id (meta_id), DROP PRIMARY KEY, ADD PRIMARY KEY (post_id, meta_key, meta_id), DROP KEY meta_key, ADD KEY meta_key (meta_key, meta_value(32), post_id
, meta_id), ADD KEY meta_value (meta_value(32), meta_id), DROP KEY post_id, DROP KEY post_id_2 /**imfs-query-tag*611573002*/;

Resulted in queries for WC orders and subscriptions go from tens of seconds to near-instant.

BEFORE

image

AFTER

image

How can we get this to hands of all WP users as fast as possible?

patrick-leb commented 2 years ago

Thank you for tagging @OllieJones!! Somehow GitHub didn’t let me find him (?!?)

the performance gain is absolutely sick indeed. I think his indexes need to be pushed to Core but in the meantime I’ll recommend his plugin to anyone I know.

tillkruss commented 2 years ago

@lkraav, @OllieJones would it make sense to have this as part of the new Performance lab plugin for wider testing, or would it be better to make a PR to core directly?

lkraav commented 2 years ago

Would it make sense to have this as part of the new Performance lab plugin for wider testing, or would it be better to make a PR to core directly?

I'm not a DB wizard myself, so I feel like bringing in a Core DB wizard to evaluate would be the move to make.

SergeyBiryukov commented 2 years ago

Catching up here, the Index WP MySQL For Speed plugin looks quite impressive, and the theory behind it is also a recommended reading.

would it make sense to have this as part of the new Performance lab plugin for wider testing, or would it be better to make a PR to core directly?

Indeed, I believe having this as part of the Performance Lab plugin would make it easier to test, while making a PR to core could bring more attention to it, and more code reviews. So both options have their benefits :) Maybe start with the plugin, and proceed with the PR once we're satisified with the amount of testing?

In the meantime, I have moved the related ticket about meta_form() to the 6.0 milestone, in an attempt to revisit the earlier patch that improves performance for those using persistent object caching.

OllieJones commented 2 years ago

I firmly believe this indexing stuff should be in core, or at any rate in a widely deployed plugin.

A complication: older versions of InnoDB (in MySQL / MariaDB) still use the so-called Antelope storage format. That has a limit of 767 bytes (not characters) in any given index. That makes the meta_key columns for utf8mb4 characters need a prefix index. meta_key(191) is the way that was handled at the advent of utf8mb4,

Barracuda's (the newer storage format's) limit is 3072 bytes, so we can kiss the prefix indexes goodbye. That is very good. It makes us able to exploit the clustered index layout of InnoDB (in which the data of the table is stored in the primary key).

So, Rick James's and my plugin must detect Antelope on the site owner's RDBMS (which is ridiculously hard, the MySQL and MariaDB are racing each other to erase all traces of it in newer versions) and choose indexes approrpriately / prefix or not. Could we just tell Antelope users "hey, upgrade, or we can't help you"? I'd like to. But Godaddy deploys old, EOL, MySQL servers. (And I have a client who uses GoDaddy.) It probably makes sense, at some point, to warn site owners about old MySQL the way you warn them about old php.

I know the people at MariaDB are willing to review whatever this team decides to do.

And, I have a gripe. The 5.8.3 - to - 5.9 update process makes a half-hearted attempt to restore the WordPress standard keys. It does so in a way that a plugin cannot hook. Writeup here. https://wordpress.stackexchange.com/questions/401827/version-upgrade-can-my-plugin-filter-the-sql-statements-issued-during-a-databas I've implemented a nasty workaround.

SergeyBiryukov commented 2 years ago

Barracuda's (the newer storage format's) limit is 3072 bytes, so we can kiss the prefix indexes goodbye. That is very good. It makes us able to exploit the clustered index layout of InnoDB (in which the data of the table is stored in the primary key).

Just to clarify, could this be made as a progressive enhancement, i.e. improving the indexes for those systems that support it, or would it mean raising the minimum MySQL version requirement? Per the Requirements page, while WordPress currently recommends MySQL 5.7+ or MariaDB 10.2+, it also still supports MySQL 5.0+ in legacy environments.

And, I have a gripe. The 5.8.3 - to - 5.9 update process makes a half-hearted attempt to restore the WordPress standard keys. It does so in a way that a plugin cannot hook. ... There are filters called dbdelta_queries (source), dbdelta_insert_queries (source), and dbdelta_create_queries (source). But it seems my plugin is not in scope while those filters run, so I can't hook them. The version upgrade process hits upgrade.php directly (source).

The upgrade.php file does load the full WordPress environment (source), which includes plugins, so at a glance it seems like a plugin should be able to use those filters. I can run some tests to see if that's the case. If not, I'd be open to make this easier, either by adding more convenient filters, or by making sure a plugin has access to the existing ones.

patrick-leb commented 2 years ago

Just to clarify, could this be made as a progressive enhancement, i.e. improving the indexes for those systems that support it, or would it mean raising the minimum MySQL version requirement? Per the Requirements page, while WordPress currently recommends MySQL 5.7+ or MariaDB 10.2+, it also still supports MySQL 5.0+ in legacy environments.

IMHO, the fact that there are old 5.0 environments out there should not be a blocker to create those indexes on environments that DO support it. The improvements are so massive, whoever is sensible enough to performance are on MySQL 5.7+ MariaDB 10.2+ already.

I absolutely support a progressive enhancement like that.

lkraav commented 2 years ago

Not sure if this is on @konamiman and @barryhughes radar yet, but probably should be.

Noticed from https://github.com/woocommerce/woocommerce/pull/31819

OllieJones commented 2 years ago

Barracuda's (the newer storage format's) limit is 3072 bytes, so we can kiss the prefix indexes goodbye. That is very good. It makes us able to exploit the clustered index layout of InnoDB (in which the data of the table is stored in the primary key).

Just to clarify, could this be made as a progressive enhancement?

Yes, it could, and probably should be. (LImited) experience with the so-called "high-performance" indexes on MySQL 5.5, MySQL 5.6, and MariaDB 10.1 shows a fairly modest speedup because we can't redefine primary keys / clustered indexes to include any VARCHAR(250) columns. The big win in those legacy environments comes from an ordinary (non-UNIQUE, non-PK) compound prefix index on (meta_key(32), meta_value(32)) on the wp_whatevermeta tables. (32 rather than 191, because the 32 character prefix disambiguates pretty much everything that needs disambiguating, and shorter indexes generate less IO and take less RAM.)

And, experience shows that testing with lots of old versions is a pain in the xxx neck. But the Automattic team has access to better testing tools than a solo dev, so that may not be a factor.

Maybe a progressive improvement should simply ignore the older RDBMS versions, and only apply it to newer versions.

while WordPress currently recommends MySQL 5.7+ or MariaDB 10.2+, it also still supports MySQL 5.0+ in legacy environments.

The question is, how widespread are the legacy setups? I know GoDaddy is stuck on 5.6. Does your update-detection telemetry give you the RDBMS version, to help support a good decision?

OllieJones commented 2 years ago

Here's a puzzle: WordPress Core's schema has a mess of different VARCHAR(250) key columns, like wp_options.option_name and wp-whatevermeta.meta_key. All these key columns use the utfmb4 character set. Older sites were upgraded when 4.2 rolled out.

And those key columns were upgraded to utf8mb4 along with all the LONGTEXT content columns. That necessitated the (191) prefix indexing

Here's the question: couldn't those key columns use an eight-bit character set like latin1? Could they be reverted to the older utf8(mb3) character set? Either of those would let us kiss the prefix indexes goodbye, even on the legacy Antelope versions of the DBMS.

Then the keys and values would use different character sets. utf8mb4 for content is obviously in line with WordPress's "serve everybody" principle. Does a requirement to "use ASCII or latin1 for key names wherever possible" violate that principle? My opinion: it does not. Does "no emojis in key names" violate that principle? Certainly not.

Why might changing the character set of the keys be impossible?

What might be the advantages?

Obviousman observation: too bad the 4.2 switchover didn't leave the key columns unchanged.

Possible opportunity: a future database upgrade could, if it were possible, change those keys's character sets to latin1 or, if that is not possible, to utf8mb3.

Otto42 commented 2 years ago

Applying those indexes has, in some queries, saved me 80%+ in DB time.

And how much larger in space requirements did that take up in your DB?

Database indexes aren't magic beans. They take up disk space. Lots and lots of disk space. So, yes, you're making a tradeoff with storage vs. speed. And it's not always a good trade.

Indexing the meta_value is a bad idea, because you should not be searching for anything based on the meta_value in the first place.

The whole point of meta data is that it is key based. You get the post, then you get the meta associated with it. You absolutely do not get the the post based on the meta. You absolutely do not search the meta. This is designed this way.

Now, did some queries and designers forget that? Possibly. But it was intended to store data by key, and by ID, NOT by value. Its very fast one way, and very slow the wrong way. If you feel that you need additional indexing, then what you really need, is to fix the actual problem by storing the data in the correct way in the first place. The meta system does not fit every goal, and trying to optimize it further is a mistake that should instead be corrected by building a different system to store the data in a different way in the first place.

lkraav commented 2 years ago

Applying those indexes has, in some queries, saved me 80%+ in DB time.

And how much larger in space requirements did that take up in your DB?

I looked at some of my larger tables

wp_stream_meta - 27.7 GiB = Data 10.0 GiB + Index 17.7 GiB wp_stream - 8.5 GiB = Data 3.6 GiB + Index 4.9 GiB wp_posts - 4.0 GiB = Data 2.6 GiB + Index 1.4 GiB wp_postmeta - 2.9 GiB = Data 1.1 GiB + Index 1.8 GiB wp_comments - 1.3 GiB = Data 0.43 GiB + Index 0.85 GiB wp_commentmeta - 0.91 GiB = Data 0.45 GiB + Index 0.46 GiB

So disk space requirement of +100% to +150% range.

Database indexes aren't magic beans. They take up disk space. Lots and lots of disk space. So, yes, you're making a tradeoff with storage vs. speed. And it's not always a good trade. ... Now, did some queries and designers forget that? Possibly. But it was intended to store data by key, and by ID, NOT by value. ...

I'd argue price of storage space being the least of the worries where these upgraded indexes start to matter. For the majority of small sites, disk space overhead remains insignificant. Any big site will likely already have adequate disk space pre-planned, or has the resources to discover any storage ceilings and upgrade.

It's not entirely clear whether @Otto42 argues for or against index upgrade presented here. All above statements about meta system design are probably true. Regardless, IMHO this index upgrade, for the meta system we currently have, delivers performance value many times multiple of any related disk space costs.

javiercasares commented 2 years ago

Also, index disk space is not the same as backup / dump space, so maybe the database increases, but should not affect other parts.

maximejobin commented 2 years ago

Should indexes be revisited? Of course.

Should this solution be applied everywhere? Hell no.

In my opinion, we (in general) tend to want a solution to work flawlessly out-of-the-box. The thing is these manipulations (redoing all indexes) have a cost (space) and, in maybe 90% of the cases, they do not produce an output that is noticeable.

If you have millions of rows and your website is slow, I would not recommend to blindly run a script that will "optimize your database". The article posted by @OllieJones is really detailed and explains clearly every suggested edit. These edits should be analyzed and applied when they make sense... by someone that knows what he/she does.

I would suggest to lightly apply the changes to the core and let the context drive what should be done manually.

Database indexes aren't magic beans. They take up disk space. Lots and lots of disk space. So, yes, you're making a tradeoff with storage vs. speed. And it's not always a good trade.

I agree with @Otto42 , here.

OllieJones commented 2 years ago

Doing database optimization work for WordPress is a challenge. There are many installations out there, operated by people with a breathtakingly wide range of skill levels, running code developed by people with a wide range of skill levels, running on a wide variety of database server versions, the majority of which are obsolete or soon will be obsolete. We must address the WordPress world as it is, not as we wish it were. That's hard.

@Otto42 wrote _Indexing the meta_value is a bad idea, because you should not be searching for anything based on the metavalue in the first place. That's a fine rule if you get to review every query before it goes into production. But, that's not WordPress's reality. Not even close. Don't believe me? Turn on Query Monitor and then show the Posts panel in the dashboard. Look for the query that retrieves the list of users who could potentially be authors, to populate a quick-edit pulldown. Hint to help you find it: it's gnarly and contains several meta_value LIKE '%constant%' clauses.

And look at the way custom posts (WooCommerce products for example) handle their attributes (Tee shirt size for example). Meta values.

Of course indexes take HDD / SSD space. Of course some people, especially people who have been running small sites for a long time, don't have the space or the IOPS to use the space.

patrick-leb commented 2 years ago

I'm with @OllieJones and @lkraav here.

Disk space is basically a non-issue. It's extremely cheap and DB providers are giving generous disk allotment for VPS'es nowadays. It's a bit of whataboutism. To Ollie's point, disk space is probably a bigger factor for very, very old sites, which probably aren't running MySQL8.0 anyway nor even 5.7.

This doesn't mean for new installations the indexes shouldn't be created. Maybe a wp-config flag could be used to create the indexes on existing installations. This would mean someone made their due diligence and they'll benefit greatly from enabling it.

tillkruss commented 2 years ago

@patrick-leb: This sounds like a perfect module for this plugin, making it opt-in.

azaozz commented 2 years ago

Catching up here too, the Index WP MySQL For Speed plugin seems to add some pretty impressive enhancements. Thinking that at least some of them should find their way in core.

Tend to agree with @Otto42 and @SergeyBiryukov though, not sure it makes sense to index the option and meta values. The theory behind the plugin mentions that these fields are LONGTEXT, i.e. can reach 4GB each but doesn't seem to go into more details. Quite often the data in them is serialized or otherwise encoded, making them not suitable for searches. I agree that in some particular cases an index on the meta_value may be desirable, and can increase performance. But seems in most cases the trade-off between the size of these indexes and the (rare cases of) performance increase may not be worth it. Perhaps further "research" there about the more common use cases would be helpful.

Also thinking that the progressive enhancement idea would work well when implementing these indexes. Can potentially add them only for more recent DB versions thus minimizing the possibility of something going wrong.

OllieJones commented 2 years ago

@azaozz I would agree with you about the utility of indexing the LONGTEXT columns, but for one thing: There are quite a few queries (many in various plugins) that use WHERE meta_value = something or WHERE meta_value LIKE something%. The prefix indexes on those columns can rule out rows that don't match. That makes a difference. The subject plugin indexes meta_value(32) rather than meta_value(191) to save a little tablespace and IO. I'd advocate for leaving the meta_value indexes in place: benefits outweigh costs.

You're right that searching php-serialized data in meta_value columns is very unlikely to be helped by indexes. Here's an analysis, part of another optimization plugin.

azaozz commented 2 years ago

I'd advocate for leaving the meta_value indexes in place: benefits outweigh costs.

Sure. The next step would probably be to try to arrange some "live" testing on a big and busy site to see how that would affect the tables and the overall performance. As far as I see the WHERE meta_value ... queries in core are in wp_admin so they don't run often. But perhaps some are added by plugins and run on the front-end too.

I've also been thinking about adding another field to the meta and options tables that would hold the data type, see: https://core.trac.wordpress.org/ticket/55942. It will be most useful for determining when to unserialize the meta_value, but would also improve the WHERE meta_value .... queries as it would become possible to limit them by data type. BTW comments/ideas about that trac ticket are welcome :)

OllieJones commented 2 years ago

For what it's worth our Index WP MySQL For Speed plugin has a feature for monitoring database traffic and highlighting the slowest queries. A year's worth of experience with it on various big production sites has shown significant performance benefits.

Without a persistent object cache, many sites' database traffic is often dominated by hits to wp_options. The indexes on that table in our plugin were chosen to slightly favor sites with an object cache.

As for changing the table definition? Above my pay grade.

AlecKinnear commented 2 years ago

Thank you Ollie for this outstanding work. It's cut query time down for us significantly.

As a long-time plugin author and provider of WordPress VIP hosting solutions, I'm disappointed to see that core WordPress continues to be crippled by attitudes like this:

If you have millions of rows and your website is slow, I would not recommend to blindly run a script that will "optimize your database". The article posted by @OllieJones is really detailed and explains clearly every suggested edit. These edits should be analyzed and applied when they make sense... by someone that knows what he/she does.

Maxime goes on to say: "these manipulations (redoing all indexes) have a cost (space) and, in maybe 90% of the cases, they do not produce an output that is noticeable".

Disk space is cheap. Slow queries and hung queries are costly.

What's most costly of all is hiring someone to do analysis and edits of the database by hand for every WordPress site in existence. In our experience, good indexing make a huge difference. Instead of spending +200,000 man hours on Gutenberg, surely we could endeavour to apply a few dozen hours of expertise to core WordPress to really improve performance.

From half-outside (after having our 30x comment count performance improvements thrown away for no very good reason ten years ago – oh, surprise, surprise, just found out our fix was accepted ten years later), this kind of spokes-in-the-wheel attitude is highly discouraging and exactly why we donate so few hours to work on Core. It seems to be one giant competition to sabotage each other's performance improvements.

Can we not work together to significantly improve performance? I shudder to think, based on our experience, on the number of important performance improvements which have been blithely discarded over the last ten years.

Specifically in this case, Ollie's routine should be optimised to run by default with good default values. Some documentation could be published on WordPress.org on ways to hand tweak Ollie's routine for those sites which could benefit from custom versions.

Performance is a core issue and not an aside. Good performance should not be restricted to large publishers who pay either us or WordPress VIP thousands every month for hand-tweaked hosting.

patrick-leb commented 2 years ago

What's most costly of all is hiring someone to do analysis and edits of the database by hand for every WordPress site in existence. In our experience, good indexing make a huge difference

The biggest problem I have with this attitude of "Duh, get a pro DBA with WP experience" is that in reality, 99.99% of WP site owners will NOT have access to a pro DBA the caliber of @OllieJones - and i'm likely missing a couple of 9's - yet his plugin that he generously donated to the community will make their site faster. Not only that, but this could also have a substantial, positive impact on the carbon footprint of WP worldwide.

The disk space argument seems like a straw-man, to be honest.

pbearne commented 2 years ago

@OllieJones I would like your input on this idea https://github.com/WordPress/performance/issues/347 Do think this would work/help?

peterwilsoncc commented 2 years ago

One of the issues that's been raised for meta queries in WP is that they do not generate terribly efficient JOINs, see WP#49278. Would improving the way these are built gain much in terms of performance. There's an early pull request at https://github.com/WordPress/wordpress-develop/pull/1229

Improving the queries seems like a good first step. Subsequent testing may prove that an index is still needed but it would be preferable to avoid altering the indexes on live sites if possible.

OllieJones commented 2 years ago

With respect, your statement "it would be preferable to avoid altering the indexes on live sites if possible" is contrary to good practice for managing busy databases. It's a routine operation to adjust the indexes on various tables as they grow, and as the query shapes settle in for the particular site's production traffic.

There's a clear separation of responsibilities in the dbms world. Programmers update code, and database administrators wrangle indexes, tablespaces and similar operational stuff. To the extent that database administrators have any code of ethics at all (don't get me started!) they strive to configure the databases so existing code runs faster, and only rarely demand code changes to solve performance problems.

Changing code to make it better is good. But it's harder to write and test, and more risky, than changing indexes.

As for your code changes, it looks like your major change to the shape of the code is...

FROM wp_posts
INNER JOIN wp_postmeta     ON wp_posts.ID = wp_postmeta.post_id
INNER JOIN wp_postmeta mt1 ON wp_posts.ID = mt1.post_id
WHERE wp_postmeta.meta_key = 'this' AND wp_postmeta.meta_key = 'that'

to this.

FROM wp_posts
INNER JOIN wp_postmeta     ON wp_posts.ID = wp_postmeta.post_id AND wp_postmeta.meta_value = 'this'
INNER JOIN wp_postmeta mt1 ON wp_posts.ID = mt1.post_id         AND mt2.meta_value = 'that'

Those query shapes give the same results, and they both benefit greatly from the modernized keys I have proposed. Recent versions of the DBMSs generate the same query plans for both query shapes.

peterwilsoncc commented 2 years ago

Thanks for the explanation @OllieJones, it's really helpful.

I was genuinely wondering if more efficient queries were possible, I wasn't making a passive aggressive attempt to block the idea. It appears they are not.

OllieJones commented 2 years ago

No passive aggressive intention taken, certainly not. It's my goal to bring a little more SQL performance awareness into the WordPress community. The leverage is huge: better efficiency means power savings in data centers, less CO2 and all that.

Certainly it's possible to make some queries more efficient by refactoring them. It's always a good idea to assess the before-and-after efficiency using MariaDB / MySQL's EXPLAIN feature, on a realistically sized set of tables.

That's why I have a WordPress instance with 100K fake users, and another with 40K fake pages and posts.I'm still looking for a large WPML site. Query efficiency varies a lot between versions of MySQL, and the majority of sites use 5.5 and 5.6, the old ones. Sigh.

AlecKinnear commented 1 year ago

Guys, we almost have consensus here that WordPress should improve the indexes on sites with the MySQL MariaDB versions to support the improvements. Yes, there is terrible hosting out there like GoDaddy and Bluehost and all the rest of the consolidated Endurance International hosts.

By catering to the lowest common denominator, we actually encourage the bad behaviour of these hosts.

By offering progressive enhancement to publishers who do upgrade their infrastructure to more recent and more capable versions (especially true of PHP which has come leaps and bounds in performance in the last four years), we move to a model of carrot rather than stick with WordPress.

Right now the WordPress upgrade position is "Upgrade or you'll go up in smoke and your site will fall apart." Instead it should be "Upgrade MySQL and MariaDB to version 6 or later and enjoy amazing new speed improvements."

This way we aren't locking out older sites and slower moving admins but we aren't crippling/kneecapping WordPress performance. I'm really tired of the Automattic line that WordPress performance is for people who host with WordPress VIP or hire Foliovision/Kinsta/whomever to deeply optimise their sites.

I can't imagine anything more anti-democratic, anti-open source and overtly elitist.

It's not all bad: WordPress.org/Automattic have done a very good job with security updates for older versions. That's an incredible amount of good work maintaining compatibility for publishers who would rather spend their time publishing than exploring new version of WordPress with all the massive failures and emergency updates and bad releases involved in perpetual beta.

azaozz commented 1 year ago

@OllieJones

many sites' database traffic is often dominated by hits to wp_options

That's strange. WP generally pre-caches all options on every load with wp_load_alloptions(). This is done for the exact purpose: to prevent multiple trips to the DB. True that some options can be excluded from this pre-caching by design, but that's quite rare. I'm wondering what's causing this traffic on the sites you tested.

AlecKinnear commented 1 year ago

@OllieJones

I'm still looking for a large WPML site. Query efficiency varies a lot between versions of MySQL

We have a WPML site which doesn't have many sites but does have over 200,000 posts and 3 million comments. We are running PHP 7.4 and MariaDB 10.8.4.

There is an issue that this network remains on WordPress 4.9.21 for stability reasons, but we've updated all our other sites to 5.7 so we could bring this network up to 5.7 for testing as it's already planned.

FYI, there's a lot of traffic on this network which makes load questions more interesting. Web server and DB server are two different large VPS.

OllieJones commented 1 year ago

This issue has generated many responses, just like the tix https://core.trac.wordpress.org/ticket/33885 it refers to.

Here's my summary of the choices.

  1. Keep the status quo as of WordPress 6.1: rely on persistent caching and index optimization plugins (like mine and others). Keep hunting🔎 for opportunities to cache various things. (That hunting will happen anyway, of course.)

  2. Do a schema-layout split in core. After such a split, there would be two versions of the WordPress schema going forward. One will support the legacy DBMS versions (Antelope storage engine), and the other will have better indexes without meta_key(191) prefixes.

  3. Change the character sets of meta_key and option_name columns to latin1 or even back to utf8mb3. (Note, the keys, not the values. No way nohow we abandon site owners using utf8mb4-dependent languages.) And then redo the indexes get to rid of the meta_key(191) prefix stuff.

Have I missed any viable alternative approaches? It would be nice to summarize all this to prepare for crisp decision-making.

patrick-leb commented 1 year ago

option 1 is obviously the least controversial, however, I don't think it should be the answer. it's (very nice) bandaid to a crappy schema to begin with. I think Option 2 is the way to go, deprecating the old schema for a few years before removing it entirely. I agree with @AlecKinnear's point that not moving forward just encourages bad hosting behaviours. That Antelope engine is old AF.... we can move with the times.

bethanylang commented 1 year ago

Please vote for an approach on this comment:

Voting will close on Tuesday, November 29, 2022, at 6pm UTC.

azaozz commented 1 year ago
  1. Do a schema-layout split in core. After such a split, there would be two versions of the WordPress schema...

Seems this suggests to fork core (to certain extend). Who is going to maintain the fork? Good luck!! :)

just like the tix https://core.trac.wordpress.org/ticket/33885 it refers to.

Actually thinking to close 33885 as invalid. WP has changed quite significantly over the years and the meta box it refers to is not used (by default) any more. Any DB modernization/optimizations resulting from it should ideally go in new tickets.

Regarding the three options above, is there one to update the schema and convert the old tables. As far as I see very few (really large) sites are affected and would benefit from this. It's likely they have the resources to do this outside of core.

AlecKinnear commented 1 year ago

@bethanylang Those choices take us off track it seems to me. You've managed to obscure the original fix by muddying with changing the keys (which is a red herring here, I'd agree to keeping the keys in a less resource intensive format, but that's another conversation.

What we are talking about here is indexing database tables for a 30x speed improvement on larger WordPress sites. Pardon my cynicism here but it seems that WordPress core contributors are working hard to make sure that WordPress performs poorly at scale, to keep WordPress VIP, Kinsta and Foliovision (my firm) in caviar and champagne.

Andrew (@azaozz)'s comment typifies this condescending and destructive approach:

As far as I see very few (really large) sites are affected and would benefit from this. It's likely they have the resources to do this outside of core.

Surely we can find a better way to make our money than deliberately crippling open-source software.

The right way to do this is as a progressive enhancement. Users who are using modern databases get better indexing. Users stuck on old versions of MySQL keep the current tables and indexing. Whenever a site meets the requirements for better indexing the upgrade is performed silently (put detailed notes in the release notes, as the only people who will care do read release notes before upgrading). The silent upgrade can check database structure at each 6.x upgrade.

There are so many cases where WordPress has deliberately chosen to cripple performance. No wonder almost no young or idealistic developers want to work with WordPress any more. If we insist on keeping our software broken and underperforming for business reasons, I can't blame them.

I do agree with @azaozz about not splitting the schema layout. Supporting multiple key systems sounds is building a nasty long term compatiblity issue into WordPress. It makes most sense to improve and update the schema layout for everyone. But again the schema business should broken out into its own issue/enhancement even if we discuss it here.

Let's index WordPress for performance going forward. Heaven knows WordPress needs performance improvement. Heck if the database indexing improves in WordPress we'll update all our sites to WordPress 6.1 and even write an article explaining why everyone should do so (the joke here is that usually we write articles telling people not to update WordPress as it's a waste of time and will just cost your business time and money, we've even written a plugin which gives control over updates back to the publisher).

tillkruss commented 1 year ago

@AlecKinnear: Or index by default and allow hosting companies to opt-out.

bethanylang commented 1 year ago

@OllieJones @felixarntz Can I ask you to please take a look at the comments above and help me determine the best next steps here? Thank you both!

OllieJones commented 1 year ago

@azaozz and @AlecKinnear have pointed out the big disadvantage to making dbms-version-dependent changes to the default schema's indexes: more stuff to maintain in core etc. It's truly a disadvantage of that alternative, and a strong argument.

With respect, I disagree with @AlecKinnear 's assertion that the voting set up by @bethanylang obscures the issues. Quite, the opposite: it boils them down to basics.

The big competent hosting providers can and will do whatever they need to do for their customers. But the most important stakeholders for WordPress have always been the entry-level site owners and devs. This kind of enhancement, and indeed all performance work, should serve them well.
rjasdf commented 1 year ago
bethanylang commented 1 year ago

:wave: Hi all! The vote in https://github.com/WordPress/performance/issues/132#issuecomment-1307557009 is now formally closed, with the "winning" option being to do a schema-split layout in core (17 votes).

However, it's clear that there's still discussion to be had here before we proceed with any further work, so we'll leave this issue open for that purpose for now with the intent to let everyone chime in here and revisit this in a future weekly performance chat.

AlecKinnear commented 1 year ago

I'm sorry to see the discussion go to hidden in a Weekly Performance Chat. Hopefully someone will rescue the comments from the performance chat and repost them here. Those of us who are not paid to work on WordPress core don't have much time for the politicking and scheduled meeting for free. Burying all the important conversations in Slack is one way to subtly torpedo the open-source community who try to contribute to WordPress only to see performance improvements held back from core for years and instead built into WordPress.com and WordPress VIP private improvements. Basically it's a method of privatising our open source contributions.

It would be far more open to make sure any discussion of this issue comes back to the issue, rather than arbitrarily closing it and punishing those contributors who don't want to hang around in Automattic Slack meetings.

bethanylang commented 1 year ago

:wave: @AlecKinnear Thanks for flagging this! We definitely don't intend to bury any discussion and want to make sure that things are as transparent and easily accessible as possible. To that end, you can always find summaries of our weekly Slack chats under the performance tag on Make here: https://make.wordpress.org/core/tag/performance/.

The discussion noted in my previous comment was referring to the discussion that took place on this GitHub issue, so you haven't missed anything. :) No further discussion regarding this issue has taken place in chat since late November. If and when additional discussion takes place in a Slack chat, we'll be sure to update with a link to the meeting summary here as well as indicate next steps and how anyone who is interested can participate.

rjasdf commented 1 year ago

several meta_value LIKE '%constant%' clauses.

@OllieJones -- But, since that has a leading wildcard, the prefix index on "meta_value[32]" cannot be used. Do you have an example where the index would be useful?

rjasdf commented 1 year ago

Should this solution be applied everywhere? Hell no.

@maximejobin -- Well, I look at it this way.

The big problem at the moment is the mess of changes in MySQL that make it difficult to provide a "one-size-fits-all" for the changes. And, alas, WP started (long ago) with the artificial limit of 255 for various things. That, together with MySQL's multi-step transition to full UTF-8, lead to the "191" kludges.

maximejobin commented 1 year ago

@rjasdf I do agree with your points.

That's exactly why I state it should not be applied everywhere. The question is simple and easy to understand (how to make the database faster?). The answer is not and that is the problem with this thread. We cannot find an easy way to fix everything. Meta values were not created to filter data. So should we apply indexes in a table that was misused by so many plugins and themes?

rjasdf commented 1 year ago

I would like your input on this idea #347 Do think this would work/help?

@pbearne - #347 only addresses wp_options. In my experience, more users get bogged down in using wp_postmeta. For nearly a decade, I have been helping users WP on StackOverflow and StackExchange with slow WP. I have repeatedly suggested the same INDEX improvements, mostly to wp_postmeta.

(Eventually, @OllieJones got wind of what I was doing and built the WP plugin this discussion is about. Ollie expanded the indexes to the rest of the "meta" tables -- This was straightforward except for the hassles with MySQL's incompatible changes in 5.5/5.6/5.7/8.0. And he is doing the herculean task of dealing with versions/upgrades/etc.)

Our plugin addresses the INDEX-related performance issues in wp_options, but it does not address the abuses --