Yoast / wordpress-seo

Yoast SEO for WordPress
https://yoast.com/wordpress/plugins/seo/
Other
1.78k stars 894 forks source link

When clearing permalinks, save the indexables through a query #15381

Closed johannadevos closed 4 years ago

johannadevos commented 4 years ago

Please give us a description of what happened.

In the #yoast-siteground Slack channel, the following message was shared:

I’ve cross-referenced the huge traffic peaks with the slow queries we managed to log within the DB processlist and here is the pattern:

Query   | 26    | Waiting for table metadata lock | DELETE FROM `wp_yoast_indexable_hierarchy` WHERE `indexable_id` = '11'                               |
Query   | 24    | Waiting for table metadata lock | DELETE FROM `wp_yoast_indexable_hierarchy` WHERE `indexable_id` = '331'                              |
Query   | 24    | Waiting for table metadata lock | DELETE FROM `wp_4_yoast_indexable_hierarchy` WHERE `indexable_id` = '156'                            |
Query   | 24    | Waiting for table metadata lock | DELETE FROM `wp_yoast_indexable_hierarchy` WHERE `indexable_id` = '30'                               |
Query   | 23    | Waiting for table metadata lock | DELETE FROM `wp_yoast_indexable_hierarchy` WHERE `indexable_id` = '228'                              |
Query   | 22    | Waiting for table metadata lock | DELETE FROM `wp_yoast_indexable_hierarchy` WHERE `indexable_id` = '132'                              |
Query   | 21    | Waiting for table metadata lock | UPDATE `wp_yoast_indexable` SET `permalink` = NULL, `updated_at` = '2020-06-05 11:40:38' WHERE `id`  |

There are more than 100 DELETE queries from wp_yoast_indexable_hierarchy at a time:

grep 20200605T114059 /var/log/perp/mysqlproclist/current |grep DELETE  |wc -l
108

Cause and solution Currently, in clearing the permalinks there is a loop which saves each indexable separately. This should happen in a query (and the indexables should not be retrieved). Saving them can be done as follows: repository->query->set->where->save.

Please describe what you expected to happen and why.

How can we reproduce this behavior?

1. 2. 3.

Technical info

Used versions

igorschoester commented 4 years ago

Looks like this is the query in the indexable-permalink-watcher:

        Wrapper::get_wpdb()->update(
            Model::get_table_name( 'Indexable' ),
            [
                'permalink' => null,
            ],
            $where
        );

After the permalink is set to null we build them per indexable again in the ensure_permalink in the indexable-repository.

Tasks

Djennez commented 4 years ago

About the DELETE queries: while I can't be 100% certain why this happens on our own website, I was able to reproduce this locally.

Edit / addition: This isn't limited to REST API requests, but the request of ancestors for objects in general. The same delete_all() triggers for just visiting http://acceptance.test/author/user-11/ as well. However, the logs provided by our host show timestamps that differ just a few milliseconds, which leads me to believe this is one process, which could be the REST API.


From what I can gather, this is due to the plugin hooking into the REST API to add yoast_head to the results. Every object in the REST response looks to be individually built, including our data. When breadcrumbs are enabled, yoast_head will contain a JSON of the breadcrumbs of the object, and this is where the DELETE requests come from. The code responsible is in /wordpress-seo/lib/orm.php in the delete_many() function. This function seems to be called individually for every object with breadcrumbs that is generated for the REST response. This is the stack trace for one object:

orm.php:2281, Yoast\WP\Lib\ORM->delete_many()
indexable-hierarchy-repository.php:46, Yoast\WP\SEO\Repositories\Indexable_Hierarchy_Repository->clear_ancestors()
indexable-hierarchy-builder.php:99, Yoast\WP\SEO\Builders\Indexable_Hierarchy_Builder->build()
indexable-hierarchy-repository.php:90, Yoast\WP\SEO\Repositories\Indexable_Hierarchy_Repository->find_ancestors()
indexable-repository.php:361, Yoast\WP\SEO\Repositories\Indexable_Repository->get_ancestors()
breadcrumbs-generator.php:116, Yoast\WP\SEO\Generators\Breadcrumbs_Generator->generate()
indexable-presentation.php:611, Yoast\WP\SEO\Presentations\Indexable_Author_Archive_Presentation->generate_breadcrumbs()
abstract-presentation.php:69, Yoast\WP\SEO\Presentations\Indexable_Author_Archive_Presentation->__get()
breadcrumb.php:46, Yoast\WP\SEO\Generators\Schema\Breadcrumb->generate()
schema-generator.php:88, Yoast\WP\SEO\Generators\Schema_Generator->generate()
indexable-presentation.php:600, Yoast\WP\SEO\Presentations\Indexable_Author_Archive_Presentation->generate_schema()
abstract-presentation.php:69, Yoast\WP\SEO\Presentations\Indexable_Author_Archive_Presentation->__get()
schema-presenter.php:58, Yoast\WP\SEO\Presenters\Schema_Presenter->get()
schema-presenter.php:42, Yoast\WP\SEO\Presenters\Schema_Presenter->present()
meta.php:144, Yoast\WP\SEO\Surfaces\Values\Meta->get_head()
indexable-head-action.php:108, Yoast\WP\SEO\Actions\Indexables\Indexable_Head_Action->for_author()
yoast-head-rest-field.php:139, Yoast\WP\SEO\Routes\Yoast_Head_REST_Field->for_author()
class-wp-rest-controller.php:458, WP_REST_Users_Controller->add_additional_fields_to_object()
class-wp-rest-users-controller.php:1051, WP_REST_Users_Controller->prepare_item_for_response()
class-wp-rest-users-controller.php:324, WP_REST_Users_Controller->get_items()
class-wp-rest-server.php:1015, WP_REST_Server->dispatch()
class-wp-rest-server.php:342, WP_REST_Server->serve_request()
rest-api.php:306, rest_api_loaded()
class-wp-hook.php:287, WP_Hook->apply_filters()
class-wp-hook.php:311, WP_Hook->do_action()
plugin.php:544, do_action_ref_array()
class-wp.php:388, WP->parse_request()
class-wp.php:735, WP->main()
functions.php:1274, wp()
wp-blog-header.php:16, require()
index.php:17, {main}()

It looks like, if an object does not have ancestors (even if they're not objects that can not have ancestors) get_ancestors() in /wordpress-seo/src/repositories/indexable-repository.php is trying to regenerate the ancestors. It looks like there is no difference between "This object has no ancestors in the table" and "This object does not have ancestors generated", so it always tries to generate them? During generation, the ancestors get cleaned which causes the DELETE queries. And even though delete_many() is used, it is done per object iteration, so only one object at a time.

To reproduce:

andizer commented 4 years ago

My findings so far:

andizer commented 4 years ago

Possible Solution: We have to add a field named has_ancestors. When the ancestors are calculated we set that field to true/false.

In the build routine for the ancestors we check for that field, if set to true we delete the indexable hierarchy.