getpatchwork / patchwork

Patchwork is a web-based patch tracking system designed to facilitate the contribution and management of contributions to an open-source project.
http://jk.ozlabs.org/projects/patchwork/
GNU General Public License v2.0
276 stars 82 forks source link

Some queries from Web interface aren't using indexes requiring too much time on v3.1.3 #580

Open mchehab opened 9 months ago

mchehab commented 9 months ago

This query: state=*&archive=true Is producing a complex sql statement that it is not using indexes and are taking a long time to complete:

explain SELECT (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=1), 0)) AS `tag_1_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=2), 0)) AS `tag_2_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=3), 0)) AS `tag_3_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=4), 0)) AS `tag_4_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=5), 0)) AS `tag_5_count`, `patchwork_patch`.`id`, `patchwork_patch`.`msgid`, `patchwork_patch`.`date`, `patchwork_patch`.`submitter_id`, `patchwork_patch`.`project_id`, `patchwork_patch`.`name`, `patchwork_patch`.`delegate_id`, `patchwork_patch`.`state_id`, `patchwork_patch`.`series_id`, `patchwork_person`.`id`, `patchwork_person`.`email`, `patchwork_person`.`name`, `patchwork_person`.`user_id`, `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, `patchwork_state`.`id`, `patchwork_state`.`name`, `patchwork_state`.`slug`, `patchwork_state`.`ordering`, `patchwork_state`.`action_required`, `patchwork_series`.`id`, `patchwork_series`.`name` FROM `patchwork_patch` INNER JOIN `patchwork_person` ON (`patchwork_patch`.`submitter_id` = `patchwork_person`.`id`) LEFT OUTER JOIN `auth_user` ON (`patchwork_patch`.`delegate_id` = `auth_user`.`id`) LEFT OUTER JOIN `patchwork_state` ON (`patchwork_patch`.`state_id` = `patchwork_state`.`id`) LEFT OUTER JOIN `patchwork_series` ON (`patchwork_patch`.`series_id` = `patchwork_series`.`id`) WHERE (`patchwork_patch`.`project_id` = 1 AND `patchwork_patch`.`archived`) ORDER BY `patchwork_patch`.`date` DESC LIMIT 1000;
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+
| id   | select_type        | table              | type   | possible_keys                                                        | key                      | key_len | ref                                    | rows  | Extra                       |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+
|    1 | PRIMARY            | patchwork_patch    | ref    | patchwork_patch_499df97c,patchwork_patch_1a37f020,patch_covering_idx | patchwork_patch_499df97c | 4       | const                                  | 30607 | Using where; Using filesort |
|    1 | PRIMARY            | patchwork_person   | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.submitter_id | 1     |                             |
|    1 | PRIMARY            | auth_user          | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.delegate_id  | 1     | Using where                 |
|    1 | PRIMARY            | patchwork_state    | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.state_id     | 1     | Using where                 |
|    1 | PRIMARY            | patchwork_series   | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.series_id    | 1     | Using where                 |
|    6 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    5 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    4 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    3 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    2 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+

As reported by mysql slow log, it takes more than 1:30 mins to complete:

# Thread_id: 5213  Schema: patchwork  QC_hit: No
# Query_time: 92.553408  Lock_time: 0.000355  Rows_sent: 1000  Rows_examined: 101124
# Rows_affected: 0  Bytes_sent: 367857

Issue noticed at https://patchwork.linuxtv.org.

mchehab commented 9 months ago

Analyze results for such query:

analyze.json

stephenfin commented 9 months ago

This is likely above my paygrade :sweat_smile: I can reproduce on the linuxtv instance but have yet to do so locally (and even then, I'm not certain how I'd address it yet...)

mchehab commented 9 months ago

This is likely above my paygrade 😅 I can reproduce on the linuxtv instance but have yet to do so locally (and even then, I'm not certain how I'd address it yet...)

Yeah, this one seems tricky, and it generates 3 sub-queries that don't use indexes. Btw, if the query keeps having the blank fields there, like ?q=&series=&.., the number of sub-queries without indexes rise to 4.

Funny enough, a query like ?archive=false is fast. I suspect that the problem here is how patchwork/Django converts state=*. into a query.

I mean, running:

analyze select state_id, count(*) from patchwork_patch where NOT archived GROUP BY state_id;
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
| id   | select_type | table           | type | possible_keys      | key                | key_len | ref   | rows  | r_rows   | filtered | r_filtered | Extra                    |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
|    1 | SIMPLE      | patchwork_patch | ref  | patch_covering_idx | patch_covering_idx | 1       | const | 30608 | 44236.00 |   100.00 |     100.00 | Using where; Using index |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
1 row in set (0.047 sec)

is really fast. So, IMO, the problem is how patchwork currently handle "state=*", creating this really complex query, while something a lot simpler would produce the same result while using indexes.

mchehab commented 9 months ago

Looking at the code, it sounds it comes from here:

class PatchQuerySet(models.query.QuerySet):
    def with_tag_counts(self, project=None):
        if project and not project.use_tags:
            return self

        # We need the project's use_tags field loaded for Project.tags().
        # Using prefetch_related means we'll share the one instance of
        # Project, and share the project.tags cache between all patch.project
        # references.
        qs = self.prefetch_related('project')
        select = OrderedDict()
        select_params = []

        # All projects have the same tags, so we're good to go here
        if project:
            tags = project.tags
        else:
            tags = Tag.objects.all()

        for tag in tags:
            select[tag.attr_name] = (
                "coalesce("
                "(SELECT count FROM patchwork_patchtag"
                " WHERE patchwork_patchtag.patch_id=patchwork_patch.id"
                " AND patchwork_patchtag.tag_id=%s), 0)"
            )
            select_params.append(tag.id)

        return qs.extra(select=select, select_params=select_params)

No idea why it is trying to count patches there via such complex query, instead of just doing the query.

BtbN commented 2 days ago

I also ran into this while upgrading our FFmpeg instance from 3.0 to 3.2. It got so bad the instance ran out of request workers (100 of them), and the site became unuseable, due to 10-20 seconds of response time for each request.

I managed to reduce the issue somewhat by examining some of the most heavy-hitting queries and adding missing indices (using mariadb 11.5):

ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_date` (`id`, `date`);
ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_project_and_state` (`project_id`, `state_id`);

# not entirely sure this one is needed/used
ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_date_and_project` (`project_id`, `date`, `archived`);

This brought down some query times from 5-15 seconds to 0.xxx seconds. But some heavy hitters remain, even with them using indices.

The core issue seems to be that even a relatively simple query like this takes 20+ seconds:

SELECT
    `patchwork_patch`.`id`,
    `patchwork_patch`.`msgid`,
    `patchwork_patch`.`date`,
    `patchwork_patch`.`submitter_id`,
    `patchwork_patch`.`project_id`,
    `patchwork_patch`.`name`,
    `patchwork_patch`.`delegate_id`,
    `patchwork_patch`.`state_id`,
    `patchwork_patch`.`series_id`
FROM `patchwork_patch`
WHERE `patchwork_patch`.`project_id` = 1
ORDER BY `patchwork_patch`.`date` DESC LIMIT 10 OFFSET 36300;

That's purely because of the large offset. The query uses all the indices it can get. I don't think this can really be optimized. This is largely an issue cause various bots are hitting the deep-links to the last page of patches I guess. So I'm gonna try removing the links to deep pages and see if it improves stability.