getpatchwork / patchwork

Patchwork is a web-based patch tracking system designed to facilitate the contribution and management of contributions to an open-source project.
http://jk.ozlabs.org/projects/patchwork/
GNU General Public License v2.0
273 stars 82 forks source link

Some queries from Web interface aren't using indexes requiring too much time on v3.1.3 #580

Open mchehab opened 7 months ago

mchehab commented 7 months ago

This query: state=*&archive=true Is producing a complex sql statement that it is not using indexes and are taking a long time to complete:

explain SELECT (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=1), 0)) AS `tag_1_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=2), 0)) AS `tag_2_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=3), 0)) AS `tag_3_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=4), 0)) AS `tag_4_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=5), 0)) AS `tag_5_count`, `patchwork_patch`.`id`, `patchwork_patch`.`msgid`, `patchwork_patch`.`date`, `patchwork_patch`.`submitter_id`, `patchwork_patch`.`project_id`, `patchwork_patch`.`name`, `patchwork_patch`.`delegate_id`, `patchwork_patch`.`state_id`, `patchwork_patch`.`series_id`, `patchwork_person`.`id`, `patchwork_person`.`email`, `patchwork_person`.`name`, `patchwork_person`.`user_id`, `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, `patchwork_state`.`id`, `patchwork_state`.`name`, `patchwork_state`.`slug`, `patchwork_state`.`ordering`, `patchwork_state`.`action_required`, `patchwork_series`.`id`, `patchwork_series`.`name` FROM `patchwork_patch` INNER JOIN `patchwork_person` ON (`patchwork_patch`.`submitter_id` = `patchwork_person`.`id`) LEFT OUTER JOIN `auth_user` ON (`patchwork_patch`.`delegate_id` = `auth_user`.`id`) LEFT OUTER JOIN `patchwork_state` ON (`patchwork_patch`.`state_id` = `patchwork_state`.`id`) LEFT OUTER JOIN `patchwork_series` ON (`patchwork_patch`.`series_id` = `patchwork_series`.`id`) WHERE (`patchwork_patch`.`project_id` = 1 AND `patchwork_patch`.`archived`) ORDER BY `patchwork_patch`.`date` DESC LIMIT 1000;
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+
| id   | select_type        | table              | type   | possible_keys                                                        | key                      | key_len | ref                                    | rows  | Extra                       |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+
|    1 | PRIMARY            | patchwork_patch    | ref    | patchwork_patch_499df97c,patchwork_patch_1a37f020,patch_covering_idx | patchwork_patch_499df97c | 4       | const                                  | 30607 | Using where; Using filesort |
|    1 | PRIMARY            | patchwork_person   | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.submitter_id | 1     |                             |
|    1 | PRIMARY            | auth_user          | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.delegate_id  | 1     | Using where                 |
|    1 | PRIMARY            | patchwork_state    | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.state_id     | 1     | Using where                 |
|    1 | PRIMARY            | patchwork_series   | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.series_id    | 1     | Using where                 |
|    6 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    5 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    4 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    3 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    2 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+

As reported by mysql slow log, it takes more than 1:30 mins to complete:

# Thread_id: 5213  Schema: patchwork  QC_hit: No
# Query_time: 92.553408  Lock_time: 0.000355  Rows_sent: 1000  Rows_examined: 101124
# Rows_affected: 0  Bytes_sent: 367857

Issue noticed at https://patchwork.linuxtv.org.

mchehab commented 7 months ago

Analyze results for such query:

analyze.json

stephenfin commented 7 months ago

This is likely above my paygrade :sweat_smile: I can reproduce on the linuxtv instance but have yet to do so locally (and even then, I'm not certain how I'd address it yet...)

mchehab commented 7 months ago

This is likely above my paygrade 😅 I can reproduce on the linuxtv instance but have yet to do so locally (and even then, I'm not certain how I'd address it yet...)

Yeah, this one seems tricky, and it generates 3 sub-queries that don't use indexes. Btw, if the query keeps having the blank fields there, like ?q=&series=&.., the number of sub-queries without indexes rise to 4.

Funny enough, a query like ?archive=false is fast. I suspect that the problem here is how patchwork/Django converts state=*. into a query.

I mean, running:

analyze select state_id, count(*) from patchwork_patch where NOT archived GROUP BY state_id;
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
| id   | select_type | table           | type | possible_keys      | key                | key_len | ref   | rows  | r_rows   | filtered | r_filtered | Extra                    |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
|    1 | SIMPLE      | patchwork_patch | ref  | patch_covering_idx | patch_covering_idx | 1       | const | 30608 | 44236.00 |   100.00 |     100.00 | Using where; Using index |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
1 row in set (0.047 sec)

is really fast. So, IMO, the problem is how patchwork currently handle "state=*", creating this really complex query, while something a lot simpler would produce the same result while using indexes.

mchehab commented 7 months ago

Looking at the code, it sounds it comes from here:

class PatchQuerySet(models.query.QuerySet):
    def with_tag_counts(self, project=None):
        if project and not project.use_tags:
            return self

        # We need the project's use_tags field loaded for Project.tags().
        # Using prefetch_related means we'll share the one instance of
        # Project, and share the project.tags cache between all patch.project
        # references.
        qs = self.prefetch_related('project')
        select = OrderedDict()
        select_params = []

        # All projects have the same tags, so we're good to go here
        if project:
            tags = project.tags
        else:
            tags = Tag.objects.all()

        for tag in tags:
            select[tag.attr_name] = (
                "coalesce("
                "(SELECT count FROM patchwork_patchtag"
                " WHERE patchwork_patchtag.patch_id=patchwork_patch.id"
                " AND patchwork_patchtag.tag_id=%s), 0)"
            )
            select_params.append(tag.id)

        return qs.extra(select=select, select_params=select_params)

No idea why it is trying to count patches there via such complex query, instead of just doing the query.