Open mchehab opened 9 months ago
Analyze results for such query:
This is likely above my paygrade 😅 I can reproduce on the linuxtv instance but have yet to do so locally (and even then, I'm not certain how I'd address it yet...)
Yeah, this one seems tricky, and it generates 3 sub-queries that don't use indexes. Btw, if the query keeps the blank fields there, like ?q=&series=&.., the number of sub-queries without indexes rises to 4.
Funny enough, a query like ?archive=false is fast. I suspect that the problem here is how patchwork/Django converts state=* into a query.
I mean, running:
analyze select state_id, count(*) from patchwork_patch where NOT archived GROUP BY state_id;
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
| 1 | SIMPLE | patchwork_patch | ref | patch_covering_idx | patch_covering_idx | 1 | const | 30608 | 44236.00 | 100.00 | 100.00 | Using where; Using index |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
1 row in set (0.047 sec)
is really fast. So, IMO, the problem is how patchwork currently handles "state=*": it creates this really complex query, while something a lot simpler would produce the same result and use indexes.
Looking at the code, it seems to come from here:
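To make the "something a lot simpler" point concrete, here is a minimal sketch of the same grouped count run from Python. It uses the standard sqlite3 module and a throwaway in-memory table instead of the real MariaDB instance, so the reduced schema, the sample rows, the index name demo_archived_state_idx and the helper names are all assumptions made for illustration. Only the SELECT itself is taken from the query above.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for patchwork_patch (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patchwork_patch ("
        " id INTEGER PRIMARY KEY,"
        " state_id INTEGER NOT NULL,"
        " archived INTEGER NOT NULL)"
    )
    # Illustrative index covering both columns the query touches.
    conn.execute(
        "CREATE INDEX demo_archived_state_idx"
        " ON patchwork_patch (archived, state_id)"
    )
    conn.executemany(
        "INSERT INTO patchwork_patch (state_id, archived) VALUES (?, ?)",
        [(1, 0), (1, 0), (2, 0), (2, 1), (3, 1)],
    )
    return conn


def count_unarchived_by_state(conn):
    """Return {state_id: number of non-archived patches} with one GROUP BY."""
    rows = conn.execute(
        "SELECT state_id, COUNT(*) FROM patchwork_patch"
        " WHERE NOT archived GROUP BY state_id"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(count_unarchived_by_state(make_demo_db()))
```

This only shows the shape of the query. Whether patchwork's list view could be served by something this simple is not established in this thread.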
class PatchQuerySet(models.query.QuerySet):
    def with_tag_counts(self, project=None):
        if project and not project.use_tags:
            return self
        # We need the project's use_tags field loaded for Project.tags().
        # Using prefetch_related means we'll share the one instance of
        # Project, and share the project.tags cache between all patch.project
        # references.
        qs = self.prefetch_related('project')
        select = OrderedDict()
        select_params = []
        # All projects have the same tags, so we're good to go here
        if project:
            tags = project.tags
        else:
            tags = Tag.objects.all()
        for tag in tags:
            select[tag.attr_name] = (
                "coalesce("
                "(SELECT count FROM patchwork_patchtag"
                " WHERE patchwork_patchtag.patch_id=patchwork_patch.id"
                " AND patchwork_patchtag.tag_id=%s), 0)"
            )
            select_params.append(tag.id)
        return qs.extra(select=select, select_params=select_params)
No idea why it is trying to count patches there via such a complex query, instead of just doing the query.
I also ran into this while upgrading our FFmpeg instance from 3.0 to 3.2. It got so bad that the instance ran out of request workers (100 of them), and the site became unusable, with response times of 10-20 seconds for each request.
I managed to reduce the issue somewhat by examining some of the most heavy-hitting queries and adding missing indices (using MariaDB 11.5):
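Reading the snippet above literally, the count in the sub-select is written without parentheses, so it appears to read a column named count from patchwork_patchtag rather than run a COUNT(*) aggregate, and the loop adds one correlated sub-select per tag. The sketch below reproduces that shape with the standard sqlite3 module so the resulting SQL can be inspected. The two-table schema, the sample rows, the tag attribute names (acked_by_count, reviewed_by_count) and the helper names are hypothetical, and "?" replaces "%s" because sqlite3 uses a different placeholder style.

```python
import sqlite3

# Same sub-select text as in with_tag_counts(), with sqlite3's "?" placeholder.
TAG_SUBSELECT = (
    "coalesce("
    "(SELECT count FROM patchwork_patchtag"
    " WHERE patchwork_patchtag.patch_id=patchwork_patch.id"
    " AND patchwork_patchtag.tag_id=?), 0)"
)


def build_tag_count_query(tags):
    """Build one SELECT with a correlated sub-select per (attr_name, tag_id)."""
    columns = ["patchwork_patch.id"]
    params = []
    for attr_name, tag_id in tags:
        columns.append(f"{TAG_SUBSELECT} AS {attr_name}")
        params.append(tag_id)
    sql = (
        "SELECT " + ", ".join(columns)
        + " FROM patchwork_patch ORDER BY patchwork_patch.id"
    )
    return sql, params


def make_demo_db():
    """Two patches and three stored per-patch tag counts (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patchwork_patch (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE patchwork_patchtag"
        " (patch_id INTEGER, tag_id INTEGER, count INTEGER)"
    )
    conn.executemany("INSERT INTO patchwork_patch (id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO patchwork_patchtag VALUES (?, ?, ?)",
        [(1, 10, 3), (1, 20, 1), (2, 10, 2)],
    )
    return conn


if __name__ == "__main__":
    sql, params = build_tag_count_query(
        [("acked_by_count", 10), ("reviewed_by_count", 20)]
    )
    print(sql)
    print(make_demo_db().execute(sql, params).fetchall())
```

With N tags the statement carries N sub-selects, each evaluated per patch row, so the cost depends on how many rows the outer query has to visit before any LIMIT is applied.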
ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_date` (`id`, `date`);
ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_project_and_state` (`project_id`, `state_id`);
# not entirely sure this one is needed/used
ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_date_and_project` (`project_id`, `date`, `archived`);
This brought down some query times from 5-15 seconds to 0.xxx seconds. But some heavy hitters remain, even with them using indices.
The core issue seems to be that even a relatively simple query like this takes 20+ seconds:
SELECT
`patchwork_patch`.`id`,
`patchwork_patch`.`msgid`,
`patchwork_patch`.`date`,
`patchwork_patch`.`submitter_id`,
`patchwork_patch`.`project_id`,
`patchwork_patch`.`name`,
`patchwork_patch`.`delegate_id`,
`patchwork_patch`.`state_id`,
`patchwork_patch`.`series_id`
FROM `patchwork_patch`
WHERE `patchwork_patch`.`project_id` = 1
ORDER BY `patchwork_patch`.`date` DESC LIMIT 10 OFFSET 36300;
That's purely because of the large offset. The query uses all the indices it can get, so I don't think this can really be optimized. I guess this is largely an issue because various bots are hitting the deep links to the last page of patches. So I'm going to try removing the links to deep pages and see if it improves stability.
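For context on why the offset hurts: with LIMIT 10 OFFSET 36300 the database still has to walk past the first 36300 rows in date order before returning anything. One commonly used alternative, which is not something this thread proposes or that patchwork is known to implement, is keyset ("seek") pagination, where the next page is requested relative to the last row already seen. A minimal sketch with the standard sqlite3 module follows; the reduced schema, the sample data and the function names are assumptions for illustration, and id is added to the ORDER BY as a tie-breaker so the ordering is total.

```python
import sqlite3


def make_demo_db():
    """50 patches in one project; each date is shared by two rows (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patchwork_patch ("
        " id INTEGER PRIMARY KEY,"
        " project_id INTEGER NOT NULL,"
        " date TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX demo_project_date_idx"
        " ON patchwork_patch (project_id, date, id)"
    )
    conn.executemany(
        "INSERT INTO patchwork_patch (project_id, date) VALUES (1, ?)",
        [(f"2024-01-{(i // 2) + 1:02d}",) for i in range(50)],
    )
    return conn


def page_by_offset(conn, project_id, limit, offset):
    """Classic pagination: the database skips `offset` rows on every request."""
    return conn.execute(
        "SELECT id, date FROM patchwork_patch WHERE project_id = ?"
        " ORDER BY date DESC, id DESC LIMIT ? OFFSET ?",
        (project_id, limit, offset),
    ).fetchall()


def page_after(conn, project_id, limit, last_date, last_id):
    """Keyset pagination: continue strictly after the last (date, id) seen."""
    return conn.execute(
        "SELECT id, date FROM patchwork_patch WHERE project_id = ?"
        " AND (date < ? OR (date = ? AND id < ?))"
        " ORDER BY date DESC, id DESC LIMIT ?",
        (project_id, last_date, last_date, last_id, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    first = page_by_offset(conn, 1, 10, 0)
    last_id, last_date = first[-1]
    print(page_after(conn, 1, 10, last_date, last_id))
```

The trade-off is that keyset pagination cannot jump straight to "page 3630": page links would have to carry the last-seen date and id instead of a page number, which would change how the list view's pager works.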
This query:
state=*&archive=true
is producing a complex SQL statement that does not use indexes and takes a long time to complete. As reported by the MySQL slow log, it takes more than 1:30 mins to complete.
Issue noticed at https://patchwork.linuxtv.org.