ellmetha / django-machina

A Django forum engine for building powerful community driven websites.
https://django-machina.readthedocs.org
BSD 3-Clause "New" or "Revised" License

mark_topic_read incredibly slow on large forum #299

Closed danpaulson closed 1 year ago

danpaulson commented 1 year ago

We are using Machina on a fairly large site and are starting to hit performance issues with mark_topic_read in the update_user_trackers receiver. We're seeing SQL calls take 30 to 70 seconds for fresh users. Here's an example of the query:

EXPLAIN ANALYZE
SELECT (1) AS "a"
FROM "forum_conversation_topic"
INNER JOIN "forum_forum"
    ON ("forum_conversation_topic"."forum_id" = "forum_forum"."id")
LEFT OUTER JOIN "forum_tracking_topicreadtrack"
    ON ("forum_conversation_topic"."id" = "forum_tracking_topicreadtrack"."topic_id")
LEFT OUTER JOIN "forum_tracking_forumreadtrack"
    ON ("forum_forum"."id" = "forum_tracking_forumreadtrack"."forum_id")
WHERE ("forum_conversation_topic"."forum_id" = 10
    AND (("forum_tracking_topicreadtrack"."mark_time" < ("forum_conversation_topic"."last_post_on")
            AND "forum_tracking_topicreadtrack"."user_id" = 139881)
        OR ("forum_tracking_forumreadtrack"."mark_time" < ("forum_conversation_topic"."last_post_on")
            AND "forum_tracking_forumreadtrack"."user_id" = 139881
            AND "forum_tracking_topicreadtrack"."id" IS NULL))
    AND NOT ("forum_conversation_topic"."id" = 20427))
LIMIT 1

And the explain:

Limit  (cost=8.24..1887.53 rows=1 width=4) (actual time=47059.932..47059.934 rows=0 loops=1)
  ->  Nested Loop  (cost=8.24..486743.18 rows=259 width=4) (actual time=47059.931..47059.933 rows=0 loops=1)
        Join Filter: (((forum_tracking_topicreadtrack.mark_time < forum_conversation_topic.last_post_on) AND (forum_tracking_topicreadtrack.user_id = 139881)) OR ((forum_tracking_forumreadtrack.mark_time < forum_conversation_topic.last_post_on) AND (forum_tracking_forumreadtrack.user_id = 139881) AND (forum_tracking_topicreadtrack.id IS NULL)))
        Rows Removed by Join Filter: 275454344
        ->  Merge Left Join  (cost=0.71..32719.72 rows=576479 width=28) (actual time=0.012..476.711 rows=676792 loops=1)
              Merge Cond: (forum_conversation_topic.id = forum_tracking_topicreadtrack.topic_id)
              ->  Index Scan using forum_conversation_topic_pkey on forum_conversation_topic  (cost=0.29..1455.73 rows=15369 width=16) (actual time=0.007..24.008 rows=15369 loops=1)
                    Filter: ((id <> 20427) AND (forum_id = 10))
                    Rows Removed by Filter: 4604
              ->  Index Scan using forum_tracking_topicreadtrack_topic_id_9a53bd45 on forum_tracking_topicreadtrack  (cost=0.42..23587.86 rows=749171 width=20) (actual time=0.004..316.034 rows=745519 loops=1)
        ->  Materialize  (cost=7.53..46.33 rows=35 width=16) (actual time=0.000..0.022 rows=407 loops=676792)
              ->  Nested Loop Left Join  (cost=7.53..46.15 rows=35 width=16) (actual time=0.043..0.217 rows=407 loops=1)
                    Join Filter: (forum_forum.id = forum_tracking_forumreadtrack.forum_id)
                    ->  Seq Scan on forum_forum  (cost=0.00..2.15 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
                          Filter: (id = 10)
                          Rows Removed by Filter: 11
                    ->  Bitmap Heap Scan on forum_tracking_forumreadtrack  (cost=7.53..38.76 rows=419 width=16) (actual time=0.034..0.134 rows=407 loops=1)
                          Recheck Cond: (forum_id = 10)
                          Heap Blocks: exact=26
                          ->  Bitmap Index Scan on forum_tracking_forumreadtrack_forum_id_bbd3fb47  (cost=0.00..7.42 rows=419 width=0) (actual time=0.026..0.026 rows=441 loops=1)
                                Index Cond: (forum_id = 10)
Planning Time: 0.793 ms
Execution Time: 47059.976 ms


Any help would be appreciated.

danpaulson commented 1 year ago

I think I've tracked it down to this:

https://github.com/ellmetha/django-machina/blob/main/machina/apps/forum_tracking/handler.py#L137

This seems to be what fires on every page view and generates the query in the OP. From what I'm seeing, it's incredibly slow for new users and basically instant for users already active on the site.
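For context, the check around that line appears to build an "are there any other unread topics in this forum?" queryset roughly like the sketch below. This is a reconstruction from the SQL captured in the OP, not the exact handler code; the related names (tracks on Topic and Forum pointing at TopicReadTrack and ForumReadTrack) are inferred from the generated joins and may differ from the real models.

from django.db.models import F, Q

# Hedged reconstruction of the unread-topic check behind the slow query.
# `forum`, `topic` and `user` come from the surrounding handler.
unread_topics = forum.topics.filter(
    # Topics the user has a TopicReadTrack for, but with newer posts since then...
    Q(tracks__user=user, tracks__mark_time__lt=F('last_post_on'))
    # ...or topics only covered by a forum-level ForumReadTrack that is now stale.
    | Q(
        forum__tracks__user=user,
        forum__tracks__mark_time__lt=F('last_post_on'),
        tracks__isnull=True,
    ),
).exclude(id=topic.id)

# .exists() is what adds the LIMIT 1 seen in the captured query.
if not unread_topics.exists():
    ...

The OR across the two LEFT JOINs is what forces the planner into the nested loop above, where 275 million rows get thrown away by the join filter for a user with many TopicReadTrack rows.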

danpaulson commented 1 year ago

Just a follow-up: I removed the Q clause on line 141 of the file linked above (the forum-level read check), since we did not require forum read checking, and that brought our DB load down from ~80% to about 5%. It's an expensive query that we thankfully didn't need, but others may need to work around or improve it.
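In terms of the reconstruction above, that workaround amounts to dropping the forum-level OR branch so only per-topic tracks are consulted. Again a sketch under the same assumptions, not the exact patch, and it only makes sense if you can live without forum-level read tracking:

from django.db.models import F, Q

# Sketch of the workaround described above: keep only the per-topic
# read-track check and drop the ForumReadTrack branch, which removes the
# expensive OR / LEFT JOIN combination from the generated SQL.
unread_topics = forum.topics.filter(
    Q(tracks__user=user, tracks__mark_time__lt=F('last_post_on')),
).exclude(id=topic.id)

if not unread_topics.exists():
    ...

Note that this changes the semantics of the unread check (topics covered only by a forum-level mark-as-read are no longer considered), which is why it is a workaround rather than a general fix.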