Open WesleyAC opened 1 year ago
SQL queries involved in making a request to /user/b
(user ID 2) as user "a" (user ID 1):
So from that, the most impactful queries seem to be:
(cost=1418.89..1418.90) SELECT COUNT(*) AS "__count" FROM "bookwyrm_status" INNER JOIN "bookwyrm_user" ON ("bookwyrm_status"."user_id" = "bookwyrm_user"."id") LEFT OUTER JOIN "bookwyrm_comment" ON ("bookwyrm_status"."id" = "bookwyrm_comment"."status_ptr_id") LEFT OUTER JOIN "bookwyrm_quotation" ON ("bookwyrm_status"."id" = "bookwyrm_quotation"."status_ptr_id") LEFT OUTER JOIN "bookwyrm_review" ON ("bookwyrm_status"."id" = "bookwyrm_review"."status_ptr_id") WHERE (NOT ((EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userblocks" U2 WHERE (U2."user_subject_id" = 1 AND U2."user_object_id" = "bookwyrm_status"."user_id") LIMIT 1) OR EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userblocks" U2 WHERE (U2."user_object_id" = 1 AND U2."user_subject_id" = "bookwyrm_status"."user_id") LIMIT 1))) AND "bookwyrm_status"."privacy" IN ('public', 'unlisted', 'followers', 'direct') AND NOT (NOT ((EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userfollows" U2 WHERE (U2."user_subject_id" = 1 AND U2."user_object_id" = "bookwyrm_status"."user_id") LIMIT 1) OR "bookwyrm_status"."user_id" = 1 OR EXISTS(SELECT (1) AS "a" FROM "bookwyrm_status_mention_users" U1 WHERE (U1."user_id" = 1 AND U1."status_id" = "bookwyrm_status"."id") LIMIT 1))) AND "bookwyrm_status"."privacy" = 'followers') AND NOT (NOT (("bookwyrm_status"."user_id" = 1 OR EXISTS(SELECT (1) AS "a" FROM "bookwyrm_status_mention_users" U1 WHERE (U1."user_id" = 1 AND U1."status_id" = "bookwyrm_status"."id") LIMIT 1))) AND "bookwyrm_status"."privacy" = 'direct') AND NOT "bookwyrm_status"."deleted" AND "bookwyrm_user"."is_active" AND "bookwyrm_status"."user_id" = 2 AND NOT ("bookwyrm_comment"."status_ptr_id" IS NULL AND "bookwyrm_status"."privacy" = 'direct' AND "bookwyrm_quotation"."status_ptr_id" IS NULL AND "bookwyrm_review"."status_ptr_id" IS NULL));
(cost=511.56..511.57) SELECT COUNT(*) AS "__count" FROM "bookwyrm_notification" WHERE ("bookwyrm_notification"."user_id" = 1 AND NOT "bookwyrm_notification"."read");
(cost=0.71..450.64) SELECT "bookwyrm_user"."id", "bookwyrm_user"."password", "bookwyrm_user"."last_login", "bookwyrm_user"."is_superuser", "bookwyrm_user"."first_name", "bookwyrm_user"."last_name", "bookwyrm_user"."is_staff", "bookwyrm_user"."is_active", "bookwyrm_user"."date_joined", "bookwyrm_user"."username", "bookwyrm_user"."email", "bookwyrm_user"."key_pair_id", "bookwyrm_user"."inbox", "bookwyrm_user"."shared_inbox", "bookwyrm_user"."federated_server_id", "bookwyrm_user"."outbox", "bookwyrm_user"."summary", "bookwyrm_user"."local", "bookwyrm_user"."bookwyrm_user", "bookwyrm_user"."localname", "bookwyrm_user"."name", "bookwyrm_user"."avatar", "bookwyrm_user"."preview_image", "bookwyrm_user"."followers_url", "bookwyrm_user"."default_post_privacy", "bookwyrm_user"."remote_id", "bookwyrm_user"."created_date", "bookwyrm_user"."updated_date", "bookwyrm_user"."last_active_date", "bookwyrm_user"."manually_approves_followers", "bookwyrm_user"."theme_id", "bookwyrm_user"."hide_follows", "bookwyrm_user"."show_goal", "bookwyrm_user"."show_suggested_users", "bookwyrm_user"."discoverable", "bookwyrm_user"."show_guided_tour", "bookwyrm_user"."feed_status_types", "bookwyrm_user"."summary_keys", "bookwyrm_user"."preferred_timezone", "bookwyrm_user"."preferred_language", "bookwyrm_user"."deactivation_reason", "bookwyrm_user"."deactivation_date", "bookwyrm_user"."allow_reactivation", "bookwyrm_user"."confirmation_code", "bookwyrm_user"."two_factor_auth", "bookwyrm_user"."otp_secret", "bookwyrm_user"."hotp_secret", "bookwyrm_user"."hotp_count" FROM "bookwyrm_user" INNER JOIN "bookwyrm_userfollows" ON ("bookwyrm_user"."id" = "bookwyrm_userfollows"."user_object_id") WHERE "bookwyrm_userfollows"."user_subject_id" = 1;
(cost=101.31..101.32) SELECT COUNT(*) FROM (SELECT DISTINCT "bookwyrm_shelf"."id" AS Col1, "bookwyrm_shelf"."created_date" AS Col2, "bookwyrm_shelf"."updated_date" AS Col3, "bookwyrm_shelf"."remote_id" AS Col4, "bookwyrm_shelf"."name" AS Col5, "bookwyrm_shelf"."identifier" AS Col6, "bookwyrm_shelf"."description" AS Col7, "bookwyrm_shelf"."user_id" AS Col8, "bookwyrm_shelf"."editable" AS Col9, "bookwyrm_shelf"."privacy" AS Col10 FROM "bookwyrm_shelf" INNER JOIN "bookwyrm_shelfbook" ON ("bookwyrm_shelf"."id" = "bookwyrm_shelfbook"."shelf_id") WHERE (NOT ((EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userblocks" U2 WHERE (U2."user_subject_id" = 1 AND U2."user_object_id" = "bookwyrm_shelf"."user_id") LIMIT 1) OR EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userblocks" U2 WHERE (U2."user_object_id" = 1 AND U2."user_subject_id" = "bookwyrm_shelf"."user_id") LIMIT 1))) AND "bookwyrm_shelf"."privacy" IN ('public', 'followers') AND NOT (NOT ((EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userfollows" U2 WHERE (U2."user_subject_id" = 1 AND U2."user_object_id" = "bookwyrm_shelf"."user_id") LIMIT 1) OR "bookwyrm_shelf"."user_id" = 1)) AND "bookwyrm_shelf"."privacy" = 'followers') AND "bookwyrm_shelfbook"."book_id" IS NOT NULL AND "bookwyrm_shelf"."user_id" = 2)) subquery;
(cost=100.84..101.29) SELECT DISTINCT "bookwyrm_shelf"."id", "bookwyrm_shelf"."created_date", "bookwyrm_shelf"."updated_date", "bookwyrm_shelf"."remote_id", "bookwyrm_shelf"."name", "bookwyrm_shelf"."identifier", "bookwyrm_shelf"."description", "bookwyrm_shelf"."user_id", "bookwyrm_shelf"."editable", "bookwyrm_shelf"."privacy" FROM "bookwyrm_shelf" INNER JOIN "bookwyrm_shelfbook" ON ("bookwyrm_shelf"."id" = "bookwyrm_shelfbook"."shelf_id") WHERE (NOT ((EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userblocks" U2 WHERE (U2."user_subject_id" = 1 AND U2."user_object_id" = "bookwyrm_shelf"."user_id") LIMIT 1) OR EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userblocks" U2 WHERE (U2."user_object_id" = 1 AND U2."user_subject_id" = "bookwyrm_shelf"."user_id") LIMIT 1))) AND "bookwyrm_shelf"."privacy" IN ('public', 'followers') AND NOT (NOT ((EXISTS(SELECT (1) AS "a" FROM "bookwyrm_userfollows" U2 WHERE (U2."user_subject_id" = 1 AND U2."user_object_id" = "bookwyrm_shelf"."user_id") LIMIT 1) OR "bookwyrm_shelf"."user_id" = 1)) AND "bookwyrm_shelf"."privacy" = 'followers') AND "bookwyrm_shelfbook"."book_id" IS NOT NULL AND "bookwyrm_shelf"."user_id" = 2) LIMIT 3;
However, there are a couple things to keep in mind about this:
Seems like in practice all of these queries are quite quick, taking a handful of milliseconds at most (via EXPLAIN ANALYZE
for the case of me looking at mouse's profile), so my guess is that there's some N+1 query thing going on here. I'll add some telemetry and see if that shows anything useful, and I might play with the queries on some more realistic data.
My suspicion is that the queries for books for each shelf are the slow thing here, I think that may not have been properly captured in my test.
I looked at a more realistic set of queries and didn't see anything egregious, a few hundred ms of queries in total. Going to wait for data from #2795 to look into this more.
Ok, a fascinating new development here. I think what causes the super high delays for this page is a thundering herd effect, where a bunch of servers (for reasons I don't understand) all request a specific user at exactly the same time. Here's what it looks like in honeycomb:
Importantly, for each of those peaks, every single request is for the same user, from a different server.
I don't really get what's going on here, but I think either figuring out how to prevent/spread out this thundering herd effect (maybe not possible) or cache user pages would give significant improvements here.
This sounds like something that the existing nginx caching should be catching? All requests from logged out requesters are supposed to be cached
One potential reason that might not be happening is that the requests all come in at the same time, and take ~5s to complete, so by the time the page has been rendered the first time and cached, most of the requests have already been passed through? Not sure what the expected behaviour of the nginx cache is, but that wouldn't surprise me, the timings work out.
Here's a look at one of these instances, the first/fastest request took ~7s to complete, and I've highlighted ~7s on the graph:
(edit: more filtered chart)
It remains very strange to me that these requests manage to be so coordinated, though.
It also checks out that the burst would be that length, since there may be requests after that which nginx is caching, so we don't see in honeycomb.
My guess would be that we're sending out a broadcast at the same time to a bunch of instances, and they're all responding by loading the user.
That would make sense. It surprises me that the effect is so immediate, but it does make sense when I think about it. It'll be easier to tell if that's what's going on once we get the data from #2805 back.
If that is the case, then we probably want to have the broadcast_task
fill the cache for the user before sending any requests. I don't know the best way to do that — probably it's easiest if we use the Django caching framework, but it's possible we could tell nginx to prefill the cache for a given page as well.
Or, a simpler version would be to change the broadcast task to realize if it's sending it to a large number of recipients, and if so, to send it to a small handful of them first, wait a fixed amount of time, and then allow the rest through. I suspect that might be the best solution in terms of being simple and effective.
Looking at Honeycomb data for HTTP latency, the pages that stand out the most are
/inbox
(expected due to #2717, although we should improve that) and/user/
. Specifically:We should figure out what queries are responsible for this and improve them if possible.