LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

Slow SQL queries #2877

Closed Nutomic closed 11 months ago

Nutomic commented 1 year ago

There are some problems with database lockups which seem to be caused by slow queries. I set log_min_duration_statement=3000 and am collecting any slow queries in this issue. These should be optimized, because with a db pool size of 5, if 5 users trigger a slow query at the same time, all db queries will fail for the next couple of seconds.
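For reference, a sketch of how that threshold can be set from psql, equivalent to editing postgresql.conf:

-- logs any statement slower than 3000 ms
ALTER SYSTEM SET log_min_duration_statement = 3000;
SELECT pg_reload_conf();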

7 seconds: SELECT "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published", "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "post_saved"."id", "post_saved"."post_id", "post_saved"."person_id", "post_saved"."published", "post_read"."id", "post_read"."post_id", "post_read"."person_id", "post_read"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "post_like"."score", coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") FROM (((((((((((("post" INNER JOIN "person" ON ("post"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON ((("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) INNER JOIN "post_aggregates" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '33517'))) LEFT OUTER JOIN "post_saved" ON (("post"."id" = "post_saved"."post_id") AND ("post_saved"."person_id" = '33517'))) LEFT OUTER JOIN "post_read" ON (("post"."id" = "post_read"."post_id") AND ("post_read"."person_id" = '33517'))) LEFT OUTER JOIN "person_block" ON (("post"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = '33517'))) LEFT OUTER JOIN "community_block" ON (("community"."id" = "community_block"."community_id") AND ("community_block"."person_id" = '33517'))) LEFT OUTER JOIN "post_like" ON (("post"."id" = 
"post_like"."post_id") AND ("post_like"."person_id" = '33517'))) LEFT OUTER JOIN "person_post_aggregates" ON (("post"."id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = '33517'))) LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = '11402'))) WHERE (((((((((("community_follower"."person_id" IS NOT NULL) AND ("post"."nsfw" = 'f')) AND ("community"."nsfw" = 'f')) AND ("local_user_language"."language_id" IS NOT NULL)) AND ("community_block"."person_id" IS NULL)) AND ("person_block"."person_id" IS NULL)) AND ("post"."removed" = 'f')) AND ("post"."deleted" = 'f')) AND ("community"."removed" = 'f')) AND ("community"."deleted" = 'f')) ORDER BY "post_aggregates"."featured_local" DESC , hot_rank("post_aggregates"."score", "post_aggregates"."newest_comment_time_necro") DESC , "post_aggregates"."newest_comment_time_necro" DESC LIMIT '40' OFFSET '0'

3.5 seconds: SELECT "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published", "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "post_saved"."id", "post_saved"."post_id", "post_saved"."person_id", "post_saved"."published", "post_read"."id", "post_read"."post_id", "post_read"."person_id", "post_read"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "post_like"."score", coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") FROM (((((((((((("post" INNER JOIN "person" ON ("post"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON ((("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) INNER JOIN "post_aggregates" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '-1'))) LEFT OUTER JOIN "post_saved" ON (("post"."id" = "post_saved"."post_id") AND ("post_saved"."person_id" = '-1'))) LEFT OUTER JOIN "post_read" ON (("post"."id" = "post_read"."post_id") AND ("post_read"."person_id" = '-1'))) LEFT OUTER JOIN "person_block" ON (("post"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = '-1'))) LEFT OUTER JOIN "community_block" ON (("community"."id" = "community_block"."community_id") AND ("community_block"."person_id" = '-1'))) LEFT OUTER JOIN "post_like" ON (("post"."id" = 
"post_like"."post_id") AND ("post_like"."person_id" = '-1'))) LEFT OUTER JOIN "person_post_aggregates" ON (("post"."id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = '-1'))) LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = '-1'))) WHERE ((((((("post"."community_id" = '16') AND ("post"."nsfw" = 'f')) AND ("community"."nsfw" = 'f')) AND ("post"."removed" = 'f')) AND ("post"."deleted" = 'f')) AND ("community"."removed" = 'f')) AND ("community"."deleted" = 'f')) ORDER BY "post_aggregates"."featured_community" DESC , hot_rank("post_aggregates"."score", "post_aggregates"."newest_comment_time_necro") DESC , "post_aggregates"."newest_comment_time_necro" DESC LIMIT '20' OFFSET '0'

3.6 seconds: SELECT "comment_reply"."id", "comment_reply"."recipient_id", "comment_reply"."comment_id", "comment_reply"."read", "comment_reply"."published", "comment"."id", "comment"."creator_id", "comment"."post_id", "comment"."content", "comment"."removed", "comment"."published", "comment"."updated", "comment"."deleted", "comment"."ap_id", "comment"."local", "comment"."path", "comment"."distinguished", "comment"."language_id", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "person1"."id", "person1"."name", "person1"."display_name", "person1"."avatar", "person1"."banned", "person1"."published", "person1"."updated", "person1"."actor_id","person1"."bio", "person1"."local", "person1"."banner", "person1"."deleted", "person1"."inbox_url", "person1"."shared_inbox_url", "person1"."matrix_user_id", "person1"."admin", "person1"."bot_account", "person1"."ban_expires", "person1"."instance_id", "comment_aggregates"."id", "comment_aggregates"."comment_id", "comment_aggregates"."score", "comment_aggregates"."upvotes", "comment_aggregates"."downvotes", "comment_aggregates"."published", "comment_aggregates"."child_count", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "comment_saved"."id", "comment_saved"."comment_id", "comment_saved"."person_id", "comment_saved"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "comment_like"."score" FROM ((((((((((("comment_reply" INNER JOIN "comment" ON ("comment_reply"."comment_id" = "comment"."id")) INNER JOIN "person" ON ("comment"."creator_id" = "person"."id")) INNER JOIN "post" ON ("comment"."post_id" = "post"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) INNER JOIN "person" AS "person1" ON ("comment_reply"."recipient_id" = "person1"."id")) INNER JOIN "comment_aggregates" ON ("comment"."id" = "comment_aggregates"."comment_id")) LEFT OUTER JOIN "community_person_ban" ON ((("community"."id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "comment"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR 
("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '8218'))) LEFT OUTER JOIN "comment_saved" ON (("comment"."id" = "comment_saved"."comment_id") AND ("comment_saved"."person_id" = '8218'))) LEFT OUTER JOIN "person_block" ON (("comment"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = '8218'))) LEFT OUTER JOIN "comment_like" ON (("comment"."id" = "comment_like"."comment_id") AND ("comment_like"."person_id" = '8218'))) WHERE((("comment_reply"."recipient_id" = '8218') AND ("comment_reply"."read" = 'f')) AND ("person"."bot_account" = 'f')) ORDER BY "comment"."published" DESC LIMIT '40' OFFSET '0'

4 seconds: SELECT "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published", "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "post_saved"."id", "post_saved"."post_id", "post_saved"."person_id", "post_saved"."published", "post_read"."id", "post_read"."post_id", "post_read"."person_id", "post_read"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "post_like"."score", coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") FROM (((((((((((("post" INNER JOIN "person" ON ("post"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON ((("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) INNER JOIN "post_aggregates" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '-1'))) LEFT OUTER JOIN "post_saved" ON (("post"."id" = "post_saved"."post_id") AND ("post_saved"."person_id" = '-1'))) LEFT OUTER JOIN "post_read" ON (("post"."id" = "post_read"."post_id") AND ("post_read"."person_id" = '-1'))) LEFT OUTER JOIN "person_block" ON (("post"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = '-1'))) LEFT OUTER JOIN "community_block" ON (("community"."id" = "community_block"."community_id") AND ("community_block"."person_id" = '-1'))) LEFT OUTER JOIN "post_like" ON (("post"."id" = 
"post_like"."post_id") AND ("post_like"."person_id" = '-1'))) LEFT OUTER JOIN "person_post_aggregates" ON (("post"."id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = '-1'))) LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = '-1'))) WHERE ((((((((("community"."hidden" = 'f') OR ("community_follower"."person_id" = '-1'))AND ("post"."url" = 'https://blog.fabiomanganiello.com/article/Web-3.0-and-the-undeliverable-promise-of-decentralization')) AND ("post"."nsfw" = 'f')) AND ("community"."nsfw" = 'f')) AND ("post"."removed" = 'f')) AND ("post"."deleted" = 'f')) AND ("community"."removed" = 'f')) AND ("community"."deleted" = 'f')) ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."score" DESC , "post_aggregates"."published" DESC LIMIT '6' OFFSET '0'

johanndt commented 1 year ago

I've got some SQL optimization experience, but I don't have an instance with real data and real load to run these on. Any chance you can run these with EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT ...?

We can then paste the query and plan into a visualizer like https://tatiyants.com/pev/#/plans and see what's really going on.
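For example, with a trivial stand-in query (substitute one of the full slow queries above):

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT * FROM post WHERE community_id = 16 LIMIT 20;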

Also, 5 connections for anything public facing is going to have issues. I'd highly recommend upping the number of connections and also putting a connection pooler like pgbouncer in front of the db.

Vekseid commented 1 year ago

Is left join really the optimal solution in Postgres compared to a NOT IN (select ban type here)... subquery?

dessalines commented 1 year ago

@johanndt We are in desperate need of SQL experts, as my SQL skills are very mediocre. We'd love to have your expertise.

I'll post that query JSON shortly.

@Vekseid No, joins are better than IN queries with potentially thousands of inserted IDs.
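For illustration, the two shapes being compared might look like this (a sketch using tables from the queries above; the LEFT JOIN ... IS NULL form is what the generated queries use):

-- anti-join, as in the generated queries (person 33517 as in the log above):
SELECT p.id
FROM post p
LEFT JOIN person_block pb
  ON pb.target_id = p.creator_id AND pb.person_id = 33517
WHERE pb.person_id IS NULL;

-- the NOT IN alternative, which materializes the full list of blocked IDs:
SELECT p.id
FROM post p
WHERE p.creator_id NOT IN (SELECT target_id
                           FROM person_block
                           WHERE person_id = 33517);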

Nutomic commented 1 year ago

@johanndt Thanks, any help with this is very appreciated because postgres is the biggest performance bottleneck now. I ran the explain for all queries and uploaded the output to my Nextcloud: https://nextcloud.nutomic.com/index.php/s/pARP6eSCbY98QDM

johanndt commented 1 year ago

Thanks @Nutomic - I'll have a look tonight after work. And agree on the sentiment that IN queries are slow when there are lots of IDs to check.

johanndt commented 1 year ago

@Nutomic - I had a quick look (I'm only looking at Query 1). Generally it's using indexes everywhere, so let's not try to over-optimize, but the final top-level NESTED LOOP is taking 47% of the time. And I believe this loop is mostly adding the columns which are used for sorting. Can you please run it again without the ORDER BY clause?

If that's faster, e.g. 4 seconds instead of 8, add the various ORDER BYs back in to see which one is causing the slowness. I'm highly suspicious of the hot_rank() function. This still only accounts for 50% of the slowness, but we have to start somewhere.

Other quick findings: fetching Post Likes and Post Reads are the other slow parts, which makes sense as I'm sure those are the biggest tables in the system.

RocketDerp commented 1 year ago

@dessalines "We are in desperate need of SQL experts, as my SQL skills are very mediocre. We'd love to have your expertise."

Is there any kind of documentation that describes the paths the code takes for basic parts of the site, such as: 1) listing the postings (the directory of postings), 2) opening a single posting and loading the comments, 3) loading a user profile, 4) how deletes are performed?

dessalines commented 1 year ago

The order that things are loaded in isn't too important for the DB.

What's most important is optimizing our DB queries (and triggers), which are nearly all single-fetch queries with a ton of joins to get all the relevant data. For example, a CommentView has an attached post, creator, community, aggregate counts, etc.

These joins are done in Diesel in the crates/db_views_* folders, and the specific DB tables are in crates/db_schema. Some of the bigger joins and queries are in post_view.rs and comment_view.rs.

johanndt commented 1 year ago

So while waiting for the results of running without sorting, I've looked more into the hot_rank() function. I found @dessalines 's DBA Stack Exchange Post talking about it.

In the execution plan I don't see any nodes using the idx_post_aggregates_active index. Not sure why that is, but might be because the scores and latest comment time of the posts are changing more quickly now, and the index can't be used because the values being searched for are not available? Sorry not sure, since you're doing some strange things with a mutable function being indexed as immutable, and the behavior is probably undefined.

I see in the code the index is recalculated every hour, to "fix" the changing values. Even if that really did fix things, rebuilding an index is very expensive and is not a scalable solution long term as the number of posts increases.

Instead of hourly rebuilding an index, why not just write the current hot_rank into a column of post_aggregates? It will make all the queries simpler and faster. If you index that column it'll always be used and will be up to date. You can update it whenever you update the aggregates of a post. Which I imagine happens whenever a post is upvoted or commented on. So for "hot" posts that will be pretty often.

And then also do an hourly update to catch any posts which didn't have a recent update.

You can also optimize this hourly update to only work on a subset of the posts. As I understand it the hot_rank algorithm is a decaying function. So obviously after a certain number of days it will stop being hot, then it can be marked as NULL or 0 hotness, and you only have to do the hourly update on posts of the last x days.

Think of it this way. For every comment or vote which happens on a post, how many times did someone just read or list it? Obviously changes to a post would happen a lot less than people just browsing and reading posts. For every 1 update you might probably get 100s of read requests. So it would make sense to update the hot_rank only on updates. At the moment you are calculating it on every read.

And as I said, rebuilding an index every hour is going to start hurting a lot in terms of db bloat and cpu usage.
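A rough sketch of that column approach (column and index names are assumed; hot_rank() is the existing SQL function from the ORDER BY clauses above):

-- assumed names; an hourly UPDATE job replaces the hourly reindex
ALTER TABLE post_aggregates ADD COLUMN hot_rank double precision NOT NULL DEFAULT 0;
CREATE INDEX idx_post_aggregates_hot_rank ON post_aggregates (hot_rank DESC);

-- recompute only posts still young enough to be "hot":
UPDATE post_aggregates
SET hot_rank = hot_rank(score, newest_comment_time_necro)
WHERE published > now() - interval '7 days';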

johanndt commented 1 year ago

Also looked at Query 4. This one seems more straight-forward. Everything is fast and indexed, except for grabbing the correct row(s) from the post table.

This WHERE clause is causing a sequential scan: ((NOT post.nsfw) AND (NOT post.removed) AND (NOT post.deleted) AND (post.url = 'https://blog.fabiomanganiello.com/article/Web-3.0-and-the-undeliverable-promise-of-decentralization'::text))

Was looking through the code, but couldn't find where indexes are defined. Is there an index on post.url? Or maybe a combination of url, nsfw, removed, deleted?
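For reference, existing indexes can be listed from the catalog without digging through the code:

-- list the indexes that currently exist on the post table:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'post';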

Nutomic commented 1 year ago

The database initialization is handled by migrations in the root folder of this repo. You can follow the local development guide to get a local Lemmy instance running with database, and use psql to inspect it. If you can make a pull request with the suggested changes, that would be very helpful. I will try the ORDER BY now.

Edit: you are right, the hot rank accounts for most of the slowness. Removing it reduces the query time to 10%. Here are the outputs. Timings are different because I upgraded to a faster server.

johanndt commented 1 year ago

@Nutomic - I was pretty confident it was the hot_rank. If it was only a query fix, I might've had some time to do a PR, but as per one of my previous messages the suggested fix is a bigger refactor of the approach, rather than just changing one query.

I'm running a company and will be out of the country for a month from next week on a work trip, so will not be able to contribute coding time. But happy to answer more questions or do some analysis of the dumps.

dessalines commented 1 year ago

Instead of hourly rebuilding an index, why not just write the current hot_rank into a column of post_aggregates?

Whoa... I don't know why I didn't think of that. And then just run a periodic job to update that hot_rank column, rather than reindex the table?

So obviously after a certain number of days it will stop being hot, then it can be marked as NULL or 0 hotness, and you only have to do the hourly update on posts of the last x days.

GOOD point. Our time decay function seems to peak out at about two days, so the last rolling week of posts, rather than all historical ones, should be good to do this.

I'm leaning towards a periodic job rather than a DB trigger, because a trigger might have to work on every single vote, which could cause a lot of unnecessary calculations.

Another benefit of that is that we can move that time decay function to Rust, rather than the hacky way I've added it to Postgres.

Thank you so much for this one!

dessalines commented 1 year ago

This WHERE clause is causing a sequential scan: ((NOT post.nsfw) AND (NOT post.removed) AND (NOT post.deleted) AND (post.url = 'https://blog.fabiomanganiello.com/article/Web-3.0-and-the-undeliverable-promise-of-decentralization'::text))

I know the answer to this question is never easy, because it's not always predictable which index Postgres will use... but do you recommend compound indexes for our most common queries, or single-column indexes?

Two9A commented 1 year ago

Naive outside thought: if those fields (nsfw, removed, deleted) are bools is it worth storing them in a bitfield, and translating at the ORM layer? The rest of the existing code could continue to reference those fields on the post object, while queries like "not nsfw and not removed" become attributes & 0x09 = 0 or similar.

(Bonus: adding new flags involves no schema changes.)

Shananra commented 1 year ago

Would it be possible to get a db dump with sample data? I could help with some of these queries.

johanndt commented 1 year ago

Whoa... I don't know why I didn't think of that. And then just run a periodic job to update that hot_rank column, rather than reindex the table?

@dessalines - Haha, that's okay. Sometimes you're so close to the problem it's hard to see anything else.

I'm leaning towards a periodic job rather than a DB trigger, because a trigger might have to work on every single vote, which could cause a lot of unnecessary calculations. Another benefit of that is that we can move that time decay function to Rust, rather than the hacky way I've added it to Postgres.

Yes that sounds good. In one of the lemmy community threads about performance someone said that scaling will mean that some things should happen more periodically in batches. I agree, and this change is in the right direction for that kind of thinking.

I know the answer to this question is never easy, because it's not always predictable which index Postgres will use... but do you recommend compound indexes for our most common queries, or single-column indexes?

I personally prefer single indexes. My co-founder always argues with me, saying that compound ones are all you need, since PostgreSQL can figure it out. But there are cases where a compound index won't help you, when you're not using the first column in the index to filter by. See this good article for more info: https://www.cybertec-postgresql.com/en/combined-indexes-vs-separate-indexes-in-postgresql/

In this specific case, I think the url column is enough, because an index on the other columns won't really help you much. Those columns are boolean, meaning only 1 of 2 values. And what percentage of your posts will be deleted or removed? Probably only a very small percentage, which means 95%+ of the index will be the same value and it will have to scan through it all in any case.
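A sketch of what such an index could look like (index names are assumed):

-- a plain index on url,
CREATE INDEX idx_post_url ON post (url);

-- or a partial index that skips the rare removed/deleted rows entirely:
CREATE INDEX idx_post_url_visible ON post (url)
WHERE NOT removed AND NOT deleted;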

Naive outside thought: if those fields (nsfw, removed, deleted) are bools is it worth storing them in a bitfield, and translating at the ORM layer?

That's a bad idea, sorry @Two9A. Imagine the case where you're trying to retrieve a single post, but it has been deleted. With the WHERE clause in the database, it will notice immediately that there are 0 "non-deleted" rows and will stop the query and return a 0-row result. However, if you're just getting it and not checking the flags, then it will find the row and continue to do all the other 10+ joins in the database. All these queries are joining a LOT of info together. So that's a lot of extra work you're now giving the database to do which wasn't necessary. Combining things into a bitfield means you can't index on it either.

Also, if down the road you want to change the type of a column, then what do you do if it's in a bitflag? Let's say instead of a binary state, deleted might have multiple options (perma-deleted, shadow-deleted, not-deleted, etc). Now you're stuck having to write more complex data migrations. Schema changes are not a big deal, so I think not having to make schema changes is not a real benefit. Bitfields are something you use for embedded C code where you're trying to save space. Internally, PostgreSQL probably already stores booleans as just a single bit or two.

johanndt commented 1 year ago

I saw the PR already got closed, so just linking here to my comment on the PR about remaining performance issues @dessalines and @Nutomic : https://github.com/LemmyNet/lemmy/pull/2952#discussion_r1223796997

RocketDerp commented 1 year ago

I advise that all Lemmy servers enable the PG extension pg_stat_statements to debug slow SQL statements. This way logging does not have to be cranked up.

Lemmy discussion and some example queries: https://lemmy.ml/post/1361757
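For example, a typical setup and query (a sketch; the column names assume PostgreSQL 13+):

-- requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- top 10 statements by cumulative execution time:
SELECT calls,
       round(total_exec_time) AS total_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;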

tbe commented 1 year ago

Is there any load driver for lemmy right now? Or can we generate a huge bunch of data?

There are tools like PoWA that would allow better insight into the performance issues, but they require production-like data to be useful. On the other hand, they have a performance impact, so running them in production is not always a good idea, unless one decides to run a split setup.

I have 12 years of experience in this sector, so I may be of help.

Nutomic commented 1 year ago

Gonna close this as the OP queries are already optimized.

phiresky commented 1 year ago

Hey everyone. Posting here since there are people here who can maybe help more (@tbe @johanndt). Here's the top 30 currently most expensive queries on lemmy.world (by total_exec_time). Note that times are in milliseconds.

https://gist.github.com/phiresky/7ae7c232c21014d863d19d615df506a6

The instance.* queries will be fixed by #3486. A lot of the post queries are almost the same but not quite. Also, you can contact me on @phiresky:matrix.org; I can run explain queries against a copy of a fairly large instance.

Edit: also see my comment on the PR https://github.com/LemmyNet/lemmy/issues/2994#issuecomment-1624138741

tbe commented 1 year ago

Great! That helps a lot. I will reach out to get some statistics and explain plans for some of the tables as soon as I have time to check against my dummy database.


Shananra commented 1 year ago

Has anyone published a database with generated data for testing these queries?

tbe commented 1 year ago

I am working on it. But my data distribution seems off, as I don't get the same effect as most productive instances.

Hopefully, statistics from an affected instance will help me fix this.


Jelloeater commented 1 year ago

Over on Lemmy.World we're seeing this query take things down hard. Getting tons of connections spammed in a short time. Had to limit the number of connections to the DB to keep things online.

Any thoughts @dessalines ?

SELECT post . id, post . name, post . url, post . body, post . creator_id, post . community_id, post . removed, post . locked, post . published, post . updated, post . deleted, post . nsfw, post . embed_title, post . embed_description, post . thumbnail_url, post . ap_id, post . local, post . embed_video_url, post . language_id, post . featured_community, post . featured_local, person . id, person . name, person . display_name, person . avatar, person . banned, person . published, person . updated, person . actor_id, person . bio, person . local, person . private_key, person . public_key, person . last_refreshed_at, person . banner, person . deleted, person . inbox_url, person . shared_inbox_url, person . matrix_user_id, person . admin, person . bot_account, person . ban_expires, person . instance_id, community . id, community . name, community . title, community . description, community . removed, community . published, community . updated, community . deleted, community . nsfw, community . actor_id, community . local, community . private_key, community . public_key, community . last_refreshed_at, community . icon, community . banner, community . followers_url, community . inbox_url, community . shared_inbox_url, community . hidden, community . posting_restricted_to_mods, community . instance_id, community . moderators_url, community . featured_url, community_person_ban . id, community_person_ban . community_id, community_person_ban . person_id, community_person_ban . published, community_person_ban . expires, post_aggregates . id, post_aggregates . post_id, post_aggregates . comments, post_aggregates . score, post_aggregates . upvotes, post_aggregates . downvotes, post_aggregates . published, post_aggregates . newest_comment_time_necro, post_aggregates . newest_comment_time, post_aggregates . featured_community, post_aggregates . featured_local, post_aggregates . hot_rank, post_aggregates . hot_rank_active, post_aggregates . community_id, post_aggregates . creator_id, community_follower . id, community_follower . community_id, community_follower . person_id, community_follower . published, community_follower . pending, post_saved . id, post_saved . post_id, post_saved . person_id, post_saved . published, post_read . id, post_read . post_id, post_read . person_id, post_read . published, person_block . id, person_block . person_id, person_block . target_id, person_block . published, post_like . score, coalesce ( ( post_aggregates . comments - person_post_aggregates . read_comments ), post_aggregates . comments ) 
FROM ( ( ( ( ( ( ( ( ( ( ( ( post_aggregates 
    INNER JOIN person ON ( post_aggregates . creator_id = ? . id ) ) 
    INNER JOIN post ON ( post_aggregates . post_id = ? . id ) ) 
    INNER JOIN community ON ( post_aggregates . community_id = ? . id ) ) 
    LEFT OUTER JOIN community_person_ban ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_person_ban . person_id = ? . creator_id ) ) ) 
    LEFT OUTER JOIN community_follower ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_follower . person_id = ? ) ) ) 
    LEFT OUTER JOIN post_saved ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_saved . person_id = ? ) ) ) 
    LEFT OUTER JOIN post_read ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_read . person_id = ? ) ) ) 
    LEFT OUTER JOIN person_block ON ( ( post_aggregates . creator_id = ? . target_id ) AND ( person_block . person_id = ? ) ) ) 
    LEFT OUTER JOIN community_block ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_block . person_id = ? ) ) ) 
    LEFT OUTER JOIN post_like ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_like . person_id = ? ) ) ) 
    LEFT OUTER JOIN person_post_aggregates ON ( ( post_aggregates . post_id = ? . post_id ) AND ( person_post_aggregates . person_id = ? ) ) ) 
    LEFT OUTER JOIN local_user_language ON ( ( post . language_id = ? . language_id ) AND ( local_user_language . local_user_id = ? ) ) ) 
WHERE ( ( ( ( ( ( ( ( community . removed = ? ) AND ( community . deleted = ? ) ) AND ( post . removed = ? ) ) AND ( post . deleted = ? ) ) AND ( community . local = ? ) ) AND ( ( community . hidden = ? ) OR ( community_follower . person_id = ? ) ) ) AND ( post . nsfw = ? ) ) AND ( community . nsfw = ? ) ) 
ORDER BY post_aggregates . featured_local DESC, post_aggregates . hot_rank DESC, post_aggregates . published DESC 
LIMIT ? OFFSET ?
RocketDerp commented 1 year ago

post_aggregates has no index on community_id, and I saw a big improvement with PostgreSQL 15.3 on my live data by adding an index. https://github.com/LemmyNet/lemmy/pull/3839

 INNER JOIN community ON ( post_aggregates . community_id = ? . id ) ) 

 LEFT OUTER JOIN community_follower ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_follower . person_id = ? ) ) ) 

 LEFT OUTER JOIN community_block ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_block . person_id = ? ) ) ) 

Lemmy.world has over 9,000 communities locally alone.

I mentioned in the pull request that there is also another index addition, for creator_id on post_aggregates, that might help, but I'm still testing on different data patterns. For profile listings, I think it would help. And generally a lot of posts are done by the same users; it isn't really that distributed.

INNER JOIN person ON ( post_aggregates . creator_id = ? . id ) ) 

LEFT OUTER JOIN person_block ON ( ( post_aggregates . creator_id = ? . target_id ) AND ( person_block . person_id = ? ) ) ) 
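A sketch of the two additions (the first name matches the index added in PR #3839; the second is assumed):

CREATE INDEX idx_post_aggregates_community ON post_aggregates (community_id);
CREATE INDEX idx_post_aggregates_creator ON post_aggregates (creator_id);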

There are some other general issues, such as the language tables really getting large for communities, and featured posts not being that many rows but still burdening those indexes. And some of the stuff going on with JOIN I think cries out for client caching or internal Rust caching... a lot of trust is being put on PostgreSQL to do the right thing when, for example, looking up the person details of every post creator. Maybe a shift away from JOIN for this and using IN with a subquery.

What's with all the question marks ("?") for table names? Normally when I run the auto_explain or pg_stat_statements extensions, I see numbered parameters, not ?. Curious how this got logged.

Had to limit the number of connections to the DB to keep things online.

I did submit a pull request to demonstrate a self-aware concurrency throttle, so that at least the Rust code could have some concept of load. It would need to be expanded and applied only to certain API paths, but it would put more control into our hands as programmers to study what is actually happening and log more details, such as patterns for specific users. I've even been prototyping querying PostgreSQL for how busy it is with running queries before submitting certain operations... another form of self-governing logic.

RocketDerp commented 1 year ago

I am finally able to reproduce incredibly slow queries on routine listing of posts with my own generated test data! I'm running against the latest main checkout.

[screenshot: the two test communities]

Given the two communities above, there is a difference between 10.5 seconds and 0.3 seconds in the PostgreSQL EXPLAIN! The community with more postings (*_quantity2) takes 10.5 seconds to list by any method (New, Hot, Active). The important factor is the user being logged in: if I use a non-logged-in API call, it takes 0.3 seconds against both communities. This is on a routine "first page" listing of limit = 5 postings. There are no featured posts in either community.

I'm homing in on the specific issue with more testing.

RocketDerp commented 1 year ago

Ok, here are the full EXPLAIN ANALYZE outputs for the exact same community, with no variation in community data between the two queries. One takes over 14 full seconds, the other takes 9ms! One user is logged in, the other user is anonymous. This 14-second query is performed by a virgin account: no blocks, nothing ever read, nothing ever created, no communities joined, etc. auto_explain_list_post_community2.txt

Jelloeater commented 1 year ago

@RocketDerp If there is anything I can do to help, LMK. Feel free to DM me. https://mastodon.social/@jesswill42

RocketDerp commented 1 year ago

Here is what I've figured out so far... I can still reproduce the problem based on the number of posts in my community. This is with real Lemmy data I've collected for 2 months on my instance server, plus the artificial communities I created with various amounts of data (11,8xx posts in the community for the EXPLAIN I shared: 13 seconds logged in vs. 10ms anonymous).

The amount of total data in the post table seems to be a significant factor. I found that merely adding one filter to post_view.rs equalizes not-logged-in and logged-in performance, making both performant:

  query = query.filter(post_aggregates::published.gt(now - 2.weeks()));

Everything in that test community is more recent than 2 weeks. So it does not impact the amount of rows for my specific test community. What it does accomplish is force PostgreSQL to focus more on post_aggregates table filtering before JOIN considerations.

These queries for posts and comments are very difficult to wrap your head around, as they seem "open ended" and rely on LIMIT for sanity. My post_aggregates/post tables currently have 506212 rows... relying on LIMIT 50 is kind of a weak force when you are joining with tables that have the potential for huge numbers of records, such as post_read, which on a very active instance like Lemmy.world with lots of users reading posts will have much more data than on a smaller instance.

I come from eras of far slower hardware, and my instinct is what @Vekseid said earlier in this discussion: to use WHERE ... IN with a subselect on the query. That gives a solid foundation of posts or comments that match the basic browsing of Hot/Active/New before any JOIN logic kicks in. Right now, Lemmy puts the index thinking at the very end, where it does the ORDER BY sort... I think index-backed WHERE-clause elimination needs some pondering.
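A hypothetical sketch of that WHERE ... IN shape (table and column names from the queries above; not Lemmy's actual query): pick a small candidate set from post_aggregates first, so the JOINs only ever see 50 rows.

SELECT p.id, p.name, pa.score
FROM post p
INNER JOIN post_aggregates pa ON pa.post_id = p.id
WHERE p.id IN (SELECT post_id
               FROM post_aggregates
               ORDER BY hot_rank DESC
               LIMIT 50)
ORDER BY pa.hot_rank DESC;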

RocketDerp commented 1 year ago

ok, for a logged-in user removing:

query = query.filter(community_block::person_id.is_null());

https://github.com/LemmyNet/lemmy/blob/654bc513ea46e0b2d6dc60aee118ebf152752d11/crates/db_views/src/post_view.rs#L325

Seems to equalize performance. I'll do EXPLAIN ANALYZE on the query with and without this one line while logged-in. My testing user has zero blocked communities.

RocketDerp commented 1 year ago

Given my community_block table has zero rows, this may not be applicable to other instances. But it illustrates just how fragile this query is that a reference to an empty table turns an 18ms SELECT into over 14 full seconds. Attached are both EXPLAIN ANALYZE outputs: auto_explain_list_post_community3.txt. Reminder that this 14-second query performance doesn't happen on every community; it seems to happen when my community has between 4,750 and 11,850 posts. My 4,750-post test community does not hit the slowdown.

In this specific case, a Rust code change seems pretty obvious: if there is a specified community_id for the query, do not reference the community_block table at all. In other words... if the query is for a single community, why would you be blocking it?

RocketDerp commented 1 year ago

@Jelloeater - what's your RAM config in PostgreSQL?

I was deliberately running under-tuned to reproduce problems, because I don't have the user load and local content Lemmy.world does, but the query did improve with the brute force of more RAM. I just put proper tuning params back in, and 11K posts isn't running slow. Your PostgreSQL 15.3 config should have at least the values I'm using on a 24GB RAM system with slow mechanical hard drives:

max_connections = 500
shared_buffers = 6GB
effective_cache_size = 18GB
work_mem = 6291kB
maintenance_work_mem = 1536MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
NomNuggetNom commented 1 year ago

For convenience, here are @RocketDerp's queries in a visualizer. The problematic one, and the normal one.

This sticks out to me in the problematic query (maybe an index is missing?), but I'm not an expert. [screenshot of the plan node]

Another observation: the person_block join comes after the post_read join on the normal query. On the problematic query, it comes before.

drumlinish commented 1 year ago

It looks like the first visualization benefits from the idx_post_aggregates_community index added in https://github.com/LemmyNet/lemmy/pull/3839.

Some of the filters are on person_id = -1, which could be optimized by not joining when it is -1, or by short-circuiting it with something like where $1 > 0 and person_id = $1 (not sure how that can be expressed with Diesel); see the sketch below.

[screenshot of the query plan]
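A runnable sketch of that short-circuit idea, with a literal -1 standing in for the bound person_id:

-- "-1 > 0" folds to constant false, so the planner never probes post_saved
SELECT p.id,
       (ps.person_id IS NOT NULL) AS saved
FROM post p
LEFT JOIN post_saved ps
  ON -1 > 0
 AND ps.post_id = p.id
 AND ps.person_id = -1;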

RocketDerp commented 1 year ago

The -1 means nobody was logged in; I artificially set it for the specific logged-in user I was testing, to isolate the statement differences. I'm reverting to stock main right now and generating a larger test community. The idx_post_aggregates_community index seemed to help on blended communities (All, Subscribed), but on single-community listings it didn't seem to get utilized.

RocketDerp commented 1 year ago

I managed to get a reproducible segmentation fault in PostgreSQL 15.3.1 with lemmy database and a query I have identified. It happens instantly, 100% of the time. Precise version: "Ubuntu 15.3-1.pgdg22.04+1". Has anyone been looking for crashes in their PostgreSQL logs?

PostgreSQL will automatically restart itself, but while doing so you will also see Lemmy log about it:

LemmyError { message: Unknown("Error occurred while creating a new object: the database system is in recovery mode"), inner: Error occurred while creating a new object: the database system is in recovery mode, context: SpanTrace [{ target: "lemmy_apub::objects::person", name: "read_from_id", file: "crates/apub/src/objects/person.rs", line: 66 }, { target: "lemmy_apub::fetcher::user_or_community", name: "read_from_id", file: "crates/apub/src/fetcher/user_or_community.rs", line: 47 },

The lemmy-ui front page will go to an "Error" page, an API call from JavaScript is likely to get a "Gateway Time-out" exception thrown, and the system typically self-recovers within 90 seconds. Evidence of past crashes to look for: /var/crash/_usr_lib_postgresql_15_bin_postgres.*.crash

EDIT: good news is the query that I found crashing PostgreSQL 15.3.1 seems to require a Lemmy login, the anonymous API does not seem to trigger it.

drumlinish commented 1 year ago

I assume a significant part of the database load is from not-logged-in users, so optimizing the person_id = -1 queries could potentially reduce the overall load on the database without changing any APIs or the structure of the queries.

Example of join we probably could avoid when person_id=-1: https://github.com/LemmyNet/lemmy/blob/9b710a2ed3df7411f97e873ae445e96fa5a8bd56/crates/db_views/src/post_view.rs#L84-L90

phiresky commented 1 year ago

I don't think the core issue of all post fetch queries has been posted here yet. All of the joins are somewhat irrelevant in the grand picture. The issue is that there is no way currently for PostgreSQL to use an optimal filtering strategy for all queries that aren't the All page (so filtering by communities).

All of those seemingly bad joins are a distraction I think. The joins are only slow because postgresql first fetches like 100k posts, joins all the other tables, and only then orders and limits the results to the top 20. If it was only fetching 20 rows in the first place, the query would be fast regardless of all those joins.

Taking as an example the User followed Hot sort front page. Here, the DB has three choices:

The first option is only good when the user follows multiple very popular communities, because then filtering out posts by irrelevant communities in memory is cheap.

The second option is only good when the user only follows very empty communities, because PG cannot scan multiple ordered sequences of an index to return the top N results. So this index results in postgresql retrieving all posts of the followed communities, then sorting and filtering them in memory.

Since normal joins in postgresql are always single-step, PG can only rely on the general statistics it has about the kinds of communities most people follow, which causes huge misestimations.

The reason these queries are so fragile is because PG switches between them because there is no obvious answer - the optimal index does not exist, and PG does not have a feature to use the existing indexes in an actually good way (not sure why).

I have three solutions to this. They are all bad in some way:

  1. Use a LATERAL join with a subquery for the subscribed-view. The query would look something like this:
    SELECT
    *
    FROM
    community_follower cf
    CROSS JOIN LATERAL (
        SELECT
            post_aggregates.id,
            post_aggregates.post_id,
            post_aggregates.comments,
            post_aggregates.score,
            post_aggregates.upvotes,
            post_aggregates.downvotes,
            post_aggregates.published,
            post_aggregates.newest_comment_time_necro,
            post_aggregates.newest_comment_time,
            post_aggregates.featured_community,
            post_aggregates.featured_local,
            post_aggregates.hot_rank,
            post_aggregates.hot_rank_active,
            cf.id,
            cf.community_id,
            cf.person_id,
            cf.published,
            cf.pending,
            post_aggregates.comments
        FROM
            post_aggregates
        WHERE
            post_aggregates.community_id = cf.community_id
        ORDER BY
            post_aggregates.featured_local DESC,
            post_aggregates.hot_rank DESC
        LIMIT 20) AS p
    WHERE
    cf.person_id = 2
    ORDER BY
    p.featured_local DESC,
    p.hot_rank DESC
    LIMIT 20;

    All those other joins would be inside the subquery. The reason this works is that it allows PG to first fetch the followed communities, then fetch only exactly 20 posts per community (instead of all of them), then order and limit them. I tried to get PG to do this with CTEs, subqueries, and similar, but this is the only thing I got to work. Sadly, Diesel does not support LATERAL. Also, pagination would still mean fetching all of the first page_id*20 posts.

  2. Do the same as the lateral join but in Rust (i.e. first fetch all the followed communities, then fetch the best 20 posts of each community, then sort and limit).
  3. Add a column community_hot_rank_position that tracks the rank() over (partition by community_id order by featured_local desc, hot_rank desc) for every post. Then in the post query, simply add WHERE community_hot_rank_position < 20 (see the sketch below). This is fine, but again pagination is ugly. It also needs a somewhat expensive scheduled task (I mean, around as expensive as a single post fetch query, so...).
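A sketch of option 3 (the column name comes from the text above; the scheduled refresh task is assumed):

ALTER TABLE post_aggregates ADD COLUMN community_hot_rank_position int;

UPDATE post_aggregates pa
SET community_hot_rank_position = ranked.pos
FROM (SELECT id,
             rank() OVER (PARTITION BY community_id
                          ORDER BY featured_local DESC, hot_rank DESC) AS pos
      FROM post_aggregates) ranked
WHERE pa.id = ranked.id;

-- the post query then only needs: WHERE community_hot_rank_position < 20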
RocketDerp commented 1 year ago

PostgreSQL 15.4 was released today, I'm waiting on the arm64 .deb files to be released and I'll upgrade from 15.3.1 and see if I can still crash PostgreSQL with the Lemmy query. My database was freshly exported and re-imported just last week using pg_dump to plain text. So I don't think there is any corruption...

phiresky commented 1 year ago

did you check for OOM errors in dmesg and the postgresql logs? seems unlikely the crashes are anything but memory issues

RocketDerp commented 1 year ago

The joins are only slow because postgresql first fetches like 100k posts

There are no WHERE criteria on the default sort orders... it's open-ended and relies on LIMIT.

When doing a subscribed / multi-community listing, I think an approach to consider is to have a freshness column (int4) set on every post_aggregates row on INSERT and UPDATE (Lemmy edit), and then have a scheduled job that unsets freshness.

An important aspect of the "unset freshness" batch job: for communities with fewer than, say, 500 posts, ALL their posts are set to "fresh"; for communities with more than 500 posts, only 500 are set fresh based on their updated & published datestamps. This puts front and center the general content-readability focus of Lemmy... even a 5-month-old posting in a low-posting (80 post) community would still be a candidate for presentation and ranking via the Hot/Active/Popular algorithms. But it ages out old posts and ages out >500 posts for any one community... it puts a performance barrier on routine listing that hits in a WHERE clause before any other filtering logic and LIMIT kicks in.
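A hypothetical sketch of the proposed column and batch job (all names assumed; the published timestamp alone stands in for updated & published):

ALTER TABLE post_aggregates ADD COLUMN freshness int4 NOT NULL DEFAULT 1;

-- scheduled job: keep each community's 500 newest posts fresh, age out the rest
-- (communities with fewer than 500 posts keep everything fresh automatically)
UPDATE post_aggregates pa
SET freshness = 0
WHERE pa.freshness = 1
  AND pa.id NOT IN (
      SELECT id
      FROM (SELECT id,
                   row_number() OVER (PARTITION BY community_id
                                      ORDER BY published DESC) AS rn
            FROM post_aggregates) recent
      WHERE rn <= 500
  );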

RocketDerp commented 1 year ago

did you check for OOM errors in dmesg and the postgresql logs

dmesg shows nothing, and the postgres logs don't even show a stack trace; they just show the most simple message of the process ending. The .crash file doesn't include any useful hint I can see. It's possible that turning off AUTO_EXPLAIN might make it go away... it's a very specific Lemmy query I'm still hand-analyzing to see why.

It's a 64-bit system and I have 12GB of RAM free (with 10GB already allocated to PostgreSQL, 24GB total), and I even added a 4GB swap file to see if that made any difference. It didn't; the swap file doesn't start to get used. The crash is instant, like 10ms after this SELECT query goes from Lemmy's Rust code to PostgreSQL... boom!

I'm somewhat reluctant to install all the debug symbols, I'll probably clone the database and reproduce locally before doing that on my production server.

RocketDerp commented 1 year ago

Ok, some queries don't just take 14 seconds... some never return at all, so they never get logged as slow queries...

[screenshot of pg_stat_activity]

A SELECT has been running for 35 minutes elapsed so far. Nginx timed out a long time ago, but PostgreSQL is still churning. This is listing a single community. It's all about how many posts there are in that community; these behaviors happen once you reach certain row-count thresholds. That's why bigger servers like Lemmy.ml with more history are crashing so hard.

Query to spot these running queries that don't return:

SELECT usename, query, wait_event, wait_event_type, query_start, state_change, to_char(age(clock_timestamp(), query_start), 'HH24:MI:SS FF6') AS elapsed
FROM pg_stat_activity
WHERE usename = 'lemmy'
ORDER BY query_start

In defense against these kinds of runaway query issues... I'm studying whether Rust Diesel can prepend a SET statement before individual SELECT statements. As an alternative to that, I think one option is to create a second PostgreSQL user for the sole purpose of doing routine SELECT queries, keeping the original 'lemmy' user for migrations, inserts, updates, and deletes. It is possible to give a specific user a statement timeout: ALTER ROLE lemmy_select1 SET statement_timeout = 8000; -- milliseconds. By doing it this way, a server operator could even alter the timeout without having to touch or restart Rust code...
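A sketch of that second-login setup (role name from above; password elided):

CREATE ROLE lemmy_select1 LOGIN PASSWORD 'changeme';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO lemmy_select1;
ALTER ROLE lemmy_select1 SET statement_timeout = 8000;  -- milliseconds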

Remember the opening sentence of this thread from May 24:

There are some problems with database lockups which seem to be caused by slow queries.

I have a query in production that's still running after 1 hour 45 minutes...

RocketDerp commented 1 year ago

I revised all the Rust code to have two PostgreSQL connection pools for two different users, then altered the post listing SELECT to work with the 2nd login. The code changes to Rust were not too bad: I established a second environment variable and mostly had to copy/paste existing code, adding the 2nd variable here and there. That only took me 30 minutes. I spent far more time devising tests and wrangling PostgreSQL permissions so that the 2nd user could SELECT against tables created by the original 'lemmy' user. Anyway, the bad news is that the timeout doesn't seem to be enforced! The runaway SELECT just keeps on going... the Diesel Rust code must be overriding the timeout I set on the 2nd user. I tested the same login against the Lemmy tables using NodeJS code, and there the timeout returns "canceling statement due to statement timeout" for that 2nd user, where the original 'lemmy' user gets no such timeout.

-or- the problem is so deep within PostgreSQL that the timeout cancel isn't working. I've had to use killall -9 against postgres processes, because the query hangs things so badly that even stopping the service and a normal killall don't do the job.
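
For what it's worth, the behavior the NodeJS test confirmed can be reproduced from any client connected as the new role (a minimal check):

SHOW statement_timeout;  -- expect 8s for lemmy_select1
SELECT pg_sleep(10);     -- expect: ERROR: canceling statement due to statement timeout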

EDIT: updated to PostgreSQL 15.4, still reproducible crash.

RocketDerp commented 1 year ago

ok, so I've got like 3 or 4 pans on the burners related to this general topic.

  1. Establishing a second PostgreSQL login, lemmy_read0, that does only the performance-critical post and comment queries is a good direction for hardening Lemmy against data-growth issues, since PostgreSQL tooling frequently keys its logging on the USER/ROLE name.

  2. The PostgreSQL 15.4 backend should not be crashing, or spinning at 100% CPU on a query that cannot be killed, because of a SELECT, so I am going to try to get that reported to the PostgreSQL project.

  3. I proposed a "freshness" column on post_aggregates for every post, focused on the end-user content reading of both massive communities and tiny communities. The intention is to provide a WHERE clause that gravitates toward the recent content Lemmy users most often browse, while also telling PostgreSQL not to JOIN against the ever-growing pile of outdated posts. I will start playing with some code/design ideas.

  4. The PostgreSQL 15.4 crashes and 100% CPU query problems I discovered all happen when the user is logged-in (and do not crash listing the same posts anonymously)... and there are large numbers of posts in the database (113,000 in a single community, 700,000 on a server across all communities). Joins on block lists and 'read posts' seem to send PostgreSQL into these fault areas. I think we should try to put in some immediate attempts to fix this, as I suspect it is causing lemmy.world and other servers with significant data to have PostgreSQL overloads.

 

Related to number 4 ... @dullbananas put in a pull request https://github.com/LemmyNet/lemmy/pull/3865 yesterday that attempts to rework the logic on these joins. I've not really understood why there is a JOIN on "is null" criteria without a user-id filter, and this new code seems to put that topic front and center:

[screenshot: Diesel code from PR #3865 reworking the join logic]

It does not currently compile for me (a Diesel syntax issue). Can we get the syntax right? Thank you, and have a great weekend.

phiresky commented 1 year ago

Here's a minimal setup that shows how the post query would be slow even when stripped down to a single join:

create table community_followers(person_id integer, community_id integer);
create table test_posts(id bigserial primary key, community_id integer not null, hot_rank double precision not null);

create index on test_posts(community_id, hot_rank desc);
insert into test_posts (community_id, hot_rank) select trunc(random() * 1000), random() from generate_series(1, 5000000);

insert into community_followers values (42, 5), (42, 10), (42, 63), (42, 81); -- follow 4 communities
insert into community_followers (person_id, community_id) select 41, trunc(random() * 1000) from generate_series(1, 100); -- follow 100 communities

-- slow query equivalent to what Lemmy currently does
explain analyze select * from test_posts join community_followers c using (community_id) where person_id = 41 order by hot_rank desc limit 10;

--                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
-- Limit  (cost=248999.35..248999.37 rows=10 width=24) (actual time=2965.085..2965.092 rows=10 loops=1)
--   ->  Sort  (cost=248999.35..251791.42 rows=1116831 width=24) (actual time=2965.083..2965.087 rows=10 loops=1)
--         Sort Key: test_posts.hot_rank DESC
--         Sort Method: top-N heapsort  Memory: 26kB
--         ->  Hash Join  (cost=3.55..224865.03 rows=1116831 width=24) (actual time=0.270..2717.359 rows=1051114 loops=1)
--               Hash Cond: (test_posts.community_id = c.community_id)
--               ->  Seq Scan on test_posts  (cost=0.00..163693.78 rows=9999878 width=20) (actual time=0.163..1141.109 rows=10000000 loops=1)
--               ->  Hash  (cost=2.30..2.30 rows=100 width=8) (actual time=0.073..0.074 rows=100 loops=1)
--                     Buckets: 1024  Batches: 1  Memory Usage: 12kB
--                     ->  Seq Scan on community_followers c  (cost=0.00..2.30 rows=100 width=8) (actual time=0.010..0.030 rows=100 loops=1)
--                           Filter: (person_id = 41)
--                           Rows Removed by Filter: 4
-- Planning Time: 0.350 ms
-- Execution Time: 2965.180 ms
--(14 rows)

-- mostly fast query with lateral join
explain analyze select * from community_followers c, lateral (select * from test_posts where community_id = c.community_id order by hot_rank desc limit 20) qq where person_id = 41 order by hot_rank desc limit 20;

--                                                                                         QUERY PLAN
-- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--  Limit  (cost=5477.90..5477.93 rows=10 width=28) (actual time=4.739..4.743 rows=10 loops=1)
--    ->  Sort  (cost=5477.90..5482.90 rows=2000 width=28) (actual time=4.737..4.739 rows=10 loops=1)
--          Sort Key: test_posts.hot_rank DESC
--          Sort Method: top-N heapsort  Memory: 25kB
--          ->  Nested Loop  (cost=0.45..5434.68 rows=2000 width=28) (actual time=0.060..4.147 rows=2000 loops=1)
--                ->  Seq Scan on community_followers c  (cost=0.00..2.30 rows=100 width=8) (actual time=0.015..0.037 rows=100 loops=1)
--                      Filter: (person_id = 41)
--                      Rows Removed by Filter: 4
--                ->  Memoize  (cost=0.45..56.30 rows=20 width=20) (actual time=0.011..0.037 rows=20 loops=100)
--                      Cache Key: c.community_id
--                      Cache Mode: binary
--                      Hits: 5  Misses: 95  Evictions: 0  Overflows: 0  Memory Usage: 111kB
--                      ->  Limit  (cost=0.43..56.29 rows=20 width=20) (actual time=0.010..0.031 rows=20 loops=95)
--                            ->  Index Scan using test_posts_community_id_hot_rank_idx on test_posts  (cost=0.43..27928.43 rows=10000 width=20) (actual time=0.010..0.029 rows=20 loops=95)
--                                  Index Cond: (community_id = c.community_id)
--  Planning Time: 0.205 ms
--  Execution Time: 4.882 ms
-- (17 rows)

-- if min needed hot rank is roughly known this is also very fast
explain analyze select * from test_posts join community_followers c using (community_id) where person_id = 42 and hot_rank > 0.99 order by hot_rank desc limit 20;

RocketDerp commented 1 year ago

where person_id = 42 and hot_rank > 0.99

That use of hot_rank in the WHERE seems like what I'm talking about with 'freshness' tracking of posts. How exactly is hot_rank currently calculated? My general concern is that we don't let large communities dwarf smaller ones... a 2-day-old post in a low-activity community is a "big deal" compared to 100 new posts in a popular community. The good news is that smaller communities have fewer posts, so the actual rows of data, and updating their 'freshness' value, aren't that resource-consuming anyway.
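
For reference, Lemmy's hot rank at the time was a Reddit-style time decay, roughly the expression below (constants reproduced from memory, so treat them as approximate rather than authoritative):

-- approximate shape: 10000 * log10(greatest(1, score + 3))
--                    / (hours_since_published + 2) ^ 1.8
SELECT floor(10000 * log(greatest(1, score + 3))
       / power((extract(epoch FROM (now() - published)) / 3600) + 2, 1.8))
FROM post_aggregates;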

-- slow query equivalent to what Lemmy currently does
explain analyze select * from test_posts join community_followers c using (community_id) where person_id = 41 order by hot_rank desc limit 10;

Are we all in agreement that the lack of a solid WHERE clause that cuts off data is the fundamental design problem? It might have worked back in April 2023 with the amount of data Lemmy had in the tables, but by late June that was already exploding. There is too much reliance on LIMIT 10 as a performance barrier, and it is not one.

create index on test_posts(community_id, hot_rank desc);
lateral (select * from test_posts where community_id = c.community_id order by hot_rank desc limit 20)

Adding a community_id index to post_aggregates can, I think, provide immediate relief even in normal situations, based on my testing. We should try to get that in sooner rather than later.
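
For illustration, that index might look like the following, with the caveat that post_aggregates would first need a denormalized community_id column (an assumption on my part), and that hot_rank is only one of several sort columns:

CREATE INDEX idx_post_aggregates_community_hot_rank
    ON post_aggregates (community_id, hot_rank DESC);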

generate_series(1, 5000000)

5 million rows is beyond what I've been testing the logic on ;) Right now I've been generating 500,000 rows through the API for logic testing; I may adapt your bulk techniques to put realistic data into the real Lemmy table structures, so that the normal API testing code can have scale-performance baselines.

phiresky commented 1 year ago

a community_id index to post_aggregates

Yeah, a community_id index allows it to fetch all posts in followed communities fast, then filter posts in those communities in memory. But it can also make some queries slower (according to @sunarus); that should happen when the followed communities are very many and very large. I think that index was supposed to be added with https://github.com/LemmyNet/lemmy/pull/3653 but then wasn't, because of the occasional slowdowns. There are some query plans in that PR's comments.

That seems like what I'm talking about with 'freshness' tracking of posts

If a similar filter was used in Lemmy, it would need to change depending on which sort the user specifies (hot, top, new, old, ...). It should be possible to get a value for the filter, e.g. as described below the table.

Here's an (unfinished) table:

SortType        Order columns
Active          featured_(local/community) DESC, hot_rank_active DESC, published DESC
Hot             featured_(local/community) DESC, hot_rank DESC, published DESC
New             featured_(local/community) DESC, published DESC
Old             featured_(local/community) DESC, published ASC
TopINTERVAL     featured_(local/community) DESC, score DESC, published DESC
MostComments    featured_(local/community) DESC, comments DESC, published DESC
NewComments     featured_(local/community) DESC, newest_comment_time DESC
Controversial   featured_(local/community) DESC, controversy_rank DESC
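
For the lateral-join approach to stay on an index scan, each sort in this table would need a matching composite index; e.g. for Hot on a community listing (a sketch that again assumes community_id is available on post_aggregates):

CREATE INDEX idx_post_aggregates_hot
    ON post_aggregates (community_id, featured_community DESC, hot_rank DESC, published DESC);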

Basically, the filter should be on exactly the columns that are ordered by. An upper bound for the required filter value could be obtained by, e.g., fetching the first page, without the filter, from just the largest followed community. That single-community query should always be fast.
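
A sketch of that bounding technique against the test schema above (community 5 is just one of person 42's followed communities; the bound is only safe if the chosen community has a full page of posts):

-- the 20th-best hot_rank in one followed community bounds the combined page:
-- any post on the combined first page must rank at least that high
WITH bound AS (
    SELECT min(hot_rank) AS min_rank
    FROM (SELECT hot_rank FROM test_posts
          WHERE community_id = 5
          ORDER BY hot_rank DESC LIMIT 20) first_page
)
SELECT p.*
FROM test_posts p
JOIN community_followers c USING (community_id)
CROSS JOIN bound
WHERE c.person_id = 42
  AND p.hot_rank >= bound.min_rank
ORDER BY p.hot_rank DESC
LIMIT 20;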