LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

Slow SQL queries #2877

Closed Nutomic closed 11 months ago

Nutomic commented 1 year ago

There are some problems with database lockups which seem to be caused by slow queries. I set log_min_duration_statement=3000 and am collecting any slow queries in this issue. These should be optimized: with a db pool size of 5, if 5 users trigger a slow query at the same time, all db queries would fail for the next couple of seconds.
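The pool exhaustion described above can be illustrated with a minimal sketch. It is not Lemmy's actual pool; a semaphore with one slot per connection is a hypothetical stand-in, and `try_checkout` is an illustrative name.

```python
import threading

POOL_SIZE = 5  # pool size from the example above

# Hypothetical stand-in for a connection pool: one semaphore slot per connection.
pool = threading.BoundedSemaphore(POOL_SIZE)


def try_checkout() -> bool:
    """Try to take a connection without waiting; True on success."""
    return pool.acquire(blocking=False)


# Five users each trigger a slow query and hold a connection.
slow_queries = [try_checkout() for _ in range(POOL_SIZE)]

# A sixth request arrives while the slow queries are still running.
sixth_request = try_checkout()

# One slow query finishes and returns its connection to the pool.
pool.release()
after_release = try_checkout()

print(slow_queries, sixth_request, after_release)
```

Until one of the slow queries releases its connection, every other request is refused (or, in a real pool, waits and eventually times out), regardless of how cheap its own query is.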

7 seconds: SELECT "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published", "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "post_saved"."id", "post_saved"."post_id", "post_saved"."person_id", 
"post_saved"."published", "post_read"."id", "post_read"."post_id", "post_read"."person_id", "post_read"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "post_like"."score", coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") FROM (((((((((((("post" INNER JOIN "person" ON ("post"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON ((("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) INNER JOIN "post_aggregates" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '33517'))) LEFT OUTER JOIN "post_saved" ON (("post"."id" = "post_saved"."post_id") AND ("post_saved"."person_id" = '33517'))) LEFT OUTER JOIN "post_read" ON (("post"."id" = "post_read"."post_id") AND ("post_read"."person_id" = '33517'))) LEFT OUTER JOIN "person_block" ON (("post"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = '33517'))) LEFT OUTER JOIN "community_block" ON (("community"."id" = "community_block"."community_id") AND ("community_block"."person_id" = '33517'))) LEFT OUTER JOIN "post_like" ON (("post"."id" = "post_like"."post_id") AND ("post_like"."person_id" = '33517'))) LEFT OUTER JOIN "person_post_aggregates" ON (("post"."id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = '33517'))) LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = '11402'))) WHERE (((((((((("community_follower"."person_id" IS 
NOT NULL) AND ("post"."nsfw" = 'f')) AND ("community"."nsfw" = 'f')) AND ("local_user_language"."language_id" IS NOT NULL)) AND ("community_block"."person_id" IS NULL)) AND ("person_block"."person_id" IS NULL)) AND ("post"."removed" = 'f')) AND ("post"."deleted" = 'f')) AND ("community"."removed" = 'f')) AND ("community"."deleted" = 'f')) ORDER BY "post_aggregates"."featured_local" DESC , hot_rank("post_aggregates"."score", "post_aggregates"."newest_comment_time_necro") DESC , "post_aggregates"."newest_comment_time_necro" DESC LIMIT '40' OFFSET '0'

3.5 seconds: SELECT "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published", "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "post_saved"."id", "post_saved"."post_id", "post_saved"."person_id", 
"post_saved"."published", "post_read"."id", "post_read"."post_id", "post_read"."person_id", "post_read"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "post_like"."score", coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") FROM (((((((((((("post" INNER JOIN "person" ON ("post"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON ((("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) INNER JOIN "post_aggregates" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '-1'))) LEFT OUTER JOIN "post_saved" ON (("post"."id" = "post_saved"."post_id") AND ("post_saved"."person_id" = '-1'))) LEFT OUTER JOIN "post_read" ON (("post"."id" = "post_read"."post_id") AND ("post_read"."person_id" = '-1'))) LEFT OUTER JOIN "person_block" ON (("post"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = '-1'))) LEFT OUTER JOIN "community_block" ON (("community"."id" = "community_block"."community_id") AND ("community_block"."person_id" = '-1'))) LEFT OUTER JOIN "post_like" ON (("post"."id" = "post_like"."post_id") AND ("post_like"."person_id" = '-1'))) LEFT OUTER JOIN "person_post_aggregates" ON (("post"."id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = '-1'))) LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = '-1'))) WHERE ((((((("post"."community_id" = '16') AND ("post"."nsfw" = 'f')) AND 
("community"."nsfw" = 'f')) AND ("post"."removed" = 'f')) AND ("post"."deleted" = 'f')) AND ("community"."removed" = 'f')) AND ("community"."deleted" = 'f')) ORDER BY "post_aggregates"."featured_community" DESC , hot_rank("post_aggregates"."score", "post_aggregates"."newest_comment_time_necro") DESC , "post_aggregates"."newest_comment_time_necro" DESC LIMIT '20' OFFSET '0'

3.6 seconds: SELECT "comment_reply"."id", "comment_reply"."recipient_id", "comment_reply"."comment_id", "comment_reply"."read", "comment_reply"."published", "comment"."id", "comment"."creator_id", "comment"."post_id", "comment"."content", "comment"."removed", "comment"."published", "comment"."updated", "comment"."deleted", "comment"."ap_id", "comment"."local", "comment"."path", "comment"."distinguished", "comment"."language_id", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "person1"."id", "person1"."name", "person1"."display_name", "person1"."avatar", "person1"."banned", "person1"."published", "person1"."updated", "person1"."actor_id","person1"."bio", "person1"."local", "person1"."banner", "person1"."deleted", "person1"."inbox_url", "person1"."shared_inbox_url", "person1"."matrix_user_id", "person1"."admin", "person1"."bot_account", 
"person1"."ban_expires", "person1"."instance_id", "comment_aggregates"."id", "comment_aggregates"."comment_id", "comment_aggregates"."score", "comment_aggregates"."upvotes", "comment_aggregates"."downvotes", "comment_aggregates"."published", "comment_aggregates"."child_count", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "comment_saved"."id", "comment_saved"."comment_id", "comment_saved"."person_id", "comment_saved"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "comment_like"."score" FROM ((((((((((("comment_reply" INNER JOIN "comment" ON ("comment_reply"."comment_id" = "comment"."id")) INNER JOIN "person" ON ("comment"."creator_id" = "person"."id")) INNER JOIN "post" ON ("comment"."post_id" = "post"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) INNER JOIN "person" AS "person1" ON ("comment_reply"."recipient_id" = "person1"."id")) INNER JOIN "comment_aggregates" ON ("comment"."id" = "comment_aggregates"."comment_id")) LEFT OUTER JOIN "community_person_ban" ON ((("community"."id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "comment"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '8218'))) LEFT OUTER JOIN "comment_saved" ON (("comment"."id" = "comment_saved"."comment_id") AND ("comment_saved"."person_id" = '8218'))) LEFT OUTER JOIN "person_block" ON (("comment"."creator_id" = "person_block"."target_id") AND 
("person_block"."person_id" = '8218'))) LEFT OUTER JOIN "comment_like" ON (("comment"."id" = "comment_like"."comment_id") AND ("comment_like"."person_id" = '8218'))) WHERE((("comment_reply"."recipient_id" = '8218') AND ("comment_reply"."read" = 'f')) AND ("person"."bot_account" = 'f')) ORDER BY "comment"."published" DESC LIMIT '40' OFFSET '0'

4 seconds: SELECT "post"."id", "post"."name", "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."embed_video_url", "post"."thumbnail_url", "post"."ap_id", "post"."local", "post"."language_id", "post"."featured_community", "post"."featured_local", "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "community"."id", "community"."name", "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted", "community"."nsfw", "community"."actor_id", "community"."local", "community"."icon", "community"."banner", "community"."hidden", "community"."posting_restricted_to_mods", "community"."instance_id", "community_person_ban"."id", "community_person_ban"."community_id", "community_person_ban"."person_id", "community_person_ban"."published", "community_person_ban"."expires", "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published", "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local", "community_follower"."id", "community_follower"."community_id", "community_follower"."person_id", "community_follower"."published", "community_follower"."pending", "post_saved"."id", "post_saved"."post_id", "post_saved"."person_id", 
"post_saved"."published", "post_read"."id", "post_read"."post_id", "post_read"."person_id", "post_read"."published", "person_block"."id", "person_block"."person_id", "person_block"."target_id", "person_block"."published", "post_like"."score", coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") FROM (((((((((((("post" INNER JOIN "person" ON ("post"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON ((("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP)))) INNER JOIN "post_aggregates" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = '-1'))) LEFT OUTER JOIN "post_saved" ON (("post"."id" = "post_saved"."post_id") AND ("post_saved"."person_id" = '-1'))) LEFT OUTER JOIN "post_read" ON (("post"."id" = "post_read"."post_id") AND ("post_read"."person_id" = '-1'))) LEFT OUTER JOIN "person_block" ON (("post"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = '-1'))) LEFT OUTER JOIN "community_block" ON (("community"."id" = "community_block"."community_id") AND ("community_block"."person_id" = '-1'))) LEFT OUTER JOIN "post_like" ON (("post"."id" = "post_like"."post_id") AND ("post_like"."person_id" = '-1'))) LEFT OUTER JOIN "person_post_aggregates" ON (("post"."id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = '-1'))) LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = '-1'))) WHERE ((((((((("community"."hidden" = 'f') OR 
("community_follower"."person_id" = '-1'))AND ("post"."url" = 'https://blog.fabiomanganiello.com/article/Web-3.0-and-the-undeliverable-promise-of-decentralization')) AND ("post"."nsfw" = 'f')) AND ("community"."nsfw" = 'f')) AND ("post"."removed" = 'f')) AND ("post"."deleted" = 'f')) AND ("community"."removed" = 'f')) AND ("community"."deleted" = 'f')) ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."score" DESC , "post_aggregates"."published" DESC LIMIT '6' OFFSET '0'

uniquePWD commented 1 year ago

Maybe I read this wrong, as this is a million miles from my expertise, but perhaps changing score from (upvotes-downvotes) to ((upvotes-downvotes)/community.members) might address your balance issues.

phiresky commented 1 year ago

Changing the sort to weigh community size is a separate issue unrelated to perf that is being discussed in #1026 (and #3378).

phiresky commented 1 year ago

I made a PR that hopefully fixes post view performance: https://github.com/LemmyNet/lemmy/pull/3872

dullbananas commented 1 year ago

Related to number 4 ... @dullbananas put in a pull request #3865 yesterday that attempts to rework the logic on these joins. I've not really understood why there is a JOIN on "is null" criteria without a user-id filter, and this new code seems to put that topic front and center:

image

It does not currently compile for me, syntax issue with Diesel. Can we get the syntax right? Thank you and have a great weekend.

Now it compiles

RocketDerp commented 1 year ago

I've created an API test script that creates more than 30 users and builds some testing communities. Plus there are some PostgreSQL scripts that do bulk insert of posts and comments. This isn't 'ready to run' as is; if you are interested, let me know and I'll add some glue to bash scripts to the commits: https://github.com/LemmyNet/lemmy/compare/main...RocketDerp:lemmy_server_fixes0:content_simulation_bulk_a0

Right now on a totally empty PostgreSQL, bypassing the API and doing a massive INSERT of generated data... it takes nearly 1 hour to build 60,000 posts and then 314,913 comments on those posts. Some performance learning I've gained: the overhead of the comment and post tables is pretty high for new content going in. I created temporary tables (with the same columns and default values), and generating 100,000 comments in a single INSERT takes less than 1/2 a second. Alas, it takes over 20 minutes to transfer those 100,000 comments into the main table with all the TRIGGER, INDEX, and CONSTRAINT in a single INSERT operation from that temp table. I optimized post_aggregates activity for each new comment, see https://github.com/LemmyNet/lemmy/issues/3877

Although #3877 is focused on INSERT and not the SELECT queries that this issue is about, it is significant to note that the dead tuples from the repeated UPDATE on every comment are slowing down SELECT and throwing off estimates for the query planner.

Also noteworthy: I did manage to get a single SQL INSERT statement for a comment to set ap_id and path all at once, without any UPDATE like Lemmy currently does. This requires the primary key id to go into fields of the record while inserting. It too would cut down on dead tuples if we can figure out how to get the Rust code to do the same. The INSERT statements are in the branch I linked in the first paragraph of this comment.

dullbananas commented 1 year ago

@RocketDerp That would be very helpful.

The shell script should automatically start and stop the database and lemmy server, just like test.sh.

phiresky commented 1 year ago

it takes nearly 1 hour to build 60,000 posts and then 314,913 comments on those posts

Make sure you have alter system set synchronous_commit=off set (same as in production) to make sure your numbers are good. Otherwise every single statement/COMMIT will cause a wait for fsync.

For your timing benchmark you probably need to add some parallelism with e.g. Promise.all() or, better, something like async-pool. Otherwise every post will wait for every other post, which is not what usually happens.
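As a rough illustration of bounded parallelism in the spirit of async-pool, here is a minimal sketch using Python's asyncio. `create_post` is a hypothetical placeholder for one API call, and the limit of 8 is an arbitrary assumption, not a recommended value.

```python
import asyncio


async def create_post(post_id: int) -> int:
    """Hypothetical placeholder for one API call that creates a post."""
    await asyncio.sleep(0)  # stands in for the HTTP round trip
    return post_id


async def run_bounded(n_posts: int, limit: int) -> tuple[list[int], int]:
    """Create n_posts with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(post_id: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await create_post(post_id)
            finally:
                in_flight -= 1

    # gather() returns results in submission order.
    results = await asyncio.gather(*(worker(i) for i in range(n_posts)))
    return list(results), peak


results, peak = asyncio.run(run_bounded(n_posts=100, limit=8))
print(len(results), peak)
```

Bounding the number of in-flight requests keeps the benchmark closer to real traffic than a strictly sequential loop, without opening one connection per post.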

RocketDerp commented 1 year ago

alter system set synchronous_commit=off

Thank you. I'll try that. does it require a run of 'SELECT pg_reload_conf();'? It also looks like you can set that per-session or per ROLE/USER instead of heavy system-wide....

I even tried turning off WAL system-wide on PG (minimum). Ultimately I discovered... the overhead has everything to do with the TRIGGER logic to count aggregates and since I posted that comment about 60K posts and 314,913 comments... I have since rewritten the TRIGGER logic and the improvement is huge. I can now insert 500,000 posts in seconds instead of hours. Backup and restore is amazingly fast compared to even INSERT of 10,000 posts or comments - so that was my big clue and one thing we can do that would probably be a big time saver is share a pg_dump file - although every schema change kind of makes that a bit fragile.

Anyway, more significant than sharing backup files, I nailed it just yesterday... pg_stat_statements is really great for viewing a bulk INSERT while it is running, even if you cancel it and it rolls back... you can still see all the TRIGGER statement activity and row counts update in real-time. DISCLAIMER: I haven't validated that this logic is actually correct yet, that the counts come out right, and I haven't done for comment yet - only for post:

/*
   mass_insert_before0.sql
   run BEFORE mass INSERT on an isolated system.

   Perhaps create a special PostgreSQL ROLE/USER and lock out others during this, or set others to read-only. -or- do the "right thing" and enhance the FUNCTION logic to match the previous per-ROW outcomes in full.

   Background reading:
      https://www.cybertec-postgresql.com/en/why-are-my-postgresql-updates-getting-slower/
     "On my machine, the above script takes 60 seconds, which is a terribly long time to load 100000 rows. If I drop the trigger on item, the same script runs in less than 70 milliseconds."

lemmy schema dump, 0.18.4
TABLE post focus of INSERT:

CREATE TRIGGER post_aggregates_post
  AFTER INSERT OR DELETE
   ON public.post FOR EACH ROW
   EXECUTE FUNCTION public.post_aggregates_post();

CREATE TRIGGER community_aggregates_post_count
  AFTER INSERT OR DELETE OR UPDATE OF removed, deleted
   ON public.post FOR EACH ROW
   EXECUTE FUNCTION public.community_aggregates_post_count();

CREATE TRIGGER site_aggregates_post_insert
  AFTER INSERT OR UPDATE OF removed, deleted
   ON public.post FOR EACH ROW WHEN ((new.local = true))
   EXECUTE FUNCTION public.site_aggregates_post_insert();

CREATE TRIGGER person_aggregates_post_count
  AFTER INSERT OR DELETE OR UPDATE OF removed, deleted
   ON public.post FOR EACH ROW
   EXECUTE FUNCTION public.person_aggregates_post_count();

*/

DROP TRIGGER site_aggregates_post_insert ON public.post;

/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER site_aggregates_post_insert
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION site_aggregates_post_insert();

DROP TRIGGER community_aggregates_post_count ON public.post;

/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER community_aggregates_post_count
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION community_aggregates_post_count();

DROP TRIGGER person_aggregates_post_count ON public.post;

/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER person_aggregates_post_count
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION person_aggregates_post_count();

/*
TRIGGER will be replaced with per-statement INSERT only
no Lemmy-delete or SQL DELETE to be performed during this period.
*/
CREATE OR REPLACE FUNCTION public.site_aggregates_post_insert() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
   UPDATE site_aggregates SET posts = posts +
      (SELECT count(*) FROM new_rows WHERE local = true)
      ;

   RETURN NULL;
END
$$;

CREATE OR REPLACE FUNCTION public.community_aggregates_post_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
        UPDATE
            community_aggregates ca
        SET
            posts = posts + p.new_post_count
        FROM (
            SELECT count(*) AS new_post_count, community_id
            FROM new_rows
            GROUP BY community_id
             ) AS p
        WHERE
            ca.community_id = p.community_id;

    RETURN NULL;
END
$$;

/*
TRIGGER will be replaced with per-statement INSERT only
no Lemmy-delete or SQL DELETE to be performed during this period.
*/
CREATE OR REPLACE FUNCTION public.person_aggregates_post_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
        UPDATE
            person_aggregates personagg
        SET
            post_count = post_count + p.new_post_count
        FROM (
            SELECT count(*) AS new_post_count, creator_id
            FROM new_rows
            GROUP BY creator_id
             ) AS p
        WHERE
            personagg.person_id = p.creator_id;

    RETURN NULL;
END
$$;
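To make the per-row versus per-statement difference concrete, the following sketch models it in plain Python rather than PostgreSQL. The batch of community ids is invented for illustration; each simulated "update" corresponds to one UPDATE against an aggregates row, and each such UPDATE would leave a dead tuple behind.

```python
from collections import Counter

# Hypothetical batch: community_id of each newly inserted post.
new_rows = [1, 1, 2, 1, 3, 2, 1]


def per_row(rows):
    """FOR EACH ROW: one aggregate UPDATE per inserted post."""
    counts, updates = Counter(), 0
    for community_id in rows:
        counts[community_id] += 1
        updates += 1
    return counts, updates


def per_statement(rows):
    """FOR EACH STATEMENT: group the batch, then one UPDATE per community."""
    grouped = Counter(rows)  # plays the role of GROUP BY community_id
    counts, updates = Counter(), 0
    for community_id, n in grouped.items():
        counts[community_id] += n
        updates += 1
    return counts, updates


row_counts, row_updates = per_row(new_rows)
stmt_counts, stmt_updates = per_statement(new_rows)
print(row_updates, stmt_updates, row_counts == stmt_counts)
```

Both approaches end with the same aggregate counts, but the per-statement version touches each aggregate row once per batch instead of once per inserted post.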

P.S. Another observation is that PostgreSQL 15.4 seems to slow down after DROP DATABASE and recreation. I found that going to the operating system, deleting the data directory and rebuilding it from scratch with initdb takes only a few seconds and provides more consistent results, although there are still pretty large variances in the time it takes to re-run the same code on the same system.

RocketDerp commented 1 year ago

I've now also rewritten the comment table INSERT TRIGGER FUNCTION logic to work per-statement for bulk inserting. https://github.com/LemmyNet/lemmy/compare/main...RocketDerp:lemmy_server_fixes0:content_simulation_bulk_a0

With this logic, I can INSERT over 300,000 comments per minute - most of the time is spent doing CONSTRAINT validation and inserting new records into comment_aggregates shadow table.

RocketDerp commented 1 year ago

OK, I am able to reproduce with pure test system data and auto-explain:
taking over 26 full seconds on each refresh while logged in. Anonymous user is instant.
463K posts in community, 6.18M posts in database
auto_explain_list_post_community_0_18_4_ref0.txt

Instead of a single community, here is over 42 seconds for several subscribed communities, with 1,392,245 total posts: auto_explain_list_post_community_0_18_4_subscribed0.txt

RocketDerp commented 1 year ago

For my logged-in logged-out difference for the explains shared in the previous comment, I can eliminate the performance problem by commenting out just one single line of code:

https://github.com/LemmyNet/lemmy/blob/ee7b35a04af4a200e25893249d8a32d4ceeea2b0/crates/db_views/src/post_view.rs#L321

   query = query.filter(community_block::person_id.is_null());

I tried @dullbananas's reworked logic that filters more specifically. It does better, but still executes slowly compared to an anonymous user (5 full seconds vs. 20ms):

      query = query.filter(not(exists(
        community_block::table.filter(
          post_aggregates::community_id
            .eq(community_block::community_id)
            .and(community_block::person_id.eq(person_id_join)),
        ),
      )));

Attached is an EXPLAIN ANALYZE of that revised code. auto_explain_list_post_community_0_18_4_dullbananas_run0.txt

EDIT: it is noteworthy that being subscribed to 3 large communities does work instantly with https://github.com/LemmyNet/lemmy/pull/3865, but viewing a single community is slow, as in the attached EXPLAIN.

RocketDerp commented 1 year ago

I've made some progress on the idea of a "hard wall" to contain the routine SELECT statements for browsing Lemmy content. This is focused on scalability, trying to wrangle situations where a server has a huge number of small communities or a growing pile of older posts that aren't the routine reading.

SELECT COUNT(ranked_recency.*) AS post_row_count
FROM
  (
     SELECT id, community_id, published,
        rank() OVER (
           PARTITION BY community_id
           ORDER BY published DESC, id DESC
           )
     FROM post_aggregates) ranked_recency
WHERE rank <= 1000;

Run against 5.4 million post messages spread across 12,000 communities... even without an index on community_id, this query executes in 3 or 4 seconds (many communities containing 6 to 12 posts, a dozen containing 400,000 posts) returning a count of 133037 (the 5.4 million posts are all unique in their published date, spread across 4 month period). Obviously the COUNT(*) can be replaced by SELECT of id or this can be the basis of an UPDATE against a field in post_aggregates.
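The windowed ranking can be tried on a toy data set. The sketch below runs the same shape of query through Python's stdlib sqlite3 module as a stand-in for PostgreSQL (it assumes SQLite 3.25 or newer for window functions). The table contents and the window of 3 are invented for illustration; the rank column is given an explicit alias.

```python
import sqlite3

WINDOW = 3  # the comment above uses 1000; shrunk to keep the toy data small

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_aggregates ("
    "id INTEGER PRIMARY KEY, community_id INTEGER, published INTEGER)"
)
# Community 1 has 10 posts, community 2 has only 2 posts.
rows = [(i, 1, i) for i in range(1, 11)] + [(11, 2, 11), (12, 2, 12)]
conn.executemany("INSERT INTO post_aggregates VALUES (?, ?, ?)", rows)

# Keep only the WINDOW most recent posts of every community, then count them.
(post_row_count,) = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT id,
               rank() OVER (
                   PARTITION BY community_id
                   ORDER BY published DESC, id DESC
               ) AS recency_rank
        FROM post_aggregates
    ) AS ranked_recency
    WHERE recency_rank <= ?
    """,
    (WINDOW,),
).fetchone()
print(post_row_count)
```

The large community contributes only its 3 newest posts while the small one contributes everything it has, which is the "hard wall" effect the query is after.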

For almost all of the most common sorts (Active, New, Hot, Top 1 Hour, Top 12 Hours), this gives us a window to age out old posts in a rock-solid WHERE clause that could be used before any JOIN logic. I came up with 1000 for any single community from browsing 20 pages of 50 posts each.

My idea is to have a new field on post_aggregates to query against pre-JOIN that limits overrun of posts. And keep in mind, that we haven't had anyone demonstrate a super-slow query with non logged-in users, anonymous, because they lack the tricky JOIN logic related to personal choices.

I think a subquery that limits on this inclusion field would work. Say post_aggregates.inclusion is set to 1 on every new INSERT, and we use the query above as the basis to unset it whenever we want (every few hours, every day, whatever).

  1. for All inclusion = 1
  2. list for a single community: inclusion = 1, community_id = 9
  3. for subscribed community: inclusion = 1, community_id IN (SELECT community_id FROM community_follower WHERE person_id = 50)-- I haven't verified the exact query yet

Or... taken a step further, a materialized view or partitioned split of the post_aggregates table could be the basis of browsing.
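A minimal sketch of the inclusion-flag idea, again using stdlib sqlite3 as a stand-in for PostgreSQL (SQLite 3.25+ assumed). The column default of 1, the window of 3, and the toy rows are assumptions for illustration only, and the maintenance UPDATE is one possible way to unset the flag, not a verified Lemmy query.

```python
import sqlite3

WINDOW = 3  # stands in for the 1000-post window discussed above

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_aggregates ("
    "id INTEGER PRIMARY KEY, community_id INTEGER, published INTEGER, "
    "inclusion INTEGER DEFAULT 1)"  # assumption: new posts start included
)
# Community 9 has 10 posts, community 2 has 2 posts.
rows = [(i, 9, i) for i in range(1, 11)] + [(11, 2, 11), (12, 2, 12)]
conn.executemany(
    "INSERT INTO post_aggregates (id, community_id, published) VALUES (?, ?, ?)",
    rows,
)

# Periodic maintenance: unset inclusion for posts outside each community's window.
conn.execute(
    """
    UPDATE post_aggregates SET inclusion = 0
    WHERE id IN (
        SELECT id FROM (
            SELECT id, rank() OVER (
                PARTITION BY community_id ORDER BY published DESC, id DESC
            ) AS recency_rank
            FROM post_aggregates
        ) AS ranked
        WHERE recency_rank > ?
    )
    """,
    (WINDOW,),
)

# Case 2 from the list: a single community, filtered before any JOIN would run.
single_community_ids = [
    r[0]
    for r in conn.execute(
        "SELECT id FROM post_aggregates "
        "WHERE inclusion = 1 AND community_id = 9 ORDER BY published DESC"
    )
]
# Case 1 from the list: the All listing.
all_count = conn.execute(
    "SELECT COUNT(*) FROM post_aggregates WHERE inclusion = 1"
).fetchone()[0]
print(single_community_ids, all_count)
```

Because the flag lives on post_aggregates itself, the browsing queries only need a plain equality filter, and the expensive ranking runs on whatever schedule the maintenance job uses.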

RocketDerp commented 1 year ago

Prototype / proof of concept worked exactly as expected! Even without adding an INDEX, PostgreSQL scanned for inclusion = 1 and cut 5000ms down to under 400ms for a runaway query in a massive community (400K posts).

 ALTER TABLE post_aggregates ADD inclusion smallint DEFAULT 0;

many more details in the attached EXPLAIN ANALYZE run auto_explain_list_post_community_0_18_4_dullbananas_with_inclusion_run0a.txt

     "Filter": "((community_id = 18) AND (inclusion = 1))",
     "Rows Removed by Filter": 1810012,

RocketDerp commented 1 year ago

Study topic for today is join_collapse_limit in PostgreSQL... which has a default of 8.. going back to this EXPLAIN ANALYZE of over 42 seconds....

FROM ((((((((((((("post_aggregates" INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")) INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")) LEFT OUTER JOIN "community_person_ban" ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))) INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")) LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = $1))) LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = $2))) LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = $3))) LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = $4))) LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = $5))) LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = $6))) LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = $7))) LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = $8))) LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = $9))) WHERE

I think we might have something significant! join_collapse_limit. Docs: "By default, this variable is set the same as from_collapse_limit". See also: "I just discovered join_collapse_limit has been preventing the PostgreSQL planner from finding a much better join order. In my case, increasing the limit to 10 (from the default of 8) allowed the planner to improve search time from ~30 secs to ~1 ms, which is much more acceptable."

phiresky commented 1 year ago

I asked sunarus to try setting join_collapse_limit a while back and they said it made no difference (except higher planning time).

Would be interesting if you could try my PR https://github.com/LemmyNet/lemmy/pull/3872 with your test server data. (it's kinda similar to your inclusion idea). Copy-pasting the visualization I made:

RocketDerp commented 1 year ago

I spent 4 or 5 hours today analyzing the behavior of hot_rank, and my conclusion is that it decays too quickly. And aren't the vote activities going to be changing the values all the time?

We are kind of dancing around that the core of Lemmy's data is all published date centered? Can't pagination be based on a date? This could be taken to even URL parameters and allow somewhat-stable presentation of post listings as they were on a certain past date. While we are on the subject, controversial doesn't secondary sort by published, and I think it should add that for sheer consistency when the values fall off.
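To make the date-based pagination question concrete, here is a minimal keyset-pagination sketch for the New sort only. It is an illustration under assumptions: SQLite stands in for PostgreSQL, the table is cut down to two columns, featured_local is ignored, and page_by_published is a hypothetical helper. The client passes back the (published, id) pair of the last row it saw instead of an OFFSET, and id breaks ties between equal published values.

```python
import sqlite3


def page_by_published(conn, limit, before=None):
    """Return up to `limit` (published, id) rows, newest first.

    `before` is the (published, id) pair of the last row on the previous
    page, or None for the first page.
    """
    if before is None:
        sql = (
            "SELECT published, id FROM post_aggregates"
            " ORDER BY published DESC, id DESC LIMIT ?"
        )
        params = (limit,)
    else:
        # Row-value comparison: strictly older than the cursor, id as tiebreaker.
        sql = (
            "SELECT published, id FROM post_aggregates"
            " WHERE (published, id) < (?, ?)"
            " ORDER BY published DESC, id DESC LIMIT ?"
        )
        params = (before[0], before[1], limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_aggregates (id INTEGER PRIMARY KEY, published INTEGER NOT NULL)"
)
# Fake timestamps, with deliberate duplicates to exercise the tiebreaker.
for post_id, published in enumerate([100, 100, 200, 300, 300, 300, 400], start=1):
    conn.execute("INSERT INTO post_aggregates VALUES (?, ?)", (post_id, published))

cursor = None
while True:
    page = page_by_published(conn, 3, cursor)
    if not page:
        break
    print(page)
    cursor = page[-1]
```

Because each page is defined by a fixed cursor value, newly inserted posts do not shift later pages the way OFFSET does. Sorts that are not ordered by published would need their own cursor columns.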

Record id are also time sequential, at least in terms of received time. Although I'd like to try and solve why incoming federation is chewing up unused primary keys (sequence) on post and comment tables.

I'm still really liking a focus on community-based LIMIT. Over at lemmy.world there are over 9,000 local communities, and I'm sure way more than that with federated ones from lemmy.ml, etc. A vote of +10 on a small community is way more than the popular meme stuff that drowns it all out. The LIMIT 1000 PER community_id really gets at the heart of having a data-set that doesn't grow to be the whole table, and still has a lot to focus on. I'd go so far as to say we should develop a sort method named "Interesting" or something that is the default for Lemmy and that mutes down over-active popularity (in favor of smaller communities). Reddit had to do this during the 2016 elections as Donald Trump topics were constantly dominating /r/All, and that was when they had to add block-filtering on /r/All too.

P.S. Published date has some general concerns I've identified: 1) kbin and perhaps other apps are sending in future dates in the wild, and I think I've seen a Lemmy server send in hours-off dates. 2) hot_rank treats anything == now or future as 0 result. 3) federation busy or network problems of peer to peer exchange can result in published date being kind of distorted to local users if it's 3 hours ago but the post was only put into the database just now. Some of this could be resolved by adding a new date field that is based on insert (received) time and not just the published on origin-server time. Keeping track of received date would also be useful for server operators wanting to study how stable the Lemmy network is with delivery.

phiresky commented 1 year ago

hot_rank, and my conclusion is that it decays too quickly

Yes, that's why my approach slices through the index in all dimensions it's ordered by e.g. (featured_local, hot_rank, published) (my picture is a bit simplified). I also think it should be changed to a float so ordering information between hot_rank 1 and 0 is not lost. (unrelated to perf though).

Pagination / ordering by date doesn't work because most of the sorts are pretty different from the date. It could work as a heuristic but it's not necessary by doing filtering on the exact compound value needed for the sort per sort.

(1) is maybe fixed with #3496. (2) is intentional, see #3517. (3) I'm not aware of this.

RocketDerp commented 1 year ago

Pagination / ordering by date doesn't work because most of the sorts are pretty different from the date. It could work as a heuristic but it's not necessary by doing filtering on the exact compound value needed for the sort per sort.

But that's the heart of the issue I see. Lemmy was built from the ground up and ran online for many years with a tiny number of posts and comments. It's the added data that's created all these problems.

lemmy-ui is rather hostile to search engines, the policy of "nuke everything" (which Reddit did not traditionally do) when a user asks for data to be deleted is also search-engine and stability hostile. (It's taken weeks for people to even notice deleting a comment hides all the reply comments to it) New instances start out with forward-only data, no backfill. I mean Lemmy REALLY favors recent data compared to Reddit.

All the performance and storage issues come along when you have high levels of activity and nobody is reading anything older than 30 days on the high-volume communities. When memes are getting 500 posts a day, you are just focused on "doom scrolling" of NEW NEW NEW. Yet the servers are eating up storage and performance focused on OLD OLD OLD.

I think the Lemmy essence is people refreshing page 1 over and over. Or switching between community A reading page 1, then to community B reading page 1. Because of stability issues, people now have smart-clients logged into 3 instances where they are all viewing page 1 of content from the past 24 hours. We have this massive overhead in place already on the INSERT side with counting 3 times (site, community, person) on every INSERT. If anything, published based recency could be focused there.

RocketDerp commented 1 year ago

I'm not trying to discourage any attempts to address the performance problems, BTW. I will point out some things that could be done: 1) instructions on how to move data out of the post and comment tables into some other place just to mitigate performance (probably taking along the votes that go with them). A quick and dirty solution. 2) caching at the API level or even the lemmy-ui level. Lemmy-ui reloads the same "trending communities" on every hit of the home page... caching of any kind that avoids PostgreSQL for even 60 seconds would help a lot to get past some bad logic. 3) reverting to non-logged-in queries, which perform fine, during a crisis period. What I've seen is beehaw and lemmy.world entirely shut down their home page in response to overloads.... so that's part of my basis for being willing to accept drastic solutions.
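As a rough illustration of point 2, a time-based cache can be very small. This is a generic sketch, not Lemmy or lemmy-ui code: TtlCache and the "trending_communities" key are hypothetical, and the lambda stands in for whatever function actually runs the PostgreSQL query.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TtlCache:
    """Remember a computed value per key and reuse it until it is `ttl_seconds` old."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the database entirely
        value = compute()  # stale or missing: run the expensive query once
        self._entries[key] = (now, value)
        return value


cache = TtlCache(ttl_seconds=60)
calls = []


def load_trending():
    # Stand-in for the real database query.
    calls.append(1)
    return ["pub", "Dublin"]


cache.get_or_compute("trending_communities", load_trending)
cache.get_or_compute("trending_communities", load_trending)
print(len(calls))  # the second lookup is served from the cache
```

The clock is injectable so the expiry logic can be exercised without sleeping. A real server would also need to think about concurrent requests and memory bounds, which this sketch leaves out.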

That said... I'm not even sure paging makes sense for Lemmy if we lift the limit of 50 posts to allow LIMIT 500 to be loaded at once, then let clients page through that. But there is still a lot of JSON structure overhead, with so many fields and objects duplicated or unneeded. Perhaps I got bitten by the websocket design in the 0.17.4 days, but I think we should abandon page-length thinking in favor of "give me what is new on published and edit/updated since my last request" for API clients: a big fat Unix timestamp/Epoch integer passed as a hint of what updated/published reference point I want for this API request.

P.S. I'd really appreciate some help to get the Rust logic down to bare-bones SQL JOIN for anonymous users: https://github.com/LemmyNet/lemmy/pull/3865#issuecomment-1683324467 - I think this puts into the spotlight the JSON overhead of results on a POST list, not just the JOIN SQL statement bloat. I can't imagine I'm the only one who has contemplated bypassing Diesel (or Rust) entirely, even PostgreSQL can generate JSON directly, to make client responses for the frequent-hit API calls (list posts, load comments for a post).

RocketDerp commented 1 year ago

I'm studying: LEFT OUTER JOIN "community_person_ban" ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))) .. and the SELECT field has ("community_person_ban"."id" IS NOT NULL)

I notice the very first comment in this here issue we are reading has:

LEFT OUTER JOIN "community_person_ban" ON ((("post"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post"."creator_id")) AND (("community_person_ban"."expires" IS NULL) OR ("community_person_ban"."expires" > CURRENT_TIMESTAMP))))

Digging through source code changes since May, it was revised on April 25.. Studying to see how it's actually used now.

My concern is that a community with problems has a lot of people in their ban list and this SELECT query is having to do community-specific ban. But the whole idea seems odd to me. Do you really want a feature where banning a user from a community makes all their prior posts disappear? While they are banned, they can't create new posts, right? Reddit doesn't automatically remove older post from a subreddit banned user, I checked..

If you were going to keep such a feature, I'd move it to a background job that filters out the posts of a specific user banned from a specific community... and not burden every post listing with this kind of JOIN logic. That also makes me contemplate all the situations in which a post can be removed (moderator) or deleted (end user), and the heavy logic the SELECT is using. One option is a unified smallint field on post_aggregates indicating combinations of binary values: 0 = normal visible, 1 = deleted, 2 = removed by mod, 4 = community banned post, 8 = archived / outdated, not used in current-time lists. Taken a step further, this unified smallint could use negative values for super-visibility, community-featured = -1, site-featured = -2. The TRIGGER logic for featured and deleted/removed is already there, so these values can be set instantly, and this SELECT can focus on normal-visibility or otherwise. Featured can also be used to manipulate dates and ranks, say placing featured posts 10 years into the future so they come out on top... and manipulating featured post hot_rank so they come out first. All in the name of stripping down the logic branches in these SELECT statements.
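The bit values proposed above can be sketched as a flag type. This is only an illustration: the Visibility enum and helper names are hypothetical, and the negative "featured" values are left out because they do not combine with bit flags in the same way.

```python
from enum import IntFlag


class Visibility(IntFlag):
    """Bit values from the proposal; 0 means a normal, visible post."""

    NORMAL = 0
    DELETED = 1           # deleted by the end user
    REMOVED_BY_MOD = 2    # removed by a moderator
    COMMUNITY_BANNED = 4  # creator is banned from the community
    ARCHIVED = 8          # outdated, not used in current-time lists


_REASONS = (
    Visibility.DELETED,
    Visibility.REMOVED_BY_MOD,
    Visibility.COMMUNITY_BANNED,
    Visibility.ARCHIVED,
)


def is_listed(flags: int) -> bool:
    """A post shows up in normal listings only when no flag is set."""
    return flags == Visibility.NORMAL


def hidden_reasons(flags: int) -> list:
    """Names of every flag set on the stored smallint value."""
    return [reason.name for reason in _REASONS if flags & reason]


combined = Visibility.DELETED | Visibility.ARCHIVED
print(int(combined), hidden_reasons(combined), is_listed(combined))
```

On the SQL side, the listing query would then need only a single comparison on that column (equal to 0) instead of separate deleted, removed, and ban checks spread across several tables.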

RocketDerp commented 1 year ago

Here is my hand-edited version of a lemmy_server Rust generated SELECT statement (query) to view a single user's profile... which is taking 8 full seconds to execute against my test system to list 10 posts. Doing hand analysis...

SELECT 
   "post"."id" AS post_id, "post"."name" AS post_title,
   -- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
   -- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
     "person"."id" AS p_id, "person"."name",
     -- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
     -- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
     -- "person"."bot_account", "person"."ban_expires",
     "person"."instance_id" AS p_inst,
   "community"."id" AS c_id, "community"."name" AS community_name,
   -- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
   -- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
   -- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
   "community"."instance_id" AS c_inst,
   -- "community"."moderators_url", "community"."featured_url",
     ("community_person_ban"."id" IS NOT NULL) AS ban,
   -- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
   -- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
   --"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
   --  "community_follower"."pending",
   ("post_saved"."id" IS NOT NULL) AS save,
   ("post_read"."id" IS NOT NULL) AS read,
   ("person_block"."id" IS NOT NULL) as block,
   "post_like"."score",
   coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread

FROM (
   ((((((((((
   (
       (
       "post_aggregates" 
       INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
       )
   INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
   )
WHERE (((((((
  ((("community"."deleted" = false) AND ("post"."deleted" = false)) AND ("community"."removed" = false))
  AND ("post"."removed" = false)) AND ("post_aggregates"."creator_id" = 3)) AND ("post"."nsfw" = false))
  AND ("community"."nsfw" = false)) AND ("local_user_language"."language_id" IS NOT NULL))
  AND ("community_block"."person_id" IS NULL))
  AND ("person_block"."person_id" IS NULL))
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;

The person id is 3 in this example, you can likely do an easy search and replace on the string for "3)" with another id number. Heads up that there is a "local_user_id" = 3)" in there, and the id for person doesn't always match the local_user_id number. It generates a basic post listing:

 post_id |                    post_title                     | p_id | name | p_inst | c_id  |  community_name  | c_inst | ban | save | read | block | score | unread 
---------+---------------------------------------------------+------+------+--------+-------+------------------+--------+-----+------+------+-------+-------+--------
 5431043 | Dublin community - Featured #1                    |    3 | HCE  |      1 |     4 | Dublin           |      1 | f   | f    | t    | f     |     1 |      0
 5431042 | zy_music - Featured #1                            |    3 | HCE  |      1 |    20 | zy_music         |      1 | f   | f    | t    | f     |     1 |      0
 5431041 | This week's list                                  |    3 | HCE  |      1 | 12032 | zz_multipass1    |      1 | f   | f    | t    | f     |     1 |      0
 5431040 | post list 000                                     |    3 | HCE  |      1 | 12031 | zz_multipass0    |      1 | f   | f    | t    | f     |     1 |      0
 5431039 | Down at the pub, fresh rank! 22 minutes later     |    3 | HCE  |      1 |     3 | pub              |      1 | f   | f    | t    | f     |     1 |      0
 5431038 | Down at the pub, fresh rank!                      |    3 | HCE  |      1 |     3 | pub              |      1 | f   | f    | t    | f     |     1 |      0
 5431037 | Test Post in Extra Community 0                    |    3 | HCE  |      1 | 12029 | extra_community0 |      1 | f   | f    | t    | f     |     1 |      0
 5363344 | ZipGen Stress-Test Community post AAAA0000 p60308 |    3 | HCE  |      1 |    18 | zy_Ireland       |      1 | f   | f    | f    | f     |       |      0
 5356450 | ZipGen Stress-Test Community post AAAA0000 p53414 |    3 | HCE  |      1 |    21 | zy_photography   |      1 | f   | f    | f    | f     |       |      0
 5335835 | ZipGen Stress-Test Community post AAAA0000 p32799 |    3 | HCE  |      1 |    20 | zy_music         |      1 | f   | f    | f    | f     |       |      0
(10 rows)

RocketDerp commented 1 year ago

June 4:

joins are better than in queries with potentially thousands of inserted IDs.

Given that more than 8 JOIN statements is something PostgreSQL specifically concerns itself with (join_collapse_limit), I hand-edited the query with a single IN clause and the performance problem disappears. 8 full seconds becomes less than 200ms against 5,431,043 posts. And that 200ms is still high, as I was extremely over-reaching with "LIMIT 1000" in case the end-user went wild with blocking lists or some other filtering before reaching the final "LIMIT 10". When I change it to "LIMIT 20" in the subquery, it drops almost in half to 115ms... still meeting the needs of the outer "LIMIT 10" by double. More of the core query filtering can be put into the IN subquery, as we aren't dealing with pages longer than 500 (currently limited to 50).

SELECT 
   "post"."id" AS post_id, "post"."name" AS post_title,
   -- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
   -- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
     "person"."id" AS p_id, "person"."name",
     -- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
     -- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
     -- "person"."bot_account", "person"."ban_expires",
     "person"."instance_id" AS p_inst,
   "community"."id" AS c_id, "community"."name" AS community_name,
   -- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
   -- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
   -- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
   "community"."instance_id" AS c_inst,
   -- "community"."moderators_url", "community"."featured_url",
     ("community_person_ban"."id" IS NOT NULL) AS ban,
   -- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
   -- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
   --"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
   --  "community_follower"."pending",
   ("post_saved"."id" IS NOT NULL) AS save,
   ("post_read"."id" IS NOT NULL) AS read,
   ("person_block"."id" IS NOT NULL) as block,
   "post_like"."score",
   coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread

FROM (
   ((((((((((
   (
       (
       "post_aggregates" 
       INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
       )
   INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
   )
WHERE 
  post_aggregates.id IN (
     SELECT id FROM post_aggregates
     WHERE "post_aggregates"."creator_id" = 3
     ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
     LIMIT 1000
  )
  AND
  (((((((
  (
  (("community"."deleted" = false) AND ("post"."deleted" = false))
  AND ("community"."removed" = false))
  AND ("post"."removed" = false)
  )
  AND ("post_aggregates"."creator_id" = 3)
  )
  AND ("post"."nsfw" = false))
  AND ("community"."nsfw" = false)
  )
  AND ("local_user_language"."language_id" IS NOT NULL)
  )
  AND ("community_block"."person_id" IS NULL)
  )
  AND ("person_block"."person_id" IS NULL)
  )
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;
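The trade-off in the subquery's LIMIT can be shown on a toy schema. This is a sketch under assumptions: SQLite stands in for PostgreSQL, the tables carry only the columns needed, and make_db and profile_page are hypothetical helpers. The inner LIMIT picks candidate rows before the joins and filters run, so if the filters discard too many candidates, the outer page comes back short.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE post (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL);
        CREATE TABLE post_aggregates (
            id INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL,
            creator_id INTEGER NOT NULL,
            published INTEGER NOT NULL
        );
        """
    )
    # 30 posts by creator 3; every third post is marked deleted.
    for i in range(1, 31):
        conn.execute("INSERT INTO post VALUES (?, ?)", (i, 1 if i % 3 == 0 else 0))
        conn.execute("INSERT INTO post_aggregates VALUES (?, ?, 3, ?)", (i, i, i))
    return conn


def profile_page(conn, creator_id, page_size, prefetch=None):
    """List a creator's newest visible posts, optionally pre-limited by an IN subquery."""
    sql = (
        "SELECT pa.post_id FROM post_aggregates pa"
        " JOIN post p ON p.id = pa.post_id"
        " WHERE pa.creator_id = ? AND p.deleted = 0"
    )
    params = [creator_id]
    if prefetch is not None:
        # Candidate rows are chosen before the deleted filter is applied.
        sql += (
            " AND pa.id IN (SELECT id FROM post_aggregates"
            " WHERE creator_id = ? ORDER BY published DESC LIMIT ?)"
        )
        params += [creator_id, prefetch]
    sql += " ORDER BY pa.published DESC LIMIT ?"
    params.append(page_size)
    return [row[0] for row in conn.execute(sql, params)]


conn = make_db()
full = profile_page(conn, 3, 10)
print(profile_page(conn, 3, 10, prefetch=20) == full)  # enough candidates survive
print(len(profile_page(conn, 3, 10, prefetch=10)))     # too few: a short page
```

In this toy data a prefetch of 20 still yields a full page of 10, while a prefetch of 10 yields only 6 rows because 4 of the 10 candidates are deleted. That is the reason for over-reaching on the inner LIMIT, or for moving more of the filtering into the subquery.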
RocketDerp commented 1 year ago

Given this is a profile view, filtering on a single person, I did try to hand-optimize it right on the JOIN without using an IN, by adding AND "post_aggregates"."creator_id" = 3 to the JOIN condition:

       (
       "post_aggregates" 
       INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id" AND "post_aggregates"."creator_id" = 3)
       )

It still took 8 full seconds... it is only when I added back the IN clause that the query optimizer decided to focus on the "LIMIT 1000" before doing all those other JOIN operations.

RocketDerp commented 1 year ago

Another hand-optimization attempt.... moving the community conditions from the primary SELECT's WHERE clause into the community JOIN clause. This yielded no improvement, still 8 full seconds. I still recommend that the Rust code be refactored to build the queries more like this, as the way all the parentheses are generated to group things together from different tables is pretty confusing. community NSFW grouped with post NSFW really isn't how the underlying database is structured.

FROM (
   ((((((((((
   (
       (
       "post_aggregates" 
       INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id" AND "post_aggregates"."creator_id" = 3)
       )
     INNER JOIN "community" ON 
       ("post_aggregates"."community_id" = "community"."id"
          AND ("community"."nsfw" = false)
          AND ("community"."deleted" = false)
          AND ("community"."removed" = false)
       )
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
   )
WHERE 
  (((((((
  (
  (
  "post"."deleted" = false)
  )
  AND ("post"."removed" = false)
  )
  AND ("post_aggregates"."creator_id" = 3)
  )
  AND ("post"."nsfw" = false))
  )
  AND ("local_user_language"."language_id" IS NOT NULL)
  )
  AND ("community_block"."person_id" IS NULL)
  )
  AND ("person_block"."person_id" IS NULL)
  )

RocketDerp commented 1 year ago

Hand-optimization taken to an extreme, putting all things that go together... together.... yields an incredibly fast query! now under 10ms.

FROM (
   ((((((((((
   (
       (
       "post_aggregates" 
       INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id" AND "post_aggregates"."creator_id" = 3)
       )
     INNER JOIN "community" ON 
       ("post_aggregates"."community_id" = "community"."id"
          AND ("community"."nsfw" = false)
          AND ("community"."deleted" = false)
          AND ("community"."removed" = false)
       )
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON (
       "post_aggregates"."post_id" = "post"."id"
         AND ("post"."deleted" = false)
         AND ("post"."removed" = false)
         AND ("post"."nsfw" = false)
       )
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON
      (
         ("post_aggregates"."creator_id" = "person_block"."target_id")
         AND ("person_block"."person_id" = 3)
         AND ("person_block"."person_id" IS NULL)
      )
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON 
     (
        ("post_aggregates"."post_id" = "person_post_aggregates"."post_id") 
        AND ("person_post_aggregates"."person_id" = 3)
     )
   )
   LEFT OUTER JOIN "community_block" ON
     (
        ("post_aggregates"."community_id" = "community_block"."community_id")
        AND ("community_block"."person_id" = 3)
        AND ("community_block"."person_id" IS NULL)
     )
   )
   LEFT OUTER JOIN "local_user_language" ON
     (
         ("post"."language_id" = "local_user_language"."language_id")
         AND ("local_user_language"."local_user_id" = 3)
         AND ("local_user_language"."language_id" IS NOT NULL)
     )
   )
WHERE 
  ("post_aggregates"."creator_id" = 3)

ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;

I'm uncertain if this is even an equivalent query. As this query has always been difficult for me to hand-interpret given on JOIN it says creator_id=3 then on the WHERE clause it says NOT NULL. I can keep experimenting to actually learn how these statements work, but I find the whole AND ("community_block"."person_id" IS NULL) to be confusing AF - and not something I have seen in hand-written SQL before... it reeks of machine-generated.
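A toy reproduction helps with the equivalence question. This sketch assumes SQLite in place of PostgreSQL and a cut-down two-table schema, but the LEFT OUTER JOIN semantics are standard SQL. The "join on person_id = 3, then WHERE person_id IS NULL" pattern is an anti-join: it keeps only the posts for which no matching block row exists. Moving the IS NULL test into the ON clause makes the join condition impossible to satisfy, so no row is ever filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post_aggregates (post_id INTEGER PRIMARY KEY, creator_id INTEGER NOT NULL);
    CREATE TABLE person_block (person_id INTEGER NOT NULL, target_id INTEGER NOT NULL);
    INSERT INTO post_aggregates VALUES (1, 10), (2, 11), (3, 12);
    -- Viewer 3 has blocked person 11, the creator of post 2.
    INSERT INTO person_block VALUES (3, 11);
    """
)

# Generated form: the IS NULL test in WHERE drops posts that found a block row.
ANTI_JOIN_IN_WHERE = """
    SELECT pa.post_id FROM post_aggregates pa
    LEFT OUTER JOIN person_block pb
      ON pa.creator_id = pb.target_id AND pb.person_id = 3
    WHERE pb.person_id IS NULL
    ORDER BY pa.post_id
"""

# Hand-edited form: person_id = 3 AND person_id IS NULL can never both hold,
# so the join never matches and every post is kept.
IS_NULL_IN_ON = """
    SELECT pa.post_id FROM post_aggregates pa
    LEFT OUTER JOIN person_block pb
      ON pa.creator_id = pb.target_id AND pb.person_id = 3 AND pb.person_id IS NULL
    ORDER BY pa.post_id
"""

filtered = [row[0] for row in conn.execute(ANTI_JOIN_IN_WHERE)]
unfiltered = [row[0] for row in conn.execute(IS_NULL_IN_ON)]
print(filtered)    # [1, 3]: the blocked creator's post is hidden
print(unfiltered)  # [1, 2, 3]: the block no longer has any effect
```

The same reasoning applies to the community_block join. For local_user_language, an IS NOT NULL test inside the ON clause of a LEFT OUTER JOIN likewise stops excluding posts whose language has no matching row, because the outer join keeps those posts regardless.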

OK, so this query probably does not accomplish what it should. Could the "NOT NULL" and "IS NULL" logic go in the SELECT list instead, as it does for ("post_saved"."id" IS NOT NULL)? So let's undo this...

FROM (
   ((((((((((
   (
       (
       "post_aggregates" 
       INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id" AND "post_aggregates"."creator_id" = 3)
       )
     INNER JOIN "community" ON 
       ("post_aggregates"."community_id" = "community"."id"
          AND ("community"."nsfw" = false)
          AND ("community"."deleted" = false)
          AND ("community"."removed" = false)
       )
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON (
       "post_aggregates"."post_id" = "post"."id"
         AND ("post"."deleted" = false)
         AND ("post"."removed" = false)
         AND ("post"."nsfw" = false)
       )
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON
      (
         ("post_aggregates"."creator_id" = "person_block"."target_id")
         AND ("person_block"."person_id" = 3)
         --AND ("person_block"."person_id" IS NULL)
      )
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON 
     (
        ("post_aggregates"."post_id" = "person_post_aggregates"."post_id") 
        AND ("person_post_aggregates"."person_id" = 3)
     )
   )
   LEFT OUTER JOIN "community_block" ON
     (
        ("post_aggregates"."community_id" = "community_block"."community_id")
        AND ("community_block"."person_id" = 3)
        --AND ("community_block"."person_id" IS NULL)
     )
   )
   LEFT OUTER JOIN "local_user_language" ON
     (
         ("post"."language_id" = "local_user_language"."language_id")
         AND ("local_user_language"."local_user_id" = 3)
         --AND ("local_user_language"."language_id" IS NOT NULL)
     )
   )
WHERE 
  ("post_aggregates"."creator_id" = 3)
  AND ("local_user_language"."language_id" IS NOT NULL)
  AND ("community_block"."person_id" IS NULL)
  -- AND ("person_block"."person_id" IS NULL)

ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;

Notice the WHERE clause has person_block commented out, because that one extra AND clause causes the whole query to jump from 130ms back to 8 full seconds. And my person_block table is EMPTY. Just for the sake of confirming that wasn't the issue, I did go block one user and even with 1 row it behaves the same (8 full seconds).

Next is the dreaded task of trying to unravel the logic of all the LEFT OUTER JOIN vs. INNER JOIN behaviors and choices.

RocketDerp commented 1 year ago

Back in the decades of far slower hardware... this is how I wrote defensive SQL statements, focused on basically loading a single user's profile data:

WHERE 
  ("post_aggregates"."creator_id" = 3)
  AND post.language_id IN (SELECT "local_user_language"."language_id" FROM local_user_language WHERE "local_user_language"."local_user_id" = 3)
  AND post_aggregates.community_id NOT IN (SELECT community_id FROM community_block WHERE "community_block"."person_id" = 3)
  AND post_aggregates.creator_id NOT IN (SELECT target_id FROM person_block WHERE "person_block"."person_id" = 3)

This executes in 150 ms, whereas the Lemmy logic takes over 8 full seconds:

WHERE 
  ("post_aggregates"."creator_id" = 3)
  AND ("local_user_language"."language_id" IS NOT NULL)
  AND ("community_block"."person_id" IS NULL)
  AND ("person_block"."person_id" IS NULL)

This is still not very good, because no WHERE clause narrows the rows before the LIMIT is applied, and my community_block and person_block tables are empty, which would not be the case with lemmy.world levels of data. But it is a far more defensive design, even if slower, because it isolates the components of a person's choices in a more procedural, top-down way of thinking. It also puts the focus on some optimizations to Lemmy in general, such as inverting the language logic and not inserting 184 rows for every new user by default; community creation does the same. I opened https://github.com/LemmyNet/lemmy/issues/3891 just now to be more topic-specific.
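As a rough check that the two WHERE styles above select the same posts, here is a small sketch using SQLite through Python. The schema is a cut-down stand-in with invented rows (not Lemmy's real tables), and it compares results only; it says nothing about how PostgreSQL plans either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_aggregates (post_id INTEGER PRIMARY KEY,
                                  creator_id INTEGER NOT NULL,
                                  community_id INTEGER NOT NULL);
    CREATE TABLE community_block (person_id INTEGER NOT NULL, community_id INTEGER NOT NULL);
    CREATE TABLE person_block (person_id INTEGER NOT NULL, target_id INTEGER NOT NULL);
    INSERT INTO post_aggregates VALUES (1, 7, 10), (2, 8, 10), (3, 7, 20), (4, 9, 30);
    INSERT INTO community_block VALUES (3, 20);  -- viewer 3 blocks community 20
    INSERT INTO person_block VALUES (3, 8);      -- viewer 3 blocks person 8
""")

VIEWER = 3  # hypothetical logged-in person id

# Subquery style: each of the viewer's choices is an isolated NOT IN filter.
subquery_form = conn.execute("""
    SELECT post_id FROM post_aggregates
    WHERE community_id NOT IN (SELECT community_id FROM community_block WHERE person_id = ?)
      AND creator_id NOT IN (SELECT target_id FROM person_block WHERE person_id = ?)
    ORDER BY post_id
""", (VIEWER, VIEWER)).fetchall()

# Join style: LEFT OUTER JOIN, then keep only rows where no block row matched.
join_form = conn.execute("""
    SELECT pa.post_id FROM post_aggregates pa
    LEFT OUTER JOIN community_block cb
      ON pa.community_id = cb.community_id AND cb.person_id = ?
    LEFT OUTER JOIN person_block pb
      ON pa.creator_id = pb.target_id AND pb.person_id = ?
    WHERE cb.person_id IS NULL AND pb.person_id IS NULL
    ORDER BY pa.post_id
""", (VIEWER, VIEWER)).fetchall()

print(subquery_form)  # [(1,), (4,)]
print(join_form)      # [(1,), (4,)]
```

One caveat with the NOT IN style: if the subquery can return a NULL, NOT IN matches nothing, so this sketch declares the block columns NOT NULL.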

dullbananas commented 1 year ago

I find the whole AND ("community_block"."person_id" IS NULL) to be confusing AF - and not something I have seen in hand-written SQL before... it reeks of machine-generated.

This checks if a matching community_block row was found in the left join. Using any other non-null column or .* would have the same effect.

I have seen things like community_block::person_id.is_null() in the Diesel queries. It would be clearer to use ::star. But this problem will go away when left joins with null checks are replaced with exists, like in #3865.
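A small sketch of both points, using SQLite through Python with a toy schema and invented rows (not Lemmy's real tables): the null check gives the same result on any non-null column of the joined table, and a NOT EXISTS filter returns the same posts without a left join. The exact shape of the change in #3865 is not shown here; this only illustrates the general pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_aggregates (post_id INTEGER PRIMARY KEY, community_id INTEGER NOT NULL);
    CREATE TABLE community_block (id INTEGER PRIMARY KEY,
                                  person_id INTEGER NOT NULL,
                                  community_id INTEGER NOT NULL);
    INSERT INTO post_aggregates VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO community_block (person_id, community_id) VALUES (3, 20), (4, 30);
""")

def visible_posts(sql):
    """Run a query for hypothetical viewer 3 and return the visible post ids."""
    return [row[0] for row in conn.execute(sql, {"viewer": 3})]

# Left join plus null check; {column} is whichever non-null column gets tested.
LEFT_JOIN_TEMPLATE = """
    SELECT pa.post_id FROM post_aggregates pa
    LEFT OUTER JOIN community_block cb
      ON pa.community_id = cb.community_id AND cb.person_id = :viewer
    WHERE cb.{column} IS NULL
    ORDER BY pa.post_id
"""
via_person_id = visible_posts(LEFT_JOIN_TEMPLATE.format(column="person_id"))
via_id = visible_posts(LEFT_JOIN_TEMPLATE.format(column="id"))

# Same filter written as NOT EXISTS: no join, no null check.
via_not_exists = visible_posts("""
    SELECT pa.post_id FROM post_aggregates pa
    WHERE NOT EXISTS (
        SELECT 1 FROM community_block cb
        WHERE cb.community_id = pa.community_id AND cb.person_id = :viewer
    )
    ORDER BY pa.post_id
""")

print(via_person_id, via_id, via_not_exists)  # [1, 3] [1, 3] [1, 3]
```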

RocketDerp commented 1 year ago

This checks if a matching community_block row was found in the left join. Using any other non-null column or .* would have the same effect.

That could very well be the heart of the problem. The last query, which the previous 6 or so comments were about, was a user-profile view: posts for a single person = 3.

My most recent discovery for a logged-in user is this: 1) listing a single user's profile takes 4+ full seconds; 2) listing a single community takes 4+ seconds; 3) listing subscribed is instant! That holds even if subscribed includes the single community that took 4+ seconds to list by itself.

Then I found a way to make it take 4 FULL MINUTES, merely by taking the subscribed query (which was instant) and adding a person = 3 clause to it. That really sent PostgreSQL off into what must have been scanning all 5 million posts! None of this is a cache warm-up effect: these queries are consistently slow even when re-run twice in a row on an idle system (no other activity between repeats). AND ("post_aggregates"."creator_id" = 3) is all it took: 12 ms vs. 4 full minutes!

RocketDerp commented 1 year ago

And another thing... for a subscribed post listing for a logged-in user, really take a look at all these parentheses that hold the JOINs together:

FROM 
(
   (((((((
   (
     (
       (
         (
           ( "post_aggregates" 
              INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
              )
           INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
         )
         LEFT OUTER JOIN "community_person_ban" ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
       )
       INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
     )
     LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
)

Look at this LEFT OUTER JOIN on community_person_ban and the whole sequence of JOIN operations. I really don't know what PostgreSQL does with all these parentheses, or whether it overrides them. Since we are exceeding the 8 JOIN concern of PostgreSQL (presumably the join_collapse_limit setting, which defaults to 8), I suspect that the order may change the performance... but I haven't experimented with hand-editing the order of the table JOINs.

RocketDerp commented 1 year ago

Some Rust code experiments... I managed to separate the Read and List functions, which had been married together, and get a List function whose options come in at the top of the function. I was able to get Diesel to build a subquery, which was pretty easy, and I experimented with moving the creator_id + community_id filters into the subquery, before LIMIT and OFFSET. It worked well. I was also able to add sorting twice, to both the subquery and the outer query... not difficult.
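The subquery idea described above could look roughly like this in plain SQL. This is a sketch only, run through SQLite from Python with a toy schema and generated rows; the function name list_profile_posts and its parameters are hypothetical, and it is not the Diesel code from the experiment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_aggregates (post_id INTEGER PRIMARY KEY, creator_id INTEGER NOT NULL,
                                  community_id INTEGER NOT NULL, published TEXT NOT NULL);
    CREATE TABLE post (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post_saved (post_id INTEGER NOT NULL, person_id INTEGER NOT NULL);
""")
for i in range(1, 51):
    creator = 3 if i % 5 == 0 else 9  # person 3 wrote posts 5, 10, ..., 50
    conn.execute("INSERT INTO post_aggregates VALUES (?, ?, ?, ?)",
                 (i, creator, 1, f"2023-08-01T00:00:{i:02d}"))
    conn.execute("INSERT INTO post VALUES (?, ?)", (i, f"post {i}"))
conn.execute("INSERT INTO post_saved VALUES (50, 3)")

# Filter, sort and page inside the subquery first, so the joins only ever see
# one page of rows; the outer query sorts again to guarantee the final order.
PAGED_SQL = """
    SELECT p.id, p.name, ps.post_id IS NOT NULL AS saved
    FROM (
        SELECT post_id, published FROM post_aggregates
        WHERE creator_id = :creator
        ORDER BY published DESC
        LIMIT :limit OFFSET :offset
    ) AS page
    INNER JOIN post p ON p.id = page.post_id
    LEFT OUTER JOIN post_saved ps
      ON ps.post_id = page.post_id AND ps.person_id = :viewer
    ORDER BY page.published DESC
"""

def list_profile_posts(creator, viewer, limit, offset=0):
    """Hypothetical helper: one page of a creator's posts as seen by a viewer."""
    params = {"creator": creator, "viewer": viewer, "limit": limit, "offset": offset}
    return conn.execute(PAGED_SQL, params).fetchall()

first_page = list_profile_posts(creator=3, viewer=3, limit=3)
print(first_page)  # [(50, 'post 50', 1), (45, 'post 45', 0), (40, 'post 40', 0)]
```

One trade-off with this shape: any filter left in the outer query (for example, on deleted posts) runs after the LIMIT, so a page can come back with fewer rows than requested.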

At this point I need to generate more diverse testing data for blocking, NSFW, etc.

RocketDerp commented 1 year ago

Earlier in this comment chain, I said that I felt we were dancing around published date as a cornerstone. And I still feel that the project needs to make the 'executive decision' to limit Active and Hot sorting to a default date cutoff. I'm thinking either 7 or 10 days; I have submitted the foundational API parameters in a pull request and opened a new issue to focus on that topic. See: https://github.com/LemmyNet/lemmy/issues/3899
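To make the date cutoff concrete, here is a minimal sketch using SQLite through Python. The max_age_days parameter is a hypothetical name (the actual API parameters are in the linked issue), the rows are invented, and plain score stands in for the real Hot and Active ranking, which this example does not attempt to reproduce.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_aggregates (post_id INTEGER PRIMARY KEY,
                                  score INTEGER NOT NULL,
                                  published TEXT NOT NULL)
""")

now = datetime(2023, 8, 15, tzinfo=timezone.utc)  # fixed "now" so the example is reproducible
rows = [
    (1, 500, now - timedelta(days=30)),  # high score, but a month old
    (2, 40, now - timedelta(days=6)),
    (3, 90, now - timedelta(days=2)),
    (4, 10, now - timedelta(hours=3)),
]
conn.executemany("INSERT INTO post_aggregates VALUES (?, ?, ?)",
                 [(pid, score, ts.isoformat()) for pid, score, ts in rows])

def list_posts(max_age_days):
    """Hypothetical listing: drop posts older than the cutoff, then sort by score."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    query = "SELECT post_id FROM post_aggregates WHERE published >= ? ORDER BY score DESC"
    return [row[0] for row in conn.execute(query, (cutoff,))]

print(list_posts(7))  # [3, 2, 4]
```

With the cutoff in the WHERE clause, the month-old post never reaches the sort, however high its score.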