flarum / framework

Simple forum software for building great communities.
http://flarum.org/
6.27k stars 826 forks source link

PostRepository::getIndexForNumber() walks 2 time the posts table to get offset #3895

Open DavidKeller opened 11 months ago

DavidKeller commented 11 months ago

Current Behavior

When showing discussion with a large number of posts, flarum pressures the database when calling

    public function getIndexForNumber(int $discussionId, int $number, ?User $actor = null): int
    {
        if (! ($discussion = Discussion::find($discussionId))) {
            return 0;
        }

        $query = $discussion->posts()
            ->whereVisibleTo($actor)
            ->where('created_at', '<', function ($query) use ($discussionId, $number) {
                $query->select('created_at')
                    ->from('posts')
                    ->where('discussion_id', $discussionId)
                    ->whereNotNull('number')
                    ->take(1)

                    // We don't add $number as a binding because for some
                    // reason doing so makes the bindings go out of order.
                    ->orderByRaw('ABS(CAST(number AS SIGNED) - '.(int) $number.')');
            });

        return $query->count();
    }

with the following SQL request:

SELECT count(*) AS AGGREGATE
FROM `posts`
WHERE `posts`.`discussion_id` = 3096
  AND `posts`.`discussion_id` IS NOT NULL
  AND EXISTS
    (SELECT 1
     FROM `discussions`
     WHERE `discussions`.`id` = `posts`.`discussion_id`
       AND (`discussions`.`id` not in
              (SELECT `discussion_id`
               FROM `discussion_tag`
               WHERE `tag_id` not in
                   (SELECT `tags`.`id`
                    FROM `tags`
                    WHERE (`tags`.`id` in
                             (SELECT `perm_tags`.`id`
                              FROM `tags` AS `perm_tags`)
                           AND (`tags`.`parent_id` in
                                  (SELECT `perm_tags`.`id`
                                   FROM `tags` AS `perm_tags`)
                                OR `tags`.`parent_id` IS NULL)))))
       AND (`discussions`.`is_private` = 0))
  AND (`posts`.`is_private` = 0)
  AND `created_at` <
    (SELECT `created_at`
     FROM `posts`
     WHERE `discussion_id` = 3096
       AND `number` IS NOT NULL
     ORDER BY ABS(CAST(number AS SIGNED) - 31224)
     LIMIT 1)
  AND `type` in ('discussionTagged',
                 'discussionStickied',
                 'discussionLocked',
                 'comment',
                 'discussionRenamed');

which results in the following execution plan on MariaDB:

+------+--------------------+----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------------+-------+-----------------------------+
| id   | select_type        | table          | type            | possible_keys                                                                                                                                        | key                                  | key_len | ref              | rows  | Extra                       |
+------+--------------------+----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------------+-------+-----------------------------+
|    1 | PRIMARY            | discussions    | const           | PRIMARY                                                                                                                                              | PRIMARY                              | 4       | const            | 1     |                             |
|    1 | PRIMARY            | posts          | ref             | posts_discussion_id_number_unique,posts_discussion_id_number_index,posts_discussion_id_created_at_index,posts_type_index,posts_type_created_at_index | posts_discussion_id_number_unique    | 4       | const            | 66770 | Using where                 |
|    7 | SUBQUERY           | posts          | ref             | posts_discussion_id_number_unique,posts_discussion_id_number_index,posts_discussion_id_created_at_index                                              | posts_discussion_id_created_at_index | 4       | const            | 66770 | Using where; Using filesort |
|    3 | DEPENDENT SUBQUERY | discussion_tag | index_subquery  | PRIMARY                                                                                                                                              | PRIMARY                              | 4       | func             | 1     | Using where                 |
|    4 | DEPENDENT SUBQUERY | tags           | eq_ref          | PRIMARY,tags_parent_id_foreign,tags_last_posted_user_id_foreign,tags_last_posted_discussion_id_foreign                                               | PRIMARY                              | 4       | func             | 1     | Using where                 |
|    4 | DEPENDENT SUBQUERY | perm_tags      | eq_ref          | PRIMARY,tags_parent_id_foreign,tags_last_posted_user_id_foreign,tags_last_posted_discussion_id_foreign                                               | PRIMARY                              | 4       | forum_v4.tags.id | 1     |                             |
|    6 | DEPENDENT SUBQUERY | perm_tags      | unique_subquery | PRIMARY                                                                                                                                              | PRIMARY                              | 4       | func             | 1     |                             |
+------+--------------------+----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+---------+------------------+-------+-----------------------------+

As you can see on the rows column from the last snippet: the posts table is scanned two times in order to retrieve only an offset.

Steps to Reproduce

Create a discussion with a large number of posts (> 50K). Display it.

Expected Behavior

Expect the display duration to be agnostic of the number of posts within the discussion.

Screenshots

No response

Environment

Output of php flarum info

Flarum core: 1.8.2
PHP version: 8.1.24
MySQL version: 11.1.2-MariaDB-1:11.1.2+maria~ubu2204-log
Loaded extensions: Core, date, libxml, openssl, pcre, sqlite3, zlib, ctype, curl, dom, fileinfo, filter, ftp, hash, iconv, json, mbstring, SPL, session, PDO, pdo_sqlite, standard, posix, readline, Reflection, Phar, SimpleXML, tokenizer, xml, xmlreader, xmlwriter, mysqlnd, gd, pdo_mysql, sodium, zip, Zend OPcache
+-------------------------------+---------+--------+
| Flarum Extensions             |         |        |
+-------------------------------+---------+--------+
| ID                            | Version | Commit |
+-------------------------------+---------+--------+
| flarum-tags                   | v1.8.0  |        |
| migratetoflarum-old-passwords | 1.0.0   |        |
| fof-nightmode                 | 1.5.3   |        |
| flarum-suspend                | v1.8.1  |        |
| flarum-subscriptions          | v1.8.0  |        |
| flarum-sticky                 | v1.8.0  |        |
| flarum-nicknames              | v1.8.0  |        |
| flarum-mentions               | v1.8.2  |        |
| flarum-markdown               | v1.8.0  |        |
| flarum-lock                   | v1.8.0  |        |
| flarum-likes                  | v1.8.0  |        |
| flarum-lang-english           | v1.8.0  |        |
| flarum-flags                  | v1.8.0  |        |
| flarum-emoji                  | v1.8.0  |        |
| flarum-bbcode                 | v1.8.0  |        |
| asile-ninja                   | 1.0.0   |        |
+-------------------------------+---------+--------+
Base URL: https://dev.lesitedecuisine.fr
Installation path: /var/www/dev.lesitedecuisine.fr
Queue driver: sync
Session driver: file
Mail driver: mailgun
Debug mode: off

Possible Solution

Indexes ?

Additional Context

_No response_je

DavidKeller commented 11 months ago

I wonder why flarum introduced the number column on the post table as it can't be trusted because posts can be deleted and there is already the id/created_at attributes to provide post ordering ?

Spencer-Robertson-ANU commented 11 months ago

Hi there, this looks like something I might be able to fix, could I please be assigned? Thankyou

DavidKeller commented 11 months ago

As a side note, my proposed solution was to add indexes, but I didn't understood at the time the database structure and the execution plan output.

If I understand correctly, Mysql is already using the following two indexes:

to skip unrelated posts, that's why the whole posts table isn't walked. Still, quite a lot of posts are walked.

This function seems to be called in order to retrieve the current offset of unread post within a discussion.

I wonder if it might be preferable to save for each discussion/user combination:

  1. the post's id of the latest seen post (in order to start the fetch cursor from it)
  2. it's associated number (in order to compute unread posts count)

And replace deleted posts with a message flagging the post as such in order to keep the following posts number coherent ? (deletion becomes O(log P) where P is the total posts count)

OR

Each time a post is deleted, for each discussion/user combination, decrease the associated number if deleted post id is >= discussion/user combination associated id (deletion becomes O(C + log P) where C is the number of combination)

DavidKeller commented 8 months ago

@Spencer-Robertson-ANU How is it going ?

luceos commented 6 months ago

A few comments here. From mobile so I might leave out or miss a few details.

First of we don't do assignment of issues unless you are a core developer. You could have provided a proposed solution and a pull request for us to review.

Secondly. Storing the post Id seems like a valid solution, but post Ids aren't sequences. I think introducing this into the last read state of a user would be detrimental and updating all user states upon deletion will be a big hit on performance for communities with millions of users.

Thirdly, thank you for the amazing amount of research on this topic @DavidKeller. I do feel this needs our attention, at least for 2.x.

DavidKeller commented 6 months ago

In our case, we don't have that many users, but big topics where the current design causes delays up to multiple seconds. Because of this, we haven't migrated to flarum yet.

But I get your point.

What about the first proposal, that is not really deleting post, but flagging them as deleted instead in order to keep a valid number ? In a perfect world, all requests against database should be O(Log N) or O(1)