Open DavidKeller opened 11 months ago
I wonder why Flarum introduced the number column on the posts table, as it can't be trusted: posts can be deleted, and the id/created_at attributes already provide post ordering?
Hi there, this looks like something I might be able to fix. Could I please be assigned? Thank you.
As a side note, my proposed solution was to add indexes, but at the time I didn't understand the database structure or the execution plan output.
If I understand correctly, MySQL is already using the following two indexes to skip unrelated posts, which is why the whole posts table isn't walked:
- `posts_discussion_id_number_unique`
- `posts_discussion_id_created_at_index`

Still, quite a lot of posts are walked.
This function seems to be called in order to retrieve the current offset of the unread posts within a discussion.
I wonder if it might be preferable to save, for each discussion/user combination:
- the `id` of the latest seen post (in order to start the fetch cursor from it)
- its `number` (in order to compute the unread posts count)

And replace deleted posts with a message flagging the post as such, in order to keep the `number` of the following posts coherent? (deletion becomes O(log P), where P is the total posts count)

OR

Each time a post is deleted, for each discussion/user combination, decrease the associated `number` if the deleted post `id` is >= the `id` associated with that discussion/user combination (deletion becomes O(C + log P), where C is the number of combinations).
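To make the first variant concrete, here is a minimal in-memory sketch in Python. It is not Flarum code: the `Post`, `Discussion` and `ReadState` classes and all their fields are hypothetical names chosen for illustration, and a plain list stands in for the indexed posts table (so a lookup by number is O(1) here rather than the O(log P) of a database index). It assumes deleted posts are kept as placeholders and therefore still count towards the unread total.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    id: int               # global post id, not contiguous within a discussion
    number: int           # position within the discussion, never reused
    content: str
    deleted: bool = False


@dataclass
class Discussion:
    posts: list = field(default_factory=list)  # ordered by number, no gaps

    def add_post(self, post_id, content):
        post = Post(id=post_id, number=len(self.posts) + 1, content=content)
        self.posts.append(post)
        return post

    def soft_delete(self, number):
        # Flag the post instead of removing it, so later numbers stay valid.
        post = self.posts[number - 1]
        post.deleted = True
        post.content = "[deleted]"

    def last_number(self):
        return len(self.posts)


@dataclass
class ReadState:
    """Hypothetical per discussion/user state: latest seen id and number."""
    last_read_post_id: int = 0
    last_read_number: int = 0

    def mark_read(self, post):
        if post.number > self.last_read_number:
            self.last_read_post_id = post.id
            self.last_read_number = post.number

    def unread_count(self, discussion):
        # No scan needed: numbers are contiguous, so a subtraction suffices.
        return discussion.last_number() - self.last_read_number


discussion = Discussion()
posts = [discussion.add_post(pid, f"post {pid}") for pid in (10, 20, 30, 40, 50)]
state = ReadState()
state.mark_read(posts[1])      # the user has read up to post number 2
discussion.soft_delete(4)      # deleting post 4 touches no read state
unread = state.unread_count(discussion)
```

In this sketch a deletion touches a single post and no read state, which is the property the first variant is after; the second variant would instead have to visit every stored discussion/user state.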
@Spencer-Robertson-ANU How is it going ?
A few comments here. I'm on mobile, so I might leave out or miss a few details.
First off, we don't assign issues unless you are a core developer. You could have provided a proposed solution and a pull request for us to review.
Secondly, storing the post ID seems like a valid solution, but post IDs aren't sequential. I think introducing this into the last read state of a user would be detrimental, and updating all user states upon deletion would be a big performance hit for communities with millions of users.
Thirdly, thank you for the amazing amount of research on this topic @DavidKeller. I do feel this needs our attention, at least for 2.x.
In our case, we don't have that many users, but we do have big topics where the current design causes delays of up to several seconds. Because of this, we haven't migrated to Flarum yet.
But I get your point.
What about the first proposal, that is, not really deleting posts but flagging them as deleted instead, in order to keep a valid `number`?
In a perfect world, all requests against the database should be O(log N) or O(1).
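As a rough illustration of why a gap-free `number` matters for that goal, the Python sketch below compares three ways of locating a post inside a discussion. It is a toy model, not Flarum's query: the function names are hypothetical and a sorted list of post numbers stands in for the `(discussion_id, number)` index.

```python
from bisect import bisect_left


def offset_by_scan(numbers, target):
    # Counts every post that comes before the target: O(N) in the discussion size.
    return sum(1 for n in numbers if n < target)


def offset_by_number(target):
    # O(1), but only correct when numbers are contiguous (1..N), which is what
    # keeping deleted posts as flagged placeholders would guarantee.
    return target - 1


def page_from(numbers, target, limit):
    # Seek straight to the target in the sorted index instead of counting
    # rows up to an offset: O(log N) for the seek, plus the page itself.
    start = bisect_left(numbers, target)
    return numbers[start:start + limit]


contiguous = list(range(1, 1001))   # no post was ever hard-deleted
with_gaps = [1, 2, 4, 5]            # post number 3 was hard-deleted

scan_offset = offset_by_scan(contiguous, 500)
direct_offset = offset_by_number(500)
gap_scan_offset = offset_by_scan(with_gaps, 4)
gap_direct_offset = offset_by_number(4)
page = page_from(with_gaps, 3, 2)
```

With contiguous numbers the scan and the subtraction agree, so the scan can be dropped. Once a post is hard-deleted they diverge, which is why the offset currently has to be counted; seeking by `number` still works in both cases because it never needs the offset.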
Current Behavior
When showing a discussion with a large number of posts, Flarum puts pressure on the database when calling
with the following SQL request:
which results in the following execution plan on MariaDB:
As you can see in the `rows` column of the last snippet, the posts table is scanned twice just to retrieve an offset.

Steps to Reproduce
Create a discussion with a large number of posts (> 50K). Display it.
Expected Behavior
The display duration should be independent of the number of posts within the discussion.
Screenshots
No response
Environment
Output of `php flarum info`
Possible Solution
Indexes?
Additional Context
_No response_