flarum / framework

Simple forum software for building great communities.
http://flarum.org/
6.39k stars 835 forks source link

Tags performance issue Forum::tags #2177

Closed luceos closed 3 years ago

luceos commented 4 years ago

Bug Report

Current Behavior

    public function loadTagsRelationship(WillSerializeData $event)
    {
        // Expose the complete tag list to clients by adding it as a
        // relationship to the /api endpoint. Since the Forum model
        // doesn't actually have a tags relationship, we will manually load and
        // assign the tags data to it using an event listener.
        if ($event->isController(ShowForumController::class)) {
            $event->data['tags'] = Tag::whereVisibleTo($event->actor)
                ->withStateFor($event->actor)
                ->with(['parent', 'lastPostedDiscussion'])
                ->get();
        }
    }

https://github.com/flarum/tags/blob/fa83f496b0004e31ef2f4fc854bfcdf6a4cfa3c3/src/Listener/AddForumTagsRelationship.php#L46-L58

This piece of code is responsible for a performance penalty with forums that have many tags. There are a few things at fault here:

Suggested solution

What needs to be done is:

or:

Environment

askvortsov1 commented 4 years ago

This PR: https://github.com/flarum/tags/pull/84 should already reduce strain substantially. From there:

rafaucau commented 4 years ago

After migrating tags from the old forum to Flarum, I wondered why the forum was loading indefinitely. Well, we have the answer - I have 2000+ tags 😄

I like the first solution:

What needs to be done is:

  • Load only those tags on page load that are actually shown on the sidebar
  • Load the additional tags on request
  • Include the tags in the discussion list request
askvortsov1 commented 4 years ago

@rafaucau we're now investigating another possible cause: by any chance, do you have the fof follow tags extension installed?

rafaucau commented 4 years ago

@askvortsov1 Yes

askvortsov1 commented 4 years ago

Then that's likely it: an issue with FoF follow tags means that multiple SQL queries are made per tag whenever loading a page. I'm gonna close this for now unless someone can reproduce this on an install with only core extensions.

rafaucau commented 4 years ago

@askvortsov1 After disabling FoF follow-tags I don't see any difference. It still loads very slowly

askvortsov1 commented 4 years ago

Nevermind then... Could you open a new issue and share the output of your php flarum infoand link to this one

simon1222 commented 4 years ago

The problem is generated in TagSerializer by:

'canStartDiscussion' => $this->actor->can('startDiscussion', $tag),
'canAddToDiscussion' => $this->actor->can('addToDiscussion', $tag),

and

protected function lastPostedDiscussion($tag)
    {
        return $this->hasOne($tag, DiscussionSerializer::class);
    }
askvortsov1 commented 4 years ago

@simon1222 is this something you've confirmed via testing? We were able to determine that disabling FoF Follow Tags reduces the number of SQL queries sent to the database on initial load from 411 to 54 for a forum with hundreds of tags.

simon1222 commented 4 years ago

@askvortsov1 yes, I've confimed via testing. I have 320k discussions and 2k tags, so lastPostedDiscussion was one of the problem. I was waiting about 60s, so I've commented return $this->hasOne($tag, DiscussionSerializer::class); and reduced to 30s. I was testing line by line and this piece of code is working fine:

$event->data['tags'] = Tag::whereVisibleTo($event->actor)
                ->withStateFor($event->actor)
                ->with(['parent', 'lastPostedDiscussion'])
                ->get();

but methods, which are executed in Serializer, are working too slow. I think these methods generate too many queries to the database.

askvortsov1 commented 4 years ago

hmm, thing is, that should be processed by the includes system, which would load it all in one query. It would be super helpful to get a bit more information as to what is going on with your forum, so we can try to fix some of these bugs and increase performance.

Would you be able to use https://github.com/FriendsOfFlarum/clockwork (if it works) or some alternate tool to see how many and which database requests are being made on forum load? Also, could you share the URL of your forum and the output of php flarum info? If this is something you'd prefer not to post publically, could you send your https://discuss.flarum.org username and I can reach out to you via private message?

w-4 commented 4 years ago

I think big performance gains will be achieved by not loading the lastPostedDiscussion relationship on normal pages. I might be wrong, but I think it's only used on the tags overview page. Here's the relevant code:

    /**
     * @param WillGetData $event
     */
    public function includeTagsRelationship(WillGetData $event)
    {
        if ($event->isController(ShowForumController::class)) {
            $event->addInclude(['tags', 'tags.lastPostedDiscussion', 'tags.parent']);
        }
    }

That would remove two SQL queries per Tag:

SELECT * FROM `discussions` WHERE `discussions`.`id` = '1' LIMIT  | runtime: 0.27 ms
SELECT `tags`.*, `discussion_tag`.`discussion_id` as `pivot_discussion_id`, `discussion_tag`.`tag_id` as `pivot_tag_id` FROM `tags` inner join `discussion_tag` on `tags`.`id` = `discussion_tag`.`tag_id` WHERE `discussion_tag`.`discussion_id` = '1' | runtime: 0.43 ms

And it would also remove the size of the payload JSON significantly. Right now every last posted discussion object is included in the resource section.

dsevillamartin commented 4 years ago

@w-4 That's a good point. The last posted discussion can be loaded through an HTTP request when one clicks the tags page and/or included only in the /tags route.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We do this to keep the amount of open issues to a manageable minimum. In any case, thanks for taking an interest in this software and contributing by opening the issue in the first place!

SychO9 commented 3 years ago

Sharing what I recently found,

So I decided to update fof/clockwork to see flarum's performance and have a better idea of it. That's when I came across the fact that there were two types of models being loaded for every single primary tag (UserState, Tag) which are relations of the tag's lastPostedDiscussion discussion model, they are called and used in the DiscussionSerializer which results in querying them.

Simply eager loading these relationships solves the issue by only using two queries instead of two queries per primary tag. Do we want to make this change or instead just do as posted above and query the lastPostedDiscussions from the frontend while also limiting the number of tags exposed ?

We could include this fix for now since it doesn't hurt. (I posted the PR incase we'd like to do that).

$data['tags'] = Tag::whereVisibleTo($actor)
            ->withStateFor($actor)
            ->with([
                'parent',
                'lastPostedDiscussion',
                'lastPostedDiscussion.tags',
                'lastPostedDiscussion.state'
            ])
            ->get();

Also observation here, we don't actually need to eager load the tags here since we are loading all of them, we could just loop through all the lastPostedDiscussion relations and set the tags relationship from the collection of tags we already have.

Screenshot from 2021-02-24 18-51-11 Screenshot from 2021-02-24 18-50-46 Screenshot from 2021-02-24 18-48-55

askvortsov1 commented 3 years ago

I'd be in favor of including this since those queries are needed anyway, but I wouldn't close this issue until we properly paginate it and only load the tags we need, when we need them.

askvortsov1 commented 3 years ago

Infinite nesting PR might have a start to this

askvortsov1 commented 3 years ago

Plan for handling this is to do it in 3 stages.

Stage 1: only load top-level primary tags, top 3 secondary tags, and current tag's children/siblings on an arbitrary forum request. This solves the issue of thousands of tags being loaded in at once, which understandably makes things very slow. This is ready for review at https://github.com/flarum/tags/pull/87

Stage 2: Optimize tag permission checking so we don't have to send huge arrays to/from the database. This significantly speeds up all pages (2x at least), and when coupled with Stage 1, makes the effect of tags negligible for every page except the "Tags" page. This is ready for review at https://github.com/flarum/tags/pull/126

Stage 3: Pagination. I propose the following changes:

Stages 1 and 2 make tags usable on an enterprise-scale forum. Stage 3 enables hundreds of primary and thousands of secondary tags with elegant scaling. Stage 3 will likely be after stable.