Investigate performance impact of large number of groups

BitForger commented 6 years ago

Explanation

Flarum has been chosen by our client as the base software for a modified forum-esque platform. As we were hacking our way through the system to make it work the way we needed it to, we built it into a Docker container. After much frustration with extremely long load times, we started monitoring and logging operations. We found that a single page load triggers ~1600 permission-check operations. This is just one aspect of the application, and even running on a task given a full vCPU core and a gigabyte of RAM, the page still takes about 10 seconds to load.

Are there ways to increase performance on this? I think the ideal fix would be to change the way permissions are checked, but the issue may be occurring because our use case has strayed a little from the platform's intended use. Would I be correct in stating that the goal of Flarum is to have the app load all the data from the 'api' PHP backend once and then not have to do it again unless absolutely necessary?

Technical details

Version of Flarum: 0.1.0-beta.7
Website URL where the bug is visible: n/a
The webserver you are running: apache
PHP version: 7.0
Hosted environment: ECS
Hosting provider: AWS

Flarum info

n/a

Log files

n/a

luceos commented 6 years ago

I've had Flarum running inside Docker and a Kubernetes cluster (on GKE) without any issues or excessive resource usage. You mention "hacking our way through the system to make it work the way we needed it", what have you done exactly?

As Flarum is still in beta, quite a few improvements are still necessary, including to the database. A step in the right direction has already been taken with the upcoming release, though.

franzliedke commented 6 years ago

@BitForger Please describe your use case. We haven't heard of problems like these so far, so we need far more details to be able to help.

tobyzerner commented 6 years ago

@BitForger Also any evidence that the permission-checking is really the thing that's taking up all the time? This code is run many times by design (though ~1600 does seem a bit excessive) but it shouldn't matter because it's efficient code.

Perhaps you could use a tool like Blackfire.io to profile and then share your results?

BitForger commented 6 years ago

@luceos Most of the work has been implementing a custom auth layer that integrates with the rest of the services we have built. We also do some overriding and disabling of auth layers in Flarum. Beyond that, it's mostly UI changes. We needed a more group-centric design, so each group would have a discussion that has posts and comments in it.

Now that I mention that, I bet it's because we have a group for basically every base slug...

We went in and edited our maintained version of core to remove permission checks (since we use our own roles across services), and the page load times dropped to ~5 seconds on our staging ECS instance with much lower resources allocated (~128 vCPU units [1024 vCPU units = 1 vCPU core] & 128 MiB RAM).

@tobscure we logged every time a permission was checked and then counted the lines. It came out to 1679 lines, if I recall correctly.

If this is because of the sheer number of groups we have then I think it's just a simple use case issue where we've strayed from the platform's intended use.
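The log-and-count measurement described above can be sketched as follows. This is a minimal Python illustration, not Flarum's PHP code: `can`, `render_page`, and the ability names are hypothetical stand-ins, and the numbers are made up for the example.

```python
from collections import Counter

# Tally of how often each ability is checked during one simulated page load.
check_counts = Counter()

def can(actor_permissions, ability):
    """Hypothetical permission check: record the call, then do a set lookup."""
    check_counts[ability] += 1
    return ability in actor_permissions

def render_page(actor_permissions, discussion_ids):
    """Simulate a page that asks three permission questions per discussion."""
    visible = []
    for discussion_id in discussion_ids:
        if can(actor_permissions, "viewDiscussions"):
            can(actor_permissions, "discussion.reply")
            can(actor_permissions, "discussion.edit")
            visible.append(discussion_id)
    return visible

visible = render_page({"viewDiscussions", "discussion.reply"}, range(50))
total_checks = sum(check_counts.values())
print(total_checks, check_counts.most_common())
```

A per-ability tally like this shows which check is repeated most often, which a raw line count alone does not reveal.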

tobyzerner commented 6 years ago

How many groups do you have?

BitForger commented 6 years ago

@tobscure I'm away from my work computer right now, but I want to say in the ballpark of 800.

franzliedke commented 6 years ago

I am closing this for now.

Feel free to continue discussing this here or in our forums, but right now there is no concrete step that can be taken; therefore, I don't think this should remain open.

@tobscure I faintly remember discussing this topic in an issue maybe a year ago, where we agreed that having that many groups isn't a use case that we want to support without custom extensions at this time. Do you happen to know where that was?

tobyzerner commented 6 years ago

This is the only one I could find https://github.com/flarum/core/issues/612

I think we do need to investigate the performance impact of having large numbers of groups. That's a concrete next step, so I would be in favour of keeping this open?

franzliedke commented 6 years ago

If somebody writes up where exactly the problem occurs and suggests what could be done... :wink:

BitForger commented 6 years ago

@franzliedke @tobscure gotcha, yeah I'll try and get to it some time this week.

tobyzerner commented 6 years ago

@BitForger any updates?

BitForger commented 6 years ago

@tobscure Not yet. :/ I haven't been able to look at it because of overtime I've had to do to get a project done at work. I'm going to try to get to it today though.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We do this to keep the amount of open issues to a manageable minimum. In any case, thanks for taking an interest in this software and contributing by opening the issue in the first place!

BitForger commented 3 years ago

Oof, three years later... sorry about that.

This is what I recall from memory. There was a point in the code where it would loop through every... I forget the naming scheme for things... group (?; we had about 800) for every discussion we had (I don't remember how many, but it was probably close to 50 or 100). It was specifically in the flarum/core package, I remember that much. I'm pretty sure it was also in one of the controllers or abstract controllers for listing either discussions or the forum. That's about all I can remember.
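To put rough numbers on that recollection: a loop over about 800 groups for each of 50 to 100 discussions is 40,000 to 80,000 iterations per page load. The Python sketch below is only an illustration of that pattern and of one common way around it; the data shapes (`allowed_groups`, `actor_group_ids`) and function names are assumptions, not Flarum's actual code.

```python
def count_naive_checks(group_ids, discussions, actor_group_ids):
    """Scan every group for every discussion; returns (visible ids, iterations)."""
    iterations = 0
    visible = []
    for discussion in discussions:
        allowed = False
        for group_id in group_ids:
            iterations += 1
            if group_id in actor_group_ids and group_id in discussion["allowed_groups"]:
                allowed = True
        if allowed:
            visible.append(discussion["id"])
    return visible, iterations

def count_set_checks(discussions, actor_group_ids):
    """One set intersection per discussion instead of a scan over all groups."""
    visible = [d["id"] for d in discussions if actor_group_ids & d["allowed_groups"]]
    return visible, len(discussions)

# Made-up data: 800 groups, 100 discussions, an actor in three groups.
group_ids = list(range(800))
discussions = [{"id": i, "allowed_groups": {i, i + 1}} for i in range(100)]
actor_group_ids = {3, 4, 500}

naive_visible, naive_iterations = count_naive_checks(group_ids, discussions, actor_group_ids)
fast_visible, fast_iterations = count_set_checks(discussions, actor_group_ids)
print(naive_iterations, fast_iterations)
```

Both functions return the same visible discussions; only the amount of work differs, because the second one never iterates over groups the actor does not belong to.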

luceos commented 3 years ago

I think groups are added to the initial page load, just like tags were. So it makes sense that TTFB is impacted when you have 800 groups. Personally, I think this is another segment of core that needs to be lazy loaded, so that we only load the groups of users directly required on the opened page (actor included). Permission checks are executed on the back end anyhow.

It might also be possible that somewhere in the authorization layer we are doing a loop over all groups to check against. This is a wild guess though and needs investigation.
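The lazy-loading idea above could look roughly like this. It is a Python sketch under assumed data shapes (`group_ids` on users, a dictionary of groups keyed by id), with a hypothetical helper name `groups_for_page`; it does not reflect how Flarum actually serializes its payload.

```python
def groups_for_page(all_groups, actor, page_users):
    """Return only the groups referenced by the actor or by users on the page."""
    needed_ids = set(actor["group_ids"])
    for user in page_users:
        needed_ids.update(user["group_ids"])
    # Skip ids that no longer exist rather than failing the whole page.
    return [all_groups[group_id] for group_id in sorted(needed_ids) if group_id in all_groups]

# Made-up data: 800 groups in total, but only a handful are relevant to this page.
all_groups = {group_id: {"id": group_id, "name": f"group-{group_id}"} for group_id in range(800)}
actor = {"id": 1, "group_ids": [3, 4]}
page_users = [{"id": 2, "group_ids": [4, 10]}, {"id": 3, "group_ids": [999]}]

eager_payload = list(all_groups.values())   # everything sent on first load
lazy_payload = groups_for_page(all_groups, actor, page_users)
print(len(eager_payload), len(lazy_payload))
```

The payload then scales with the number of users shown on the page instead of the total number of groups.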

@BitForger three years and unresolved, so so sorry about that! Is the community still running? I'd love to hear which one it is 😂

BitForger commented 3 years ago

> @BitForger three years and unresolved, so so sorry about that! Is the community still running? I'd love to hear which one it is 😂

Unfortunately, the client decided to stop rolling custom solutions and switched to platforms they could plug together fairly easily, so we shut it down after about 6 months.
