gjbarnard / moodle-format_topcoll

Collapsed Topics course format for Moodle.
http://docs.moodle.org/en/Collapsed_Topics_course_format
GNU General Public License v3.0
35 stars 60 forks source link

Severe performance issues #114

Open aspark21 opened 3 years ago

aspark21 commented 3 years ago

Hi Gareth,

This capability checks is extremely problematic - it's occurring 100s of times per page load and taking 120s for that to load https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/activity.php#L129-L173

In particular this method is probable issue: private static function calulatecoursemodules($courseid, $students, $modid = null) {

especially this Line 757 - if (((method_exists($usermod, 'is_visible_on_course_page')) && ($usermod->is_visible_on_course_page()))

We're in CI mode as whole site went down earlier today, it seems other universities are also affected with performance issues on earlier versions of topcoll. We've had a long running item around coursemodinfo cache being problematic for some users for the last year or so. And had been pursuing https://tracker.moodle.org/browse/MDL-55231 - which is not quite ready but close. Though there's thinking needed about wether there is something about topcoll (or topcoll + adaptable) which could make this worse.

We should have Cat AU looped in overnight who may have some recommendations.

Al P.S. will email you to bring you into the loop on this

aspark21 commented 3 years ago

Actually there's no data in this so posting the xhprof traces here: course-view

course-editbulkcompletion

gjb2048 commented 3 years ago

Dear @aspark21,

I have just about had enough of this code (sick to the back teeth) that I've inherited from Adaptable and indeed within Adaptable itself. If its problematic, then please do turn it off.

The code in question, only follows what core wrote. Before that the answer was inaccurate and confusing. I see no other way of writing it (bar removing the method exists call).

Also, please could you test the M3.11 version (https://moodle.org/mod/forum/discuss.php?d=425903) as that now employs the use of the Moodle cache, and should be much faster after initial calculations.

Gareth

aspark21 commented 3 years ago

How do we turn it off?

we're on cd442546e241b344daf47047ba975e9ef8ad19ca which includes all the cache improvements of the 3.9.1.6 release on Moodle 3.9.10

gjb2048 commented 3 years ago

Settings can be used: https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/settings.php#L505-L589 - but that would mean resetting the default layout, probably needs rethinking that bit as a separation / better switch. So... comment out the code:

https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/course_renderer.php#L220-L234

and the events:

https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/observer.php

which could be better and react to the settings. EDIT, does react to the settings: https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/classes/activity.php#L1230-L1243, so just need a better 'reset' to default for that setting as a single element / have the course setting follow the default as the other's now do. EDIT: Which it does https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/lib.php#L774-L791, so now only need to separate into its own reset category so that courses not following the default can be reset to do so. i.e. when its changed to off.

gjb2048 commented 3 years ago

Why does applying activity completion trigger a 'course_module_updated' event? I would have thought that would only happen when the teacher actually changed it.

gjb2048 commented 3 years ago

Ok, improvements:

Separated out the reset so that its separate and easier to disable on a / all courses: https://github.com/gjb2048/moodle-format_topcoll/commit/a9fa4de5ffb178bc46100629f0b20822095b7c0b (Don't apply this: https://github.com/gjb2048/moodle-format_topcoll/commit/a9fa4de5ffb178bc46100629f0b20822095b7c0b#diff-8f5ff72ef1cf59199cfc4bbf16f2475d09e448601242f0d2c51bb94bd4aa1395R1572).

Removed 'method_exists': https://github.com/gjb2048/moodle-format_topcoll/commit/8a0dfea27dc9fc57afa47d60c113db5e381ff24d

Only calculate the students once per course: https://github.com/gjb2048/moodle-format_topcoll/commit/9a1afdcaf0f49bf4d9b3ea22dfdb78b6df419ccc.

gjb2048 commented 3 years ago

Small optimisation: https://github.com/gjb2048/moodle-format_topcoll/commit/59aaff91ab2ae00767cdfd07c93a107483071ede

gjb2048 commented 3 years ago

@aspark21 With the second flow, why is '\core\event\course_module_completion_updated' not used instead?

gjb2048 commented 3 years ago

@aspark21 Ok, would a setting that prevented the activity information being calculated / shown if there was more than 'x' students on the course help?

aspark21 commented 3 years ago

We turned off those settings yesterday morning and site performed ok. Did not comment out any code yet.

We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features.

The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day).

gjb2048 commented 3 years ago

What events are causing those invalidations and what is generating them?

gjb2048 commented 3 years ago

I need stack traces please.

gjb2048 commented 3 years ago

@aspark21 Any news on this please? At the moment I'm a bit in the dark, so can only operate a little on guesswork, I need more concrete facts / evidence.

aspark21 commented 3 years ago

Hi Gareth,

Yes, the team will provide the additional info shortly

gjb2048 commented 3 years ago

Hi Alistair,

Thanks. The UCL team?

Gareth

gjb2048 commented 3 years ago

@aspark21 Please note that the MOODLE_311 branch is being updated / improved in this area.

gjb2048 commented 3 years ago

@aspark21 Been 7 days, any news on this please? I feel like a complete mushroom at the moment and working on pure guess work. I really need the events / stack traces for:

We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features.

The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day).

Then I might have a hope at fixing this for you!

aspark21 commented 3 years ago

Hi Gareth,

Sorry for the slow replies, been pretty tied up in the incident and looking for other causes.

We found the root cause for our incident to actually be https://tracker.moodle.org/browse/MDL-72837 - so I assume

I will rerun some stack traces this week now we've fixed the underlying flaw to see how this code is now performing.

Al

gjb2048 commented 3 years ago

Dear @aspark21,

Thank you for getting back to me. I've now improved the code with a new admin setting 'courseadditionalmoddatamaxstudents' (you'll find this on the MOODLE_311 and master (for M4.0) branches), whereby if not zero, then additional information will be turned off if the number of students is above the number set. Of course, the number of students is still calculated, but there should be a performance increase as the rest isn't. The course settings will inform an editing teacher of the situation when this happens.

This is on top of all the other settings to turn elements on / off for this.

G

gjb2048 commented 3 years ago

Dear @aspark21,

Any news on this please as its driving me nuts that I cannot make progress on this. I'm have a 'get the job done' attitude and this is stalled, which is very frustrating. If there is a bug to fix, then it needs fixing as I'll only then burn more duplicate time making more releases as other things make progress and this doesn't.

Gareth

aspark21 commented 3 years ago

Hi Gareth,

We won't be able to feedback on your improvements, as you only made the change in MOODLE_311 and not in MOODLE_39.

I've go an xhprof here of course/view.php now that we've fixed our underlying infrastructure issue. And you can see this is performing adequately. This is with the Additional Info switched off at Site level. The slow thing is calendar / course overview related, so nothing collapsed topics related & overall page load is not that slow. callgraph-courseview-fixed-noadditional

I have re-enabled the Additional Info at site level just now, so will provide similar trace towards the end of the day, for comparison.

gjb2048 commented 3 years ago

Dear Alistair,

Thanks for the information. To be honest, I'm really embarrassed by all of this and had not anticipated the agro that porting this code from Adaptable would bring - in fact I'm very annoyed about it all. It is my intention to improve the 'kill switch' code and port that element along with the caching mechanism as far back as I can.

Gareth

gjb2048 commented 3 years ago

@aspark21 But I have added the caching to the M3.9 version and released it! Please see point '2' for 'Version 3.9.1.6' on https://github.com/gjb2048/moodle-format_topcoll/blob/MOODLE_39/Changes.md.

aspark21 commented 3 years ago

That wasn't a change for us - we were on MOODLE_39 @ cd442546e241b344daf47047ba975e9ef8ad19ca which already had that change in. We've just updated to MOODLE_39 @ 29450650698b49a2f748c24c9d06bd0a3fa1a2c7 which only had trivial changes (#105 and #111)

I was referring to the new 'courseadditionalmoddatamaxstudents' setting you mentionned https://github.com/gjb2048/moodle-format_topcoll/issues/114#issuecomment-950880746

gjb2048 commented 3 years ago

@aspark21 Ah, ok, but what about this:

You:

"We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features."

"The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day)."

Me:

"Why does applying activity completion trigger a 'course_module_updated' event? I would have thought that would only happen when the teacher actually changed it."

"With the second flow, why is '\core\event\course_module_completion_updated' not used instead?"

"What events are causing those invalidations and what is generating them?"

You:

"Yes, the team will provide the additional info shortly"

Thus the 'courseadditionalmoddatamaxstudents' is outside of the event driven cache code that was already in the M3.9 version that I thought you were testing and I could possibly identify if I needed to raise a tracker issue for the 'completion' when it updates and generates the wrong event. The 'courseadditionalmoddatamaxstudents' setting only mitigates the issue but won't identify the "The key here is the cache is invalidated very frequently(100s of times/hour), not just for that specific course but for all courses and the cache then needs to get rebuilt. It's disproportional to the amounts of actual course edits(90 in a day)." issue I thought you were getting for me, please :).

aspark21 commented 3 years ago

For "We're however still seeing a significant amount of cache invalidations so it seems there is still something touching the caches outside of those features."

The reason for that was https://tracker.moodle.org/browse/MDL-72837 - An infrastructure & cache config combo that had a bug in core. So not Collapsed Topics related. The human way I explain this bug is that: "stacked should rebuild cache once & then copy from shared to local 90 times (number of frontends) but it isn't because bug means it rebuilds unnecessarily. single layer rebuilds once, stacked was rebuilding 900 times."

What we are still aware of is that editing any activity results in the full rebuild of the coursemodinfo cache which is a concern for large/very large courses (users x content) - there is improvements in https://tracker.moodle.org/browse/MDL-55231 which were going to make it into 4.0 but are now pushed back to 4.1; we're looking at backporting & reworking it for 3.9.

The events thing we assumed was students doing the manual ticking of activity completions. We didn't explore further as the root cause of the CI was elsewhere.

gjb2048 commented 3 years ago

@aspark21 Ah ok, so its a distributed enterprise computing locking issue. Course editing won't result in a full rebuild of a cache, only the part pertaining to a given course.

gjb2048 commented 3 years ago

Note to self: Investigate the event type generated by course completion.

gjb2048 commented 3 years ago

@aspark21 Are you also saying that I've spent lots of time investigating this and it's transpired not to be a bug in my code but in fact to do with core code in relation to your enterprise structured Moodle system?

aspark21 commented 3 years ago

Any course editing does currently result in a full rebuild for a particular course. This will hopefully change in Moodle 4.1 to only updating the particular activity within the cache for that course.

I did say so earlier but apologies for the brevity which I guess didn't fully convey the meaning - https://github.com/gjb2048/moodle-format_topcoll/issues/114#issuecomment-950859595

So while I don't think there is a mysterious unidentified performance issue with other aspects of Collapsed Topics, there is definitely some issues with the additional activity info.

Now that we have a setup that isn't affected by the unnecessary rebuilds, I re-enabled all of the additional activity info on Wednesday afternoon. This lead to one of the courses refusing to load. It has 7000 users (mostly students) in it. So the looping stuff is an issue. I will re-enable tomorrow now that we've updated. but I was hoping the 'courseadditionalmoddatamaxstudents' was going to help with that, in case we wanted to re-enable the additional info

gjb2048 commented 3 years ago

@aspark21 I thought this issue pertained to the activity info and not 'other aspects'? What 'some issues with the additional activity info'????? Because as far as I can tell, I've written the caching and event mechanism as it should be to react to changes in the data that would make the generated activity data invalid. Its even granular enough to only perform what it needs to when a single module changes.

I know the code I've written takes more time, but it is using the core API and indeed actually produces accurate rather than false data.

Please provide the event information and evidence for "Any course editing does currently result in a full rebuild for a particular course. This will hopefully change in Moodle 4.1 to only updating the particular activity within the cache for that course." as I'm observing the module not course events and thus only regenerate what I need to. Unless you mean the 'rebuild' pertains to the core distributed cache issue you mentioned?

Time taken to read previous response and write this comment = 15 mins.

gjb2048 commented 3 years ago

Note to self: Could possibly improve the user enrolment events to be user granular and store on a per module basis the list of users that make up a given count and therefore react accordingly to the different types of event rather than the cache clear ATM.

aspark21 commented 3 years ago

Reminder: This - https://github.com/gjb2048/moodle-format_topcoll/issues/114#issuecomment-943099409 - was referring to other aspects. Which I've already explained wasn't Collapsed Topics related. But you then asked "Are you also saying that I've spent lots of time investigating this and it's transpired not to be a bug in my code but in fact to do with core code in relation to your enterprise structured Moodle system?" Which this was clarifying it wasn't wasted time - "So while I don't think there is a mysterious unidentified performance issue with other aspects of Collapsed Topics, there is definitely some issues with the additional activity info."

For "Any course editing does currently result in a full rebuild for a particular course. This will hopefully change in Moodle 4.1 to only updating the particular activity within the cache for that course." I am referring to the coursemodinfo cache specifically Gareth, this is core Moodle behaviour.

I'm not questioning the accuracy of the additional mod info data, just it's ability to scale. I see you've now backported the 'courseadditionalmoddatamaxstudents' into MOODLE_39. Our next code release is in mid-January now so will have to evaluate that then.

gjb2048 commented 11 months ago

Will be additionally resolved by #145.