Closed sebastsg closed 3 years ago
@andstor Good you caught those. I suspected there might have been some leftovers I had forgotten about.
@sebastsg Yes, it's a lot of moving parts here 😅 Otherwise, I think this PR is good to go. However, it would be nice if you could double check that the backup / restore procedure is working properly.
@hgeorgsch It would be favorable to confirm that this fix (see #167) actually results in a performance gain.
@andstor I will look at it today. I need to see if I can make a backup of the data set, or if the performance is just too bad to manage :-)
@andstor @sebastsg My attempt to upgrade a server with a big CAPQuiz instance failed. It appeared to fail silently, with the CPU activity falling and a partially loaded page. New attempts to update the database give error messages, presumably because the DB is corrupt. I have also had problems exporting and restoring backups, and I do not always get error messages in the web interface. Any ideas?
For completeness: Notice: Trying to get property 'contextid' of non-object in /var/www/moodle/mod/capquiz/db/upgrade.php on line 253
Notice: Trying to get property 'component' of non-object in /var/www/moodle/mod/capquiz/db/upgrade.php on line 254
Notice: Trying to get property 'preferredbehaviour' of non-object in /var/www/moodle/mod/capquiz/db/upgrade.php on line 255
Error writing to database
More information about this error
Debug info: ERROR: null value in column "contextid" violates not-null constraint
DETAIL: Failing row contains (154, null, null, null).
INSERT INTO mdl_question_usages (contextid,component,preferredbehaviour) VALUES($1,$2,$3) RETURNING id
[array ( 'contextid' => NULL, 'component' => NULL, 'preferredbehaviour' => NULL, )]
Error code: dmlwriteexception
Stack trace:
line 489 of /lib/dml/moodle_database.php: dml_write_exception thrown
line 329 of /lib/dml/pgsql_native_moodle_database.php: call to moodle_database->query_end()
line 1025 of /lib/dml/pgsql_native_moodle_database.php: call to pgsql_native_moodle_database->query_end()
line 1073 of /lib/dml/pgsql_native_moodle_database.php: call to pgsql_native_moodle_database->insert_record_raw()
line 256 of /mod/capquiz/db/upgrade.php: call to pgsql_native_moodle_database->insert_record()
line 866 of /lib/upgradelib.php: call to xmldb_capquiz_upgrade()
line 565 of /lib/upgradelib.php: call to upgrade_plugins_modules()
line 1917 of /lib/upgradelib.php: call to upgrade_plugins()
line 713 of /admin/index.php: call to upgrade_noncore()
@hgeorgsch Could you try again with the check I added? It'll give you a list of the question lists in question as well.
Yup, mod_capquiz
Did not find question usage with id 1 for question list Oblig 2: funksjonsdrøfting (1)
Did not find question usage with id 2 for question list Oblig 2: funksjonsdrøfting (2)
Did not find question usage with id 3 for question list Undeployed (3)
Did not find question usage with id 4 for question list Undeployed (4)
Did not find question usage with id 11 for question list Finansmatematikk (5)
Did not find question usage with id 22 for question list Finansmatematikk (6)
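For readers following along, the kind of check that produces the messages above can be sketched as an orphan query: list every question list whose referenced question usage row no longer exists, instead of inserting NULLs. This is only a minimal illustration in Python with SQLite, not the actual code in upgrade.php; the table and column names (`question_list`, `question_usages`, `question_usage_id`) are simplified assumptions.

```python
import sqlite3


def find_orphaned_question_lists(conn):
    """Return (list_id, title, usage_id) for lists whose usage row is missing."""
    return conn.execute(
        """
        SELECT ql.id, ql.title, ql.question_usage_id
          FROM question_list ql
          LEFT JOIN question_usages qu ON qu.id = ql.question_usage_id
         WHERE qu.id IS NULL
         ORDER BY ql.id
        """
    ).fetchall()


# Hypothetical, simplified schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE question_usages (id INTEGER PRIMARY KEY, contextid INTEGER NOT NULL);
    CREATE TABLE question_list (
        id INTEGER PRIMARY KEY, title TEXT, question_usage_id INTEGER);
    INSERT INTO question_usages (id, contextid) VALUES (5, 10);
    INSERT INTO question_list (id, title, question_usage_id) VALUES
        (1, 'Undeployed', 3),
        (2, 'Finansmatematikk', 5),
        (3, 'Finansmatematikk', 11);
    """
)

orphans = find_orphaned_question_lists(conn)
for list_id, title, usage_id in orphans:
    # Report the broken relation and skip the list rather than writing NULLs.
    print(f"Did not find question usage with id {usage_id} "
          f"for question list {title} ({list_id})")
```

Reporting and skipping such lists keeps the upgrade from hitting the not-null constraint on `contextid`, at the cost of leaving those lists without attempts.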
What is the next step, @sebastsg ? Is it possible to proceed? Or will I have to wipe and install?
The database should be in a more correct state now than before the upgrade, assuming the relation between question list and question usage was already broken. Based on that, I think it's fine to continue.
@hgeorgsch Just to confirm some performance gain, you could maybe get away with manually deleting the problematic question lists (assuming it is not all of them 😅).
Sure @andstor - if you all help me pray :-)
@hgeorgsch Does the upgrade fail still? I thought it went through?
Not sure. Still running. I may have aborted it the first time because I thought it had completed. Now it runs at 75-85% CPU for postgres and 25-26% for apache2 ... and there is no sign of it completing soon.
Perhaps I should have added some progress output. How did the database become corrupt in the first place? Are the attempts gone as well?
It was corrupted because the upgrade process did not complete as it should. The problem is in Moodle. Web processes are not robust, and when they do not complete quickly, there is a serious risk. How corrupt it is, I have no idea. I don't get past the upgrade ...
BTW. It broke again. Page seems incomplete, but the browser is no longer waiting, and the load drops to 0.
Oh, so it actually gets timed out. Are you sure that's not a PHP or Apache setting? I think I'd want me or @andstor to look at the database. Don't try to upgrade again.
Usually I see an error message when it times out, but not here. The max_execution_time is 600s. Maybe it needs to be set higher, but this is bad design. A web interface is not appropriate for jobs which take 10 minutes or more. The real problem, though, is that it does not fail gracefully. @andstor has access - it is iirmoodle.it.ntnu.no. I'll create an account for you too.
Maybe you don't see error messages because debugging mode is turned off?
Finally an attempt which timed out with an error message. Sorry, you asked me not to upgrade again. It is possible that it works by increasing the PHP limit, but it is evidently very fragile with a serious risk of breaking before it times out.
Did not find question usage with id 1 for question list Oblig 2: funksjonsdrøfting (1)
Did not find question usage with id 2 for question list Oblig 2: funksjonsdrøfting (2)
Did not find question usage with id 3 for question list Undeployed (3)
Did not find question usage with id 4 for question list Undeployed (4)
Did not find question usage with id 11 for question list Finansmatematikk (5)
Did not find question usage with id 22 for question list Finansmatematikk (6)
Did not find question usage with id 23 for question list Finansmatematikk (7)
Upgrade timed out, please restart the upgrade.
More information about this error
Debug info: Error code: upgradetimedout
Stack trace:
line 498 of /lib/setuplib.php: moodle_exception thrown
line 241 of /lib/upgradelib.php: call to print_error()
line 348 of /lib/upgradelib.php: call to upgrade_set_timeout()
line 314 of /mod/capquiz/db/upgrade.php: call to upgrade_mod_savepoint()
line 866 of /lib/upgradelib.php: call to xmldb_capquiz_upgrade()
line 565 of /lib/upgradelib.php: call to upgrade_plugins_modules()
line 1917 of /lib/upgradelib.php: call to upgrade_plugins()
line 713 of /admin/index.php: call to upgrade_noncore()
No, debugging has been on. Furthermore, the message I was after was not debug info but user info. Timeouts happen fairly often in Moodle, and it is imperative that the user knows when one happens. Failing silently is not acceptable.
The output is much better now, but @sebastsg could we add a warning that the migration in this particular case is exceptionally slow? Still waiting for the second question list to migrate ...
The upgrade came through on the second attempt, losing 2000-odd attempts from the quiz which timed out on the first attempt. The teacher interface is much faster, cutting run times reported by the profiler from about 45s to 1-2s. No problems are detected in the teacher interface.
However, the student interface breaks with an error: Exception - Call to undefined method mod_capquiz\capquiz::question_usage()
This does not look like DB corruption ... Did you test the student interface, @andstor ?
Any ideas, @sebastsg ?
The problem in the student view is related to upgrading. It works with a new CAPQuiz instance. However, I have the same problem with both the upgraded instances, also the one which did not time out.
I have made further tests,
The last problem reported is in capquizqtrackerblock
Question usages are now stored per user instead of per question list.
The upgrade script will split the question usage into one for each user, then delete the original.
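The split described above can be sketched roughly as follows. This is an illustration only, not the code in upgrade.php: it uses Python with SQLite, a simplified hypothetical schema (`question_usages`, and `question_attempts` carrying a `user_id` column), and wraps the whole split in a single transaction so that an interrupted run rolls back instead of leaving attempts attached to a deleted usage.

```python
import sqlite3


def split_usage_per_user(conn, old_usage_id):
    """Give each user their own usage, move their attempts, delete the shared one.

    Returns a dict mapping user_id to the new usage id.
    Raises LookupError if the shared usage does not exist.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        usage = conn.execute(
            "SELECT contextid FROM question_usages WHERE id = ?",
            (old_usage_id,),
        ).fetchone()
        if usage is None:
            raise LookupError(
                f"Did not find question usage with id {old_usage_id}")

        user_ids = [row[0] for row in conn.execute(
            "SELECT DISTINCT user_id FROM question_attempts "
            "WHERE usage_id = ? ORDER BY user_id",
            (old_usage_id,),
        )]

        mapping = {}
        for user_id in user_ids:
            # New usage copies the context of the shared one.
            cur = conn.execute(
                "INSERT INTO question_usages (contextid) VALUES (?)",
                (usage[0],),
            )
            mapping[user_id] = cur.lastrowid
            conn.execute(
                "UPDATE question_attempts SET usage_id = ? "
                "WHERE usage_id = ? AND user_id = ?",
                (cur.lastrowid, old_usage_id, user_id),
            )

        # Only delete the original once every attempt has been moved.
        conn.execute("DELETE FROM question_usages WHERE id = ?", (old_usage_id,))
    return mapping


# Hypothetical data: one shared usage with attempts from two users.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE question_usages (id INTEGER PRIMARY KEY, contextid INTEGER NOT NULL);
    CREATE TABLE question_attempts (
        id INTEGER PRIMARY KEY, usage_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
    INSERT INTO question_usages (id, contextid) VALUES (1, 42);
    INSERT INTO question_attempts (usage_id, user_id) VALUES (1, 7), (1, 7), (1, 8);
    """
)
mapping = split_usage_per_user(conn, 1)
```

Whether the real upgrade step can run inside one database transaction in Moodle is not something this thread establishes; the sketch only shows why doing the delete last, and atomically, would avoid losing attempts on a timeout.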