Closed PM84 closed 1 year ago
hi @PM84
Thanks for the PR, however I am a little confused by this patch. Generally speaking all options should be admin settings in the UI and not custom values on $CFG. Specifically in this case it is NOT hard coded and is already an admin setting??
Hi Brendan, We actually overlooked this setting. With that, I withdraw my PR! Still, it's impressive what impact a case sensitive search has on load times.
Yeah I wonder if there is a core index missing here and maybe if that only applies to certain db's. What are you running this on, and even though it is 150x worse what is the actual query times and the size of the user table you have?
I have now investigated the problem further.
I found out that it is probably a special MySQL problem. It is used in auth/saml2/classes/user_extractor.php:69
$DB->sql_equal
. In MySQL environments collations are added by this method when certain conditions occur (case sensitive and accent sensitive). Casesensitive can be influenced with an admin setting accent sensitive but not. Here a collation xxx_bin
is added. And this collation is the real problem.
In our environment we have 1.5 million active users, so the user table is naturally larger and contains about 2.7 million records.
If we set the third and fourth parameters of sql_equal to false, which is no problem for us, then the collation disappears and the query is sped up from about 450ms to 3ms.
In PostgreSQL sql_equal is defined differently, so this problem does not occur.
I would be very happy if I could contribute a fix here. I would imagine that another admin setting would be added, which is only shown in the MySQL case and with which you could additionally influence the accent sensitivity.
An intermediate load test has also confirmed that the performance problems caused by the fourth parameter have disappeared.
Are you aware of this? https://docs.moodle.org/400/en/MySQL_full_unicode_support
Collation should be utf8mb4_unicodeci and I suspect there should be no action in auth saml2
All our tables have the collation utf8mb4_unicode_ci . I have just checked this again.
So is there an action here then? If there is something to be improved then is this in the moodle core dml layer and not in saml? The usage of $DB->sql_equal seems correct to me
I definitely think there is a possibility for improvement here in the Saml2 plugin. In my estimation the moodle dml layer is quite correct. There is also nothing against the use of $DB->sql_equal. Quite the opposite! However, the missing fourth parameter (accentsensitive) in case of a MySQL DB leads to a collation change to utf8mb4_bin during the request. This is the problem that leads to an unnecessary and enormous slowdown of the requesttime. My suggestion would be, as outlined above, to make it optional how this parameter is set. Gladly via an admin config parameter. The moodle-default in sql_equal is true. In principle I would also contribute this. However, only if there is also a prospect of integration.
Yeah I'm definitely open to pull requests I just want to make sure I fully understand the problem and we fix it in the right place.
I don't think this should be a new admin setting I think it should be a new option in the tolower select. Feel free to re-push a fix into this same pr so we keep the discussion and the commit together
I have updated the PR and added the appropriate option. What I am still struggling with: The effect only occurs in MySQL. Therefore it would also make sense to display the option depending on the prevailing DB layer.
Just for comparison again:
with the default setting "exact":
with new setting:
@PM84 this is broadly looking good. Can you please bump the version too. When phpdocs / CI is green this should be good to land
@brendanheywood I bumped the version to todays date.
thanks @PM84 :)
A brief result from our latest load test :-)
Nice! :)
Just heads up, we are seeing our CI failing with this change. @hdagheda is digging into that.
2) auth_saml2_user_extractor_test::test_get_user_case_and_accent_insensitive_core_fields
Failed asserting that stdClass Object &00000000058568ae00000000092659ac (
'id' => '131000'
'auth' => 'manual'
'confirmed' => '1'
'policyagreed' => '0'
'deleted' => '0'
'suspended' => '0'
'mnethostid' => '1'
'username' => 'username1'
'password' => ''
'idnumber' => 'NB256á'
'firstname' => '伟'
'lastname' => '李'
'email' => 'username1@example.com'
'emailstop' => '0'
'phone1' => ''
'phone2' => ''
'institution' => ''
'department' => ''
'address' => ''
'city' => ''
'country' => ''
'lang' => 'en'
'calendartype' => 'gregorian'
'theme' => ''
'timezone' => '99'
'firstaccess' => '0'
'lastaccess' => '0'
'lastlogin' => '0'
'currentlogin' => '0'
'lastip' => '0.0.0.0'
'secret' => ''
'picture' => '0'
'description' => null
'descriptionformat' => '1'
'mailformat' => '1'
'maildigest' => '0'
'maildisplay' => '2'
'autosubscribe' => '1'
'trackforums' => '0'
'timecreated' => '1661839681'
'timemodified' => '1661839681'
'trustbitmask' => '0'
'imagealt' => null
'lastnamephonetic' => '王'
'firstnamephonetic' => '敏'
'middlename' => 'William'
'alternatename' => 'Karolína'
'moodlenetprofile' => null
'preference' => Array &0 (
'_lastloaded' => 1661839681
)
'lastcourseaccess' => Array &1 ()
'currentcourseaccess' => Array &2 ()
'groupmember' => Array &3 ()
'profile' => Array &4 ()
) is false.
/var/www/site/auth/saml2/tests/user_extractor_test.php:314
/var/www/site/lib/phpunit/classes/advanced_testcase.php:80
We have observed in load tests that the newly introduced case sensitive search by username is about a factor of 150 slower than the caseinsensitive search. In our opinion, it is not good to hardcode the option casesensitive or not. Therefore we introduced a config parameter to set this option via config.php.