catalyst / moodle-auth_saml2

SAML done 100% in Moodle, fast, simple, secure
https://moodle.org/plugins/auth_saml2
70 stars 132 forks source link

MBS-6571: Fix issue with case insensitive search on usernames #714

Closed PM84 closed 1 year ago

PM84 commented 1 year ago

We have observed in load tests that the newly introduced case sensitive search by username is about a factor of 150 slower than the caseinsensitive search. In our opinion, it is not good to hardcode the option casesensitive or not. Therefore we introduced a config parameter to set this option via config.php.

brendanheywood commented 1 year ago

hi @PM84

Thanks for the PR, however I am a little confused by this patch. Generally speaking all options should be admin settings in the UI and not custom values on $CFG. Specifically in this case it is NOT hard coded and is already an admin setting??

PM84 commented 1 year ago

Hi Brendan, We actually overlooked this setting. With that, I withdraw my PR! Still, it's impressive what impact a case sensitive search has on load times.

brendanheywood commented 1 year ago

Yeah I wonder if there is a core index missing here and maybe if that only applies to certain db's. What are you running this on, and even though it is 150x worse what is the actual query times and the size of the user table you have?

PM84 commented 1 year ago

I have now investigated the problem further. I found out that it is probably a special MySQL problem. It is used in auth/saml2/classes/user_extractor.php:69 $DB->sql_equal. In MySQL environments collations are added by this method when certain conditions occur (case sensitive and accent sensitive). Casesensitive can be influenced with an admin setting accent sensitive but not. Here a collation xxx_bin is added. And this collation is the real problem. In our environment we have 1.5 million active users, so the user table is naturally larger and contains about 2.7 million records. If we set the third and fourth parameters of sql_equal to false, which is no problem for us, then the collation disappears and the query is sped up from about 450ms to 3ms. In PostgreSQL sql_equal is defined differently, so this problem does not occur. I would be very happy if I could contribute a fix here. I would imagine that another admin setting would be added, which is only shown in the MySQL case and with which you could additionally influence the accent sensitivity.

An intermediate load test has also confirmed that the performance problems caused by the fourth parameter have disappeared.

brendanheywood commented 1 year ago

Are you aware of this? https://docs.moodle.org/400/en/MySQL_full_unicode_support

Collation should be utf8mb4_unicodeci and I suspect there should be no action in auth saml2

PM84 commented 1 year ago

All our tables have the collation utf8mb4_unicode_ci . I have just checked this again.

brendanheywood commented 1 year ago

So is there an action here then? If there is something to be improved then is this in the moodle core dml layer and not in saml? The usage of $DB->sql_equal seems correct to me

PM84 commented 1 year ago

I definitely think there is a possibility for improvement here in the Saml2 plugin. In my estimation the moodle dml layer is quite correct. There is also nothing against the use of $DB->sql_equal. Quite the opposite! However, the missing fourth parameter (accentsensitive) in case of a MySQL DB leads to a collation change to utf8mb4_bin during the request. This is the problem that leads to an unnecessary and enormous slowdown of the requesttime. My suggestion would be, as outlined above, to make it optional how this parameter is set. Gladly via an admin config parameter. The moodle-default in sql_equal is true. In principle I would also contribute this. However, only if there is also a prospect of integration.

brendanheywood commented 1 year ago

Yeah I'm definitely open to pull requests I just want to make sure I fully understand the problem and we fix it in the right place.

I don't think this should be a new admin setting I think it should be a new option in the tolower select. Feel free to re-push a fix into this same pr so we keep the discussion and the commit together

PM84 commented 1 year ago

I have updated the PR and added the appropriate option. What I am still struggling with: The effect only occurs in MySQL. Therefore it would also make sense to display the option depending on the prevailing DB layer.

PM84 commented 1 year ago

Just for comparison again:

with the default setting "exact": image

with new setting: image

brendanheywood commented 1 year ago

@PM84 this is broadly looking good. Can you please bump the version too. When phpdocs / CI is green this should be good to land

PM84 commented 1 year ago

@brendanheywood I bumped the version to todays date.

brendanheywood commented 1 year ago

thanks @PM84 :)

PM84 commented 1 year ago

A brief result from our latest load test :-)

image

brendanheywood commented 1 year ago

Nice! :)

dmitriim commented 1 year ago

Just heads up, we are seeing our CI failing with this change. @hdagheda is digging into that.

2) auth_saml2_user_extractor_test::test_get_user_case_and_accent_insensitive_core_fields
Failed asserting that stdClass Object &00000000058568ae00000000092659ac (
    'id' => '131000'
    'auth' => 'manual'
    'confirmed' => '1'
    'policyagreed' => '0'
    'deleted' => '0'
    'suspended' => '0'
    'mnethostid' => '1'
    'username' => 'username1'
    'password' => ''
    'idnumber' => 'NB256á'
    'firstname' => '伟'
    'lastname' => '李'
    'email' => 'username1@example.com'
    'emailstop' => '0'
    'phone1' => ''
    'phone2' => ''
    'institution' => ''
    'department' => ''
    'address' => ''
    'city' => ''
    'country' => ''
    'lang' => 'en'
    'calendartype' => 'gregorian'
    'theme' => ''
    'timezone' => '99'
    'firstaccess' => '0'
    'lastaccess' => '0'
    'lastlogin' => '0'
    'currentlogin' => '0'
    'lastip' => '0.0.0.0'
    'secret' => ''
    'picture' => '0'
    'description' => null
    'descriptionformat' => '1'
    'mailformat' => '1'
    'maildigest' => '0'
    'maildisplay' => '2'
    'autosubscribe' => '1'
    'trackforums' => '0'
    'timecreated' => '1661839681'
    'timemodified' => '1661839681'
    'trustbitmask' => '0'
    'imagealt' => null
    'lastnamephonetic' => '王'
    'firstnamephonetic' => '敏'
    'middlename' => 'William'
    'alternatename' => 'Karolína'
    'moodlenetprofile' => null
    'preference' => Array &0 (
        '_lastloaded' => 1661839681
    )
    'lastcourseaccess' => Array &1 ()
    'currentcourseaccess' => Array &2 ()
    'groupmember' => Array &3 ()
    'profile' => Array &4 ()
) is false.

/var/www/site/auth/saml2/tests/user_extractor_test.php:314
/var/www/site/lib/phpunit/classes/advanced_testcase.php:80