Closed glasswalk3r closed 6 years ago
I don't understand what part of the website needs to match e-mail addresses case-insensitively, especially since e-mail addresses are case-sensitive.
I think you mean the other way around :) We want to match all cases, because in the real world email address are case-insensitive. However, when testers submit their address they can be in any case they choose, so we want to match 'dcollins@cpan.org' against any case match we find in the DB, as we know they are the same tester.
These tables are mostly used by the Statistic site.
I mostly just don't know what this ticket's goal is: What is the problem for the end-user, how can it be fixed, and what must happen for this ticket to be completed/closed?
I mostly just don't know what this ticket's goal is
Maybe it is my English that is not good enough, but an issue is usually reported when someone sees a problem. If it needs to be fixed or it was a design decision, that's another matter. In the worst case this goes into a FAQ or something like.
What is the problem for the end-user?
Taking long cycles to see reports for recent releases on CPAN or wait a few seconds to see a report finish, all that because we have performance bottlenecks over the way. For the end user, the impact is low, but we can improve the DB performance a bit with those changes and making the database related code more portable.
how can it be fixed?
We can simply lower all e-mail address before saving in the database. It will be done a single time and I can't see impacts for the end user.
Or we can lower when comparisons are needed. Doing that directly on the DB is not portable between databases as I already demonstrated.
what must happen for this ticket to be completed/closed?
Implement one of the two choices that I suggested, proposing a third one or consider that this is not an issue for the project (and accepting that testing with SQLite will be a problem).
This will have to be solved in the schema, if it can be solved at all. E-mail addresses are case-sensitive, but it's a problematic system that allows e-mail accounts to differ only by case (so I haven't seen one that does it). We also can't easily fix this just by changing the collation on the table: That might break a whole bunch of things that are currently working (relying on inaccurate joins to fetch the data they require). So, for now, the solution is potentially more dangerous than leaving the problem as it is.
While migrating data from Mysql to SQLite to enable more autonomous unit testing, I noticed the following on Mysql DB:
That happens because the table
testers.address
is set to use utf8 case insensitive collation. On SQLite3 such results cannot be reproduced and I had to uselower
on the respective queries to have the same results between the databases.One real example where the current configuration can have problems is the following query:
The
LEFT JOIN
ontesters.address
will fail on SQLite3, and the corresponding unit test01-cpantesters-web-legacy-html.t
will fail as well.