carltongibson / django-unique-user-email

Enable login-by-email with the default User model for your Django project by making auth.User.email unique.
MIT License
113 stars 6 forks source link

Shouldn't the unique constraint be case-insensitive? #14

Open Flimm opened 4 days ago

Flimm commented 4 days ago

Email addresses look like this: username@hostname . The hostname part is case-insensitive. The username part is case-sensitive in theory, but case-insensitive in practise.

Shouldn't the unique constraint on the email field be case-insensitive? What is this application's strategy regarding case-sensitivity? It would be useful if this application's behaviour regarding case-sensitivity was documented.

ryanhiebert commented 4 days ago

Case-sensitive emails have bitten me several times, as they aren't what 99.9% of people expect, and at least for my projects I'm convinced that emails should be treated as case-insensitive, as you describe. It grates me to go against what's technically accurate generally, so it took me a while to come around to. I'm not sure it's always the right choice for everyone, but I do think its almost always the right choice.

Flimm commented 4 days ago

The developers of django-allauth have decided to always store email addresses in lower case, and their reasoning is explained here: https://docs.allauth.org/en/latest/account/email.html#case-sensitivity

To quote:

Historically, email addresses started out as case sensitive because the local part (the part before the “@”) could represent a case sensitive user name. However, over time, this proved to be a bad idea and the RFCs that succeeded the original were adjusted to move away from treating email addresses as case sensitive.

  • RFC 821: The original RFC from 1982 relies on case sensitivity.

    • RFC 2821: Released in 2001, obsoletes RFC 821, yet, is still case sensitive.
  • RFC 5321: In 2008, the “Local-part” is weakened to “MAY be case-sensitive”.

  • RFC 6530: In 2012, the following is acknowledged:

It has long been the case that the email syntax permits choices about mailbox names that are unwise in practice […]. The most often cited examples involve the use of case-sensitivity […] in mailbox local parts. These deliberately unusual constructions are permitted by the protocols, and servers are expected to support them. Although they can provide value in special cases, taking advantage of them is almost always bad practice unless the intent is to create some form of security by obscurity.

To deal with this, previous versions of django-allauth used to store email addresses in their original case, while performing lookups in a case insensitive style. This approach led to subtle bugs in upstream code, and also comes at a performance cost (__iexact lookups). The latter requires case insensitive index support, which not all databases support. Re-evaluating the approach in current times has led to the conclusion that the benefits do not outweigh the costs. Therefore, email addresses are now always stored as lower case.

carltongibson commented 4 days ago

Yes, you're both absolutely right. So, when creating a separate EmailCredential model, this would be exactly the way forward.

When demonstrating the minimum needed to enable login-by-email with Django's existing auth.User, by way of bootstrapping a conversation as to how we might evolve said model going forward, that wasn't a change I wanted to insist upon.

There are various layers you might take this on. For instance, you might go via lower() in you form's clean methods when storing emails. (And so on.)