fastapi-users / fastapi-users-db-sqlalchemy

FastAPI Users - Database adapter for SQLAlchemy + encode/databases
MIT License
28 stars 13 forks source link

The email index is not used by `get_by_email()` method #4

Open stephane opened 3 years ago

stephane commented 3 years ago

Describe the bug

An index is created on the email field: https://github.com/fastapi-users/fastapi-users-db-sqlalchemy/blob/main/fastapi_users_db_sqlalchemy/__init__.py#L61

but the index can't be used by the get_by_email() method because the filtering uses a function on the SQL field.

To Reproduce

explain analyze 
SELECT auth_user.hashed_password, auth_user.is_active, auth_user.is_superuser, auth_user.is_verified, auth_user.id, auth_user.email, auth_user.first_name, auth_user.last_name 
FROM auth_user 
WHERE lower(auth_user.email) = lower('foo@bar.com');
Seq Scan on auth_user  (cost=0.00..10.90 rows=1 width=1245) (actual time=0.030..0.030 rows=0 loops=1)
   Filter: (lower((email)::text) = 'foo@bar.com'::text)

Sequential scan is used instead of index scan.

Expected behavior

explain analyze 
SELECT auth_user.hashed_password, auth_user.is_active, auth_user.is_superuser, auth_user.is_verified, auth_user.id, auth_user.email, auth_user.first_name, auth_user.last_name 
FROM auth_user 
WHERE auth_user.email = lower('foo@bar.com');
Index Scan using ix_auth_user_email on auth_user  (cost=0.14..2.36 rows=1 width=1245) (actual time=0.044..0.045 rows=0 loops=1)
   Index Cond: ((email)::text = 'foo@bar.com'::text)

There is several ways to fix the issue:

  1. use CIText extension but it's not standard in SQLAlchemy (https://pypi.org/project/sqlalchemy-citext/).
  2. create a functional index: CREATE INDEX ix_auth_user ON auth_user (lower(email) text_pattern_ops);
  3. store email in lower case (my favorite even though on rare occasions, an outdated server or program might not interpret the capitalization correctly).

I can provide a PR when we'll agree on a solution.

Configuration

frankie567 commented 3 years ago

Thank you for raising this, @stephane! That's indeed a welcome optimization.

My preference would go for 2, i.e. create a functional index.

I prefer not to rely on custom extension, especially since CIText looks only designed for PostgreSQL.

As for storing the e-mail in lower-case, this is a debate we had some time ago and we purposefully chose to store the e-mails case-sensitive but query them case-insensitive.

If you agree with this, I would be happy to review a PR that creates a lowercased index for e-mails 🙂

stephane commented 3 years ago

Django offers the possibility to provide migrations in third-apps code but I didn't find a similar mechanism with Alembic.

With the new __table_args__, Alembic detects the missing index in my app but displays a warning:

UserWarning: autogenerate skipping functional index ix_user_email_lower; not supported by SQLAlchemy reflection

So I had to manually create the Alembic migration in my project and it's convenient for users of fastapi-users.

op.create_index("ix_user_email_lower", "user", [sa.text("lower(email)")])