goauthentik / authentik

The authentication glue you need.
https://goauthentik.io

Login screen slow to load #3153

Open AndrewBucklin opened 2 years ago

AndrewBucklin commented 2 years ago

Describe your question: I'm testing out Authentik and so far it's working great, except for one thing: the login screen is terribly slow to load. It sits at the "loading" spinner for 15-20 seconds before the 'Email or Username' field appears.

Relevant infos

Logs:

{"event": "/api/v3/flows/executor/welcome/?query=next%3D%252F", "host": "redacted", "level": "info", "logger": "authentik.asgi", "method": "GET", "pid": 41, "remote": "10.x.x.x", "request_id": "e6b119ee059444fbbc6593a52a5c22e2", "runtime": 19036, "scheme": "https", "status": 200, "timestamp": "2022-06-25T22:41:29.392827", "user": "", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"}
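For readers who want to find requests like this in their own logs, here is a minimal sketch that scans JSON log lines and reports slow ones. It assumes `runtime` is in milliseconds (19036 lines up with the 15-20 seconds reported above); the `SLOW_MS` threshold and the function name are hypothetical, not part of authentik.

```python
import json

SLOW_MS = 5000  # hypothetical threshold; adjust to what counts as slow for you


def slow_requests(lines, threshold_ms=SLOW_MS):
    """Yield (runtime_ms, method, path) for authentik.asgi entries above the threshold."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log line
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        runtime = entry.get("runtime")  # assumed to be milliseconds
        if (
            entry.get("logger") == "authentik.asgi"
            and isinstance(runtime, (int, float))
            and runtime >= threshold_ms
        ):
            yield runtime, entry.get("method"), entry.get("event")


# Trimmed copy of the log line from the report above.
sample = [
    '{"event": "/api/v3/flows/executor/welcome/?query=next%3D%252F", '
    '"logger": "authentik.asgi", "method": "GET", "runtime": 19036, "status": 200}'
]

for runtime, method, path in slow_requests(sample):
    print(f"{runtime / 1000:.1f}s {method} {path}")
```

Feeding it the container's log output line by line (for example from a saved log file) should give a quick list of the endpoints worth investigating.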


AndrewBucklin commented 2 years ago

Update: It looks like the cause is slow queries on the PostgreSQL cluster node; for example, this query took 68 seconds.

 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=120536192.45..120571817.45 rows=14250000 width=4853) (actual time=67540.784..67541.632 rows=2 loops=1)
   Sort Key: authentik_core_source.name
   Sort Method: quicksort  Memory: 25kB
   ->  Hash Left Join  (cost=50.35..194.77 rows=14250000 width=4853) (actual time=67533.375..67533.794 rows=2 loops=1)
         Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_plex_plexsource.source_ptr_id)
         ->  Hash Left Join  (cost=27.52..39.72 rows=50000 width=4740) (actual time=67531.962..67532.378 rows=2 loops=1)
               Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_ldap_ldapsource.source_ptr_id)
               ->  Hash Left Join  (cost=13.02..23.88 rows=500 width=4368) (actual time=67525.813..67526.225 rows=2 loops=1)
                     Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_oauth_oauthsource.source_ptr_id)
                     ->  Hash Right Join  (cost=2.57..13.28 rows=50 width=1580) (actual time=67522.027..67522.436 rows=2 loops=1)
                           Hash Cond: (authentik_sources_saml_samlsource.source_ptr_id = authentik_core_source.policybindingmodel_ptr_id)
                           ->  Seq Scan on authentik_sources_saml_samlsource  (cost=0.00..10.50 rows=50 width=1435) (actual time=0.452..0.453 rows=0 loops=1)
                           ->  Hash  (cost=2.55..2.55 rows=2 width=145) (actual time=67519.006..67519.259 rows=2 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Hash Join  (cost=1.04..2.55 rows=2 width=145) (actual time=67517.090..67517.535 rows=2 loops=1)
                                       Hash Cond: (authentik_policies_policybindingmodel.pbm_uuid = authentik_core_source.policybindingmodel_ptr_id)
                                       ->  Seq Scan on authentik_policies_policybindingmodel  (cost=0.00..1.39 rows=39 width=21) (actual time=67493.717..67494.293 rows=39 loops=1)
                                       ->  Hash  (cost=1.02..1.02 rows=2 width=124) (actual time=16.652..16.903 rows=2 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Seq Scan on authentik_core_source  (cost=0.00..1.02 rows=2 width=124) (actual time=16.044..16.051 rows=2 loops=1)
                                                   Filter: enabled
                     ->  Hash  (cost=10.20..10.20 rows=20 width=2788) (actual time=0.398..0.398 rows=0 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
                           ->  Seq Scan on authentik_sources_oauth_oauthsource  (cost=0.00..10.20 rows=20 width=2788) (actual time=0.391..0.391 rows=0 loops=1)
               ->  Hash  (cost=12.00..12.00 rows=200 width=372) (actual time=2.764..2.765 rows=1 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Seq Scan on authentik_sources_ldap_ldapsource  (cost=0.00..12.00 rows=200 width=372) (actual time=2.375..2.378 rows=1 loops=1)
         ->  Hash  (cost=15.70..15.70 rows=570 width=113) (actual time=0.219..0.220 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on authentik_sources_plex_plexsource  (cost=0.00..15.70 rows=570 width=113) (actual time=0.111..0.111 rows=0 loops=1)
 Planning Time: 694.659 ms
 JIT:
   Functions: 45
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 1013.424 ms, Inlining 831.278 ms, Optimization 23011.822 ms, Emission 43596.834 ms, Total 68453.358 ms
 Execution Time: 68862.998 ms
(36 rows)

AndrewBucklin commented 2 years ago

Update: After rebooting the PostgreSQL cluster node, that same query only takes 1 second. Any ideas what would cause this or where to look next?

 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=120536192.45..120571817.45 rows=14250000 width=4853) (actual time=1215.237..1215.241 rows=2 loops=1)
   Sort Key: authentik_core_source.name
   Sort Method: quicksort  Memory: 25kB
   ->  Hash Left Join  (cost=50.35..194.77 rows=14250000 width=4853) (actual time=1215.219..1215.226 rows=2 loops=1)
         Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_plex_plexsource.source_ptr_id)
         ->  Hash Left Join  (cost=27.52..39.72 rows=50000 width=4740) (actual time=1215.210..1215.217 rows=2 loops=1)
               Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_ldap_ldapsource.source_ptr_id)
               ->  Hash Left Join  (cost=13.02..23.88 rows=500 width=4368) (actual time=1215.195..1215.201 rows=2 loops=1)
                     Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_oauth_oauthsource.source_ptr_id)
                     ->  Hash Right Join  (cost=2.57..13.28 rows=50 width=1580) (actual time=1215.186..1215.191 rows=2 loops=1)
                           Hash Cond: (authentik_sources_saml_samlsource.source_ptr_id = authentik_core_source.policybindingmodel_ptr_id)
                           ->  Seq Scan on authentik_sources_saml_samlsource  (cost=0.00..10.50 rows=50 width=1435) (actual time=0.002..0.002 rows=0 loops=1)
                           ->  Hash  (cost=2.55..2.55 rows=2 width=145) (actual time=1215.176..1215.177 rows=2 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Hash Join  (cost=1.04..2.55 rows=2 width=145) (actual time=1215.171..1215.174 rows=2 loops=1)
                                       Hash Cond: (authentik_policies_policybindingmodel.pbm_uuid = authentik_core_source.policybindingmodel_ptr_id)
                                       ->  Seq Scan on authentik_policies_policybindingmodel  (cost=0.00..1.39 rows=39 width=21) (actual time=1215.123..1215.128 rows=39 loops=1)
                                       ->  Hash  (cost=1.02..1.02 rows=2 width=124) (actual time=0.027..0.027 rows=2 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Seq Scan on authentik_core_source  (cost=0.00..1.02 rows=2 width=124) (actual time=0.020..0.021 rows=2 loops=1)
                                                   Filter: enabled
                     ->  Hash  (cost=10.20..10.20 rows=20 width=2788) (actual time=0.002..0.002 rows=0 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 8kB
                           ->  Seq Scan on authentik_sources_oauth_oauthsource  (cost=0.00..10.20 rows=20 width=2788) (actual time=0.002..0.002 rows=0 loops=1)
               ->  Hash  (cost=12.00..12.00 rows=200 width=372) (actual time=0.008..0.008 rows=1 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Seq Scan on authentik_sources_ldap_ldapsource  (cost=0.00..12.00 rows=200 width=372) (actual time=0.005..0.006 rows=1 loops=1)
         ->  Hash  (cost=15.70..15.70 rows=570 width=113) (actual time=0.002..0.003 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on authentik_sources_plex_plexsource  (cost=0.00..15.70 rows=570 width=113) (actual time=0.002..0.002 rows=0 loops=1)
 Planning Time: 1.195 ms
 JIT:
   Functions: 45
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 6.686 ms, Inlining 3.548 ms, Optimization 738.975 ms, Emission 471.920 ms, Total 1221.130 ms
 Execution Time: 1222.311 ms
(36 rows)
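One detail worth pulling out of the two plans above: in both, the JIT "Total" is almost identical to the "Execution Time". The sketch below computes that share from the two timing lines copied from the plans. The helper name is hypothetical, and the interpretation is an assumption rather than a confirmed diagnosis: the estimated cost of 120536192.45 is far above PostgreSQL's default `jit_above_cost` of 100000, which would explain why JIT compilation runs for a query that returns 2 rows.

```python
import re


def jit_share(plan_text):
    """Return (jit_total_ms, execution_ms, fraction) parsed from EXPLAIN ANALYZE text."""
    jit = re.search(r"Timing:.*?Total ([\d.]+) ms", plan_text)
    execution = re.search(r"Execution Time: ([\d.]+) ms", plan_text)
    if not jit or not execution:
        return None  # the plan has no JIT section or no execution time
    jit_ms = float(jit.group(1))
    exec_ms = float(execution.group(1))
    return jit_ms, exec_ms, jit_ms / exec_ms


# Timing lines copied from the plan taken before the reboot.
before_reboot = """
   Timing: Generation 1013.424 ms, Inlining 831.278 ms, Optimization 23011.822 ms, Emission 43596.834 ms, Total 68453.358 ms
 Execution Time: 68862.998 ms
"""

# Timing lines copied from the plan taken after the reboot.
after_reboot = """
   Timing: Generation 6.686 ms, Inlining 3.548 ms, Optimization 738.975 ms, Emission 471.920 ms, Total 1221.130 ms
 Execution Time: 1222.311 ms
"""

for label, plan in (("before reboot", before_reboot), ("after reboot", after_reboot)):
    jit_ms, exec_ms, fraction = jit_share(plan)
    print(f"{label}: JIT {jit_ms:.0f} ms of {exec_ms:.0f} ms ({fraction:.1%})")
```

If JIT really is the bottleneck, one experiment (untested in this thread) would be to turn off the PostgreSQL `jit` setting for the authentik database and rerun the query.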

BeryJu commented 2 years ago

So there are definitely some ways these SQL queries could be improved. I've played around in the past with different lookups and different indexes, but nothing really made a difference. Keep in mind I'm not a DBA, so I don't have in-depth optimisation knowledge.

sevmonster commented 2 years ago

I have personally never had speed issues with PSQL and authentik on Alpine (bare-metal Postgres and redis, dockerized authentik server and workers), and it's strange to me that it would run so much faster after a restart. I run multiple databases in the same instance that my authentik connects to and it's still fine. Have you done any performance tuning on your Postgres? Things like matching blocksize to the underlying storage, or adjusting caching and scheduling.

AndrewBucklin commented 2 years ago

It was a new PSQL install on a Hyper-V virtual machine. Currently the only database on it is authentik. I left all the PSQL settings as their defaults.

Over the past couple weeks, I think I narrowed it down to being related to leaving the authentik admin interface tab open in my browser. Seems to be fine if I make sure to remember to log out of the admin interface and close that tab. But the times I forgot to do that, and accidentally stayed logged into authentik for a few days, that's when it starts to slow down and requires a reboot of the PSQL server to clear up the slowness.

RoboMagus commented 2 years ago

I've always had some issues with Authentik being slower to log me in compared to Authelia (which I've previously used). Figured this was because my setup runs on a puny RPi4 so I took that for granted...

Running postgres:12-alpine and redis:alpine as part of my docker-compose stack, as suggested by the docs.

I'm gonna keep an eye on how this develops.

emilyastranova commented 2 years ago

I'm having the same issue, would like to keep track of this.

madkatz01 commented 1 year ago

Likewise - I have both Authelia and Authentik but what keeps me from switching over is how slow Authentik is to load login pages.

AndrewBucklin commented 1 year ago

It's actually been working normally for me for quite some time now. I've been keeping up with the version updates.

sevmonster commented 1 year ago

I tried what you said @AndrewBucklin about leaving the admin page open and never saw your issue, so either it was a ghost in the machine, or whatever was causing it (in PSQL, Authentik, or what have you) might have been tuned/fixed. Here's hoping.

@madkatz01 wrote: "how slow Authentik is to load login pages."

Some work has been done recently to speed up the pages. Have you tried the latest stable? Authentik is overall quite heavy on host and client, and I am not too fond of that either, but the features it provides all in one place are why I use it.

ProIcons commented 9 months ago

I've set up Authentik v2023.10.5 from scratch on a Docker VM that runs on a Proxmox hypervisor (5950X with 64 GB of RAM).

I tried everything and went all the way back to v2023.8.3, but no luck.

The slowdown is real.

{"auth_via": "session", "event": "/api/v3/core/users/me/", "host": "auth.lan", "level": "info", "logger": "authentik.asgi", "method": "GET", "pid": 27, "remote": "10.0.0.2", "request_id": "ae0945f100c54941a585b3999812d4c1", "runtime": 16047, "scheme": "https", "status": 200, "timestamp": "2023-12-31T00:00:58.664798", "user": "akadmin", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"}
{"auth_via": "session", "event": "/api/v3/core/users/?ordering=last_login&page=1&page_size=20&path_startswith=&search=", "host": "auth.lan", "level": "info", "logger": "authentik.asgi", "method": "GET", "pid": 26, "remote": "10.0.0.2", "request_id": "9410d43db5c846a5834a93a7f781d81a", "runtime": 32038, "scheme": "https", "status": 200, "timestamp": "2023-12-31T00:12:20.973603", "user": "akadmin", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"}

I initially set authentik up on a pre-configured Postgres 16 instance, which I'm also using for other services, with some customized configuration:

max_connections = 1000
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 262kB
min_wal_size = 4GB
max_wal_size = 16GB
max_worker_processes = 32
max_parallel_workers_per_gather = 16
max_parallel_workers = 32
max_parallel_maintenance_workers = 4

I ran VACUUM ANALYZE on the database and analyzed each table of the authentik db. I wasn't, and still am not, seeing any queries in pg_stat_statements taking more than a couple of ms.

Even though I wasn't seeing anything, I created 2 more containers with default Postgres 12 and Redis, and got the same results (the attached log is actually from the last attempt, with the default Postgres 12).

CPU usage seems fine, RAM is fine, and network I/O is fine.

Not sure what's going on.

ProIcons commented 9 months ago

Well, I figured it out. It turns out I hadn't given this container a non-internal network, so it had no internet connectivity. As soon as I gave it a proper non-internal network, everything worked flawlessly, apart from the embedded outpost, which is a matter for another issue anyway.

The real question is: why does it need internet connectivity?!

spali commented 6 months ago

Can confirm the observation about the external network. I had already given the worker container an external network for the "Update latest version info" task, which would otherwise fail. But the server container didn't have an external network, and a lot of API requests, such as the user list or tokens, were slow. With an external network with internet access, the slow API requests are gone.

It may be a coincidence, but with the external network the avatar image creation also seems to be fixed.

MatthewJohn commented 5 months ago

Yes, I experienced the same issue. Monitoring the traffic showed it to be Gravatar (I confirmed external DNS queries and the resulting attempts to connect out). Likewise, loading the admin users page took forever (I assume one request per user).
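A quick way to check for the same symptom from inside the server container is to time an outbound connection. This is a minimal sketch using only the Python standard library; the `probe` helper is hypothetical, and the hostname to test (for example `www.gravatar.com`) is an assumption about what the server tries to reach, not something taken from authentik's code.

```python
import socket
import time


def probe(host, port=443, timeout=3.0):
    """Try a TCP connection; return (reachable, seconds_elapsed).

    The timeout applies to the connect attempt; a slow DNS lookup
    can add to the elapsed time on top of it.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False, time.monotonic() - start
```

Calling `probe("www.gravatar.com")` in a container without outbound access should return `False`, and an elapsed time close to the timeout suggests requests are hanging rather than failing fast, which would match the multi-second API delays described above.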

As per https://docs.goauthentik.io/docs/installation/air-gapped, I missed the section about disabling gravatar (hoped the environment variables would have covered it - but this is what I get for skim reading ;) )

BeryJu commented 5 months ago

@ProIcons @spali For a couple of versions now, when Gravatar is selected as an avatar method, authentik checks whether each user has a Gravatar set (so it can fall back to the next method). Without internet access that check would indeed time out for each user; however, the result is cached.

With 2024.4 there's a more global check for Gravatar availability, so if no internet access is available, authentik will only try once (and cache the result of that).
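The difference between the two behaviours can be illustrated with a toy model. This is a sketch under stated assumptions, not authentik's implementation: the `AvatarResolver` class, its `ttl`, and the `offline_fetch` stand-in are all hypothetical, and the network is simulated by a function that always times out.

```python
import time


class AvatarResolver:
    """Toy model of per-user caching versus a global availability check."""

    def __init__(self, fetch, ttl=3600.0, clock=time.monotonic):
        self.fetch = fetch              # callable(email) -> bool; raises TimeoutError when offline
        self.ttl = ttl                  # hypothetical cache lifetime in seconds
        self.clock = clock
        self.per_user = {}              # email -> (expires_at, has_gravatar)
        self.service_down_until = 0.0   # global "service unreachable" marker

    def has_gravatar(self, email, global_check=True):
        now = self.clock()
        if global_check and now < self.service_down_until:
            return False                # known to be unreachable: skip the network entirely
        cached = self.per_user.get(email)
        if cached and now < cached[0]:
            return cached[1]
        try:
            result = self.fetch(email)
        except TimeoutError:
            result = False
            if global_check:
                self.service_down_until = now + self.ttl
        self.per_user[email] = (now + self.ttl, result)
        return result


calls = []


def offline_fetch(email):
    """Stand-in for a lookup from a host without internet access."""
    calls.append(email)
    raise TimeoutError("no route to the avatar service")


users = ["a@example.com", "b@example.com", "c@example.com"]

# Per-user behaviour: every user pays for one timeout before the result is cached.
per_user = AvatarResolver(offline_fetch)
for user in users * 2:
    per_user.has_gravatar(user, global_check=False)
per_user_calls = len(calls)

# Global behaviour: the first timeout marks the service as down for everyone.
calls.clear()
global_first = AvatarResolver(offline_fetch)
for user in users * 2:
    global_first.has_gravatar(user, global_check=True)
global_calls = len(calls)

print(per_user_calls, global_calls)  # prints "3 1"
```

With a real timeout of several seconds per lookup, the per-user variant scales with the number of users on the page, which is consistent with the slow admin users list reported earlier in the thread.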

spali commented 5 months ago

@BeryJu Thanks for the feedback. I would still prefer to be able to disable internet access for the server. It would be nice if this could be offloaded to the worker, which already needs internet access for the update_latest_version task. By the way, is it possible to disable this task so that the worker can also be cut off from the internet without getting errors? But I will test 2024.4 for sure.

BeryJu commented 5 months ago

@spali you can, see https://docs.goauthentik.io/docs/installation/air-gapped

KiARC commented 2 months ago

I'm currently having issues as well, but the suggestions in this thread haven't helped. It's taking 30 to 45 seconds, sometimes even multiple minutes, to load the sign-in page on my instance. This is not a network issue, since other sites on the same host accessed from the same client load perfectly fine in a couple of seconds. Sometimes reloading the page works, but not always, and this definitely isn't normal behavior.

KiARC commented 1 month ago

It's been around a month since this issue became problematic for me - I thought I had fixed it with some config changes but they turned out to be unrelated. Rebooting the instance was what fixed it, but now after a few days it's slow again. The login page takes several minutes to load. I don't even have gravatar enabled in the config so that shouldn't be the issue, and even if it was, the container does have internet access, so the issue would be different from the previous ones.