Open AndrewBucklin opened 2 years ago
Update: It looks like the cause is slow queries on the PostgreSQL cluster node; for example, this query took 68 seconds.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=120536192.45..120571817.45 rows=14250000 width=4853) (actual time=67540.784..67541.632 rows=2 loops=1)
Sort Key: authentik_core_source.name
Sort Method: quicksort Memory: 25kB
-> Hash Left Join (cost=50.35..194.77 rows=14250000 width=4853) (actual time=67533.375..67533.794 rows=2 loops=1)
Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_plex_plexsource.source_ptr_id)
-> Hash Left Join (cost=27.52..39.72 rows=50000 width=4740) (actual time=67531.962..67532.378 rows=2 loops=1)
Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_ldap_ldapsource.source_ptr_id)
-> Hash Left Join (cost=13.02..23.88 rows=500 width=4368) (actual time=67525.813..67526.225 rows=2 loops=1)
Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_oauth_oauthsource.source_ptr_id)
-> Hash Right Join (cost=2.57..13.28 rows=50 width=1580) (actual time=67522.027..67522.436 rows=2 loops=1)
Hash Cond: (authentik_sources_saml_samlsource.source_ptr_id = authentik_core_source.policybindingmodel_ptr_id)
-> Seq Scan on authentik_sources_saml_samlsource (cost=0.00..10.50 rows=50 width=1435) (actual time=0.452..0.453 rows=0 loops=1)
-> Hash (cost=2.55..2.55 rows=2 width=145) (actual time=67519.006..67519.259 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Join (cost=1.04..2.55 rows=2 width=145) (actual time=67517.090..67517.535 rows=2 loops=1)
Hash Cond: (authentik_policies_policybindingmodel.pbm_uuid = authentik_core_source.policybindingmodel_ptr_id)
-> Seq Scan on authentik_policies_policybindingmodel (cost=0.00..1.39 rows=39 width=21) (actual time=67493.717..67494.293 rows=39 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=124) (actual time=16.652..16.903 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on authentik_core_source (cost=0.00..1.02 rows=2 width=124) (actual time=16.044..16.051 rows=2 loops=1)
Filter: enabled
-> Hash (cost=10.20..10.20 rows=20 width=2788) (actual time=0.398..0.398 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on authentik_sources_oauth_oauthsource (cost=0.00..10.20 rows=20 width=2788) (actual time=0.391..0.391 rows=0 loops=1)
-> Hash (cost=12.00..12.00 rows=200 width=372) (actual time=2.764..2.765 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on authentik_sources_ldap_ldapsource (cost=0.00..12.00 rows=200 width=372) (actual time=2.375..2.378 rows=1 loops=1)
-> Hash (cost=15.70..15.70 rows=570 width=113) (actual time=0.219..0.220 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on authentik_sources_plex_plexsource (cost=0.00..15.70 rows=570 width=113) (actual time=0.111..0.111 rows=0 loops=1)
Planning Time: 694.659 ms
JIT:
Functions: 45
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1013.424 ms, Inlining 831.278 ms, Optimization 23011.822 ms, Emission 43596.834 ms, Total 68453.358 ms
Execution Time: 68862.998 ms
(36 rows)
Update: After rebooting the PostgreSQL cluster node, that same query only takes 1 second. Any ideas what would cause this or where to look next?
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=120536192.45..120571817.45 rows=14250000 width=4853) (actual time=1215.237..1215.241 rows=2 loops=1)
Sort Key: authentik_core_source.name
Sort Method: quicksort Memory: 25kB
-> Hash Left Join (cost=50.35..194.77 rows=14250000 width=4853) (actual time=1215.219..1215.226 rows=2 loops=1)
Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_plex_plexsource.source_ptr_id)
-> Hash Left Join (cost=27.52..39.72 rows=50000 width=4740) (actual time=1215.210..1215.217 rows=2 loops=1)
Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_ldap_ldapsource.source_ptr_id)
-> Hash Left Join (cost=13.02..23.88 rows=500 width=4368) (actual time=1215.195..1215.201 rows=2 loops=1)
Hash Cond: (authentik_core_source.policybindingmodel_ptr_id = authentik_sources_oauth_oauthsource.source_ptr_id)
-> Hash Right Join (cost=2.57..13.28 rows=50 width=1580) (actual time=1215.186..1215.191 rows=2 loops=1)
Hash Cond: (authentik_sources_saml_samlsource.source_ptr_id = authentik_core_source.policybindingmodel_ptr_id)
-> Seq Scan on authentik_sources_saml_samlsource (cost=0.00..10.50 rows=50 width=1435) (actual time=0.002..0.002 rows=0 loops=1)
-> Hash (cost=2.55..2.55 rows=2 width=145) (actual time=1215.176..1215.177 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Hash Join (cost=1.04..2.55 rows=2 width=145) (actual time=1215.171..1215.174 rows=2 loops=1)
Hash Cond: (authentik_policies_policybindingmodel.pbm_uuid = authentik_core_source.policybindingmodel_ptr_id)
-> Seq Scan on authentik_policies_policybindingmodel (cost=0.00..1.39 rows=39 width=21) (actual time=1215.123..1215.128 rows=39 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=124) (actual time=0.027..0.027 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on authentik_core_source (cost=0.00..1.02 rows=2 width=124) (actual time=0.020..0.021 rows=2 loops=1)
Filter: enabled
-> Hash (cost=10.20..10.20 rows=20 width=2788) (actual time=0.002..0.002 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on authentik_sources_oauth_oauthsource (cost=0.00..10.20 rows=20 width=2788) (actual time=0.002..0.002 rows=0 loops=1)
-> Hash (cost=12.00..12.00 rows=200 width=372) (actual time=0.008..0.008 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on authentik_sources_ldap_ldapsource (cost=0.00..12.00 rows=200 width=372) (actual time=0.005..0.006 rows=1 loops=1)
-> Hash (cost=15.70..15.70 rows=570 width=113) (actual time=0.002..0.003 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on authentik_sources_plex_plexsource (cost=0.00..15.70 rows=570 width=113) (actual time=0.002..0.002 rows=0 loops=1)
Planning Time: 1.195 ms
JIT:
Functions: 45
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 6.686 ms, Inlining 3.548 ms, Optimization 738.975 ms, Emission 471.920 ms, Total 1221.130 ms
Execution Time: 1222.311 ms
(36 rows)
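Reading only the numbers in the two plans above, the JIT "Total" line is within about 1% of the reported "Execution Time" in both runs, so most of the wall-clock time appears to be spent compiling rather than scanning or joining. The sketch below pulls those two figures out of a pasted plan and prints the ratio. The function name `jit_share` is made up for illustration, and this is an observation from the posted output, not a diagnosis confirmed anywhere in this thread.

```python
import re


def jit_share(plan_text: str) -> float:
    """Return JIT total time divided by execution time for one EXPLAIN ANALYZE output."""
    jit = re.search(r"Timing: .*?Total ([\d.]+) ms", plan_text)
    total = re.search(r"Execution Time: ([\d.]+) ms", plan_text)
    if jit is None or total is None:
        raise ValueError("plan has no JIT timing or execution time line")
    return float(jit.group(1)) / float(total.group(1))


# The relevant lines copied from the two plans posted above.
before_reboot = """
Timing: Generation 1013.424 ms, Inlining 831.278 ms, Optimization 23011.822 ms, Emission 43596.834 ms, Total 68453.358 ms
Execution Time: 68862.998 ms
"""
after_reboot = """
Timing: Generation 6.686 ms, Inlining 3.548 ms, Optimization 738.975 ms, Emission 471.920 ms, Total 1221.130 ms
Execution Time: 1222.311 ms
"""

for label, plan in (("before reboot", before_reboot), ("after reboot", after_reboot)):
    print(f"{label}: JIT accounts for {jit_share(plan):.1%} of execution time")
```

The same helper can be pointed at any plan captured with EXPLAIN ANALYZE to see whether a slow run is dominated by the JIT section or by the plan nodes themselves.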
So there are definitely some ways these SQL queries could be improved. I've played around in the past with different lookups and different indexes, but nothing really made a difference. Keep in mind I'm not a DBA, so I don't have in-depth optimisation knowledge.
I have personally never had speed issues with PSQL and authentik on Alpine (bare-metal Postgres and redis, dockerized authentik server and workers), and it's strange to me that it would run so much faster after a restart. I run multiple databases in the same instance that my authentik connects to and it's still fine. Have you done any performance tuning on your Postgres? Things like matching blocksize to the underlying storage, or adjusting caching and scheduling.
It was a new PSQL install on a Hyper-V virtual machine. Currently the only database on it is authentik. I left all the PSQL settings as their defaults.
Over the past couple weeks, I think I narrowed it down to being related to leaving the authentik admin interface tab open in my browser. Seems to be fine if I make sure to remember to log out of the admin interface and close that tab. But the times I forgot to do that, and accidentally stayed logged into authentik for a few days, that's when it starts to slow down and requires a reboot of the PSQL server to clear up the slowness.
I've always had some issues with Authentik being slower to log me in compared to Authelia (which I've previously used). Figured this was because my setup runs on a puny RPi4 so I took that for granted...
Running postgres:12-alpine and redis:alpine as part of my docker-compose stack, as suggested by the docs.
I'm gonna keep an eye on how this develops.
I'm having the same issue, would like to keep track of this.
Likewise - I have both Authelia and Authentik but what keeps me from switching over is how slow Authentik is to load login pages.
It's actually been working normally for me for quite some time now. I've been keeping up with the version updates.
I tried what you said @AndrewBucklin about leaving the admin page open and never saw your issue, so either it was a ghost in the machine, or whatever was causing it (in PSQL, Authentik, or what have you) might have been tuned/fixed. Here's hoping.
> how slow Authentik is to load login pages.
Some work has been done recently to speed up the pages. Have you tried the latest stable? Authentik is overall quite heavy on host and client, and I am not too fond of that either, but the features it provides all in one place are why I use it.
I've set up Authentik v2023.10.5 from scratch on a Docker VM that runs on a Proxmox hypervisor (5950X, with 64 GB of RAM).
I tried everything and went all the way back to v2023.8.3, but no luck.
The slowdown is real.
{"auth_via": "session", "event": "/api/v3/core/users/me/", "host": "auth.lan", "level": "info", "logger": "authentik.asgi", "method": "GET", "pid": 27, "remote": "10.0.0.2", "request_id": "ae0945f100c54941a585b3999812d4c1", "runtime": 16047, "scheme": "https", "status": 200, "timestamp": "2023-12-31T00:00:58.664798", "user": "akadmin", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"}
{"auth_via": "session", "event": "/api/v3/core/users/?ordering=last_login&page=1&page_size=20&path_startswith=&search=", "host": "auth.lan", "level": "info", "logger": "authentik.asgi", "method": "GET", "pid": 26, "remote": "10.0.0.2", "request_id": "9410d43db5c846a5834a93a7f781d81a", "runtime": 32038, "scheme": "https", "status": 200, "timestamp": "2023-12-31T00:12:20.973603", "user": "akadmin", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"}
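To find entries like the two above in a longer log, a small filter over the JSON lines can help. This is a minimal sketch that assumes the `runtime` field is in milliseconds (which matches the 15-20 second waits reported in this thread); the threshold `SLOW_MS` and the function name `slow_requests` are hypothetical, and the sample lines are trimmed copies of the entries above.

```python
import json

SLOW_MS = 1000  # hypothetical threshold; assumes "runtime" is in milliseconds


def slow_requests(lines, threshold_ms=SLOW_MS):
    """Yield (runtime_ms, event) for authentik.asgi log lines at or above the threshold."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log line
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("logger") != "authentik.asgi":
            continue
        runtime = entry.get("runtime")
        if isinstance(runtime, (int, float)) and runtime >= threshold_ms:
            yield runtime, entry.get("event", "")


# Trimmed copies of the two log entries above, plus one non-JSON line.
sample = [
    '{"event": "/api/v3/core/users/me/", "logger": "authentik.asgi", "runtime": 16047, "status": 200}',
    '{"event": "/api/v3/core/users/?ordering=last_login&page=1&page_size=20&path_startswith=&search=", "logger": "authentik.asgi", "runtime": 32038, "status": 200}',
    "plain text line that is not JSON",
]

for runtime_ms, event in slow_requests(sample):
    print(f"{runtime_ms / 1000:.1f}s  {event}")
```

In practice the `lines` argument could be an open file or the piped output of the container logs.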
I initially set authentik up on a pre-configured Postgres 16 instance, which I'm also using for other services, with some customized configuration:
max_connections = 1000
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 262kB
min_wal_size = 4GB
max_wal_size = 16GB
max_worker_processes = 32
max_parallel_workers_per_gather = 16
max_parallel_workers = 32
max_parallel_maintenance_workers = 4
I ran VACUUM ANALYZE on the database and analyzed each table of the authentik DB.
I wasn't, and still am not, seeing any queries in pg_stat_statements taking more than a couple of ms.
Even though I wasn't seeing anything, I created 2 more containers with default Postgres 12 and Redis, and got the same results (the attached log is actually from the last attempt, with the default Postgres 12).
CPU usage seems fine, RAM is fine, and network I/O is fine.
Not sure what's going on.
Well, I figured it out. It turns out I hadn't given this container a non-internal network, so it didn't have internet connectivity. As soon as I gave it a proper non-internal network, everything worked flawlessly, apart from the embedded outpost, which is for another issue anyway.
The real question is, why does it need internet connectivity?!
I can confirm the observation about the external network. I had already given the worker container an external network for the "Update latest version info" task, which would otherwise fail. But the server container didn't have an external network, and a lot of API requests, such as the user list or tokens, were slow. With an external network with internet access, the slow API requests are gone.
It may be a coincidence, but with the external network the avatar image creation also seems to be fixed.
Yes, I experienced the same issue. Monitoring traffic it appeared to be gravatar (confirmed external DNS queries and resulting attempt to connect out). Likewise loading the admin users page took forever (I assume a request per user).
As per https://docs.goauthentik.io/docs/installation/air-gapped, I missed the section about disabling gravatar (hoped the environment variables would have covered it - but this is what I get for skim reading ;) )
@ProIcons @spali For a couple of versions now, when gravatar is selected as an avatar method, authentik checks whether each user has a gravatar set (so it can fall back to the next method). Without internet access, that check would indeed time out for each user; however, the result is cached.
With 2024.4 there's a more global check for gravatar availability, so if no internet access is available, authentik will only try once (and cache the result of that).
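The difference between a per-user check and a single global check can be illustrated with a small sketch. This is not authentik's actual code: `CachedAvailability`, `fake_probe`, `avatar_for`, the one-hour TTL and the "initials" fallback label are all hypothetical names chosen for the example. The point is only that the expensive probe runs once and every later lookup reuses the cached answer.

```python
import time
from typing import Callable, Optional


class CachedAvailability:
    """Remember the result of an expensive reachability probe for ttl seconds."""

    def __init__(self, probe: Callable[[], bool], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[bool] = None
        self._expires = 0.0

    def available(self) -> bool:
        now = self._clock()
        # Probe only on first use or after the cached result has expired.
        if self._value is None or now >= self._expires:
            self._value = self._probe()
            self._expires = now + self._ttl
        return self._value


probe_calls = []  # records each time the (simulated) network probe runs


def fake_probe() -> bool:
    """Stand-in for a real network check; simulates a host without internet access."""
    probe_calls.append(1)
    return False


gravatar = CachedAvailability(fake_probe)


def avatar_for(user: str) -> str:
    """Pick an avatar source, falling back when the remote service is unreachable."""
    if gravatar.available():
        return f"gravatar:{user}"
    return f"initials:{user}"


avatars = [avatar_for(u) for u in ("akadmin", "alice", "bob")]
print(avatars, "probe calls:", len(probe_calls))
```

With a per-user check, a user list of N entries would wait for N timeouts on a host without internet access; with the cached global check only the first lookup pays that cost.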
@BeryJu Thanks for the feedback.
I would still prefer to be able to disable internet access for the server. It would be nice if this could be offloaded to the worker, which already needs internet access for the update_latest_version task. By the way, is it possible to disable this task so the worker can also be cut off from the internet without getting errors?
But will for sure test 2024.4.
@spali you can, see https://docs.goauthentik.io/docs/installation/air-gapped
I'm currently having issues as well, but the suggestions in this thread haven't helped. It's taking upwards of 30 to 45 seconds, sometimes even multiple minutes, to load the sign-in page on my instance, and this is not a network issue, since other sites on the same host accessed from the same client load perfectly fine in a couple of seconds. Sometimes reloading the page works, but not always, and this definitely isn't normal behavior.
It's been around a month since this issue became problematic for me - I thought I had fixed it with some config changes but they turned out to be unrelated. Rebooting the instance was what fixed it, but now after a few days it's slow again. The login page takes several minutes to load. I don't even have gravatar enabled in the config so that shouldn't be the issue, and even if it was, the container does have internet access, so the issue would be different from the previous ones.
Describe your question/
Testing out Authentik and so far it's working great, except for one thing: the login screen is terribly slow at loading. It just sits at the "loading" spinner for 15-20 seconds before the 'Email or Username' field appears.
Relevant infos
Logs {"event": "/api/v3/flows/executor/welcome/?query=next%3D%252F", "host": "redacted", "level": "info", "logger": "authentik.asgi", "method": "GET", "pid": 41, "remote": "10.x.x.x", "request_id": "e6b119ee059444fbbc6593a52a5c22e2", "runtime": 19036, "scheme": "https", "status": 200, "timestamp": "2022-06-25T22:41:29.392827", "user": "", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"}
Version and Deployment (please complete the following information):
Additional context