goauthentik / authentik

The authentication glue you need.
https://goauthentik.io

Authentik crashing (after Redis timeout) #9483

Open Arragon5xpwm opened 3 months ago

Arragon5xpwm commented 3 months ago

Describe the bug The authentik proxy outpost panics and exits when its connection to Redis times out (the DNS lookup of the Redis host fails with an i/o timeout), and it does not recover on its own.

To Reproduce Steps to reproduce the behavior:

Seems not to be reproducible by using the (admin) UI.

Expected behavior authentik recovers the connection to Redis instead of crashing.

Logs

{"error":"dial tcp: lookup redis-authentik: i/o timeout","event":"failed to connect to redis","level":"panic","logger":"authentik.outpost.proxyv2.application","name":"Apprise FowardAuth","timestamp":"2024-04-27T13:27:28+02:00"}

panic: (*logrus.Entry) 0xc0002ac150

goroutine 135 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc0002ac0e0, 0x0, {0xc00060a0e0, 0x1a})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:260 +0x491
github.com/sirupsen/logrus.(*Entry).Log(0xc0002ac0e0, 0x0, {0xc000b54340?, 0x5?, 0x2?})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Panic(...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:342
goauthentik.io/internal/outpost/proxyv2/application.(*Application).getStore(_, {0x48, {0xc00019e108, 0x12}, 0xc000114130, {0xc0002ec030, 0x22}, 0xc000812120, 0xc000114150, 0xc000114160, ...}, ...)
    /go/src/goauthentik.io/internal/outpost/proxyv2/application/session.go:76 +0x7e8
goauthentik.io/internal/outpost/proxyv2/application.NewApplication({0x48, {0xc00019e108, 0x12}, 0xc000114130, {0xc0002ec030, 0x22}, 0xc000812120, 0xc000114150, 0xc000114160, {{0xc0002ec120, ...}, ...}, ...}, ...)
    /go/src/goauthentik.io/internal/outpost/proxyv2/application/application.go:140 +0xf4a
goauthentik.io/internal/outpost/proxyv2.(*ProxyServer).Refresh(0xc00016a210)
    /go/src/goauthentik.io/internal/outpost/proxyv2/refresh.go:37 +0x567
goauthentik.io/internal/outpost/ak.(*APIController).OnRefresh(0xc000233180)
    /go/src/goauthentik.io/internal/outpost/ak/api.go:178 +0x314
goauthentik.io/internal/outpost/ak.(*APIController).startIntervalUpdater(0xc000233180)
    /go/src/goauthentik.io/internal/outpost/ak/api_ws.go:189 +0x17b
goauthentik.io/internal/outpost/ak.(*APIController).StartBackgroundTasks.func3()
    /go/src/goauthentik.io/internal/outpost/ak/api.go:216 +0x5d
created by goauthentik.io/internal/outpost/ak.(*APIController).StartBackgroundTasks in goroutine 16
    /go/src/goauthentik.io/internal/outpost/ak/api.go:214 +0x38d

Version and Deployment (please complete the following information):

Additional context The outpost comes back up after a container restart (restart-policy: unless-stopped).

maxim-mityutko commented 3 months ago

I experienced similar errors yesterday, and they seemed to coincide with high load on either the node CPU or the network. I can't say for sure right now, but I feel pretty confident that I can reproduce it in my modest homelab by downloading multiple Linux ISOs.

rama31244 commented 3 months ago

I'm getting the same errors with authentik crashing every few hours even when idle. Did you manage to fix this?

rama31244 commented 3 months ago

Are you by any chance running authentik on unraid?

Arragon5xpwm commented 3 months ago

Are you by any chance running authentik on unraid?

Yes

rama31244 commented 3 months ago

I'm betting it's an Unraid issue then. I might try messing with some of the environment variables, and I'll let you know how I go.

rama31244 commented 3 months ago

I think I found my issue: the error seems to originate from the fact that I already had Redis installed from when I ran Authelia. I deleted the container and the appdata folder, started again from a fresh Redis container, and now everything works as expected. I had also already switched to the official Redis container image rather than the Bitnami one, so I'm not sure if that helped too. Hope it works for you as well.

rama31244 commented 3 months ago

Actually, scratch that, it just crashed again with the same error. Please let me know if you find a solution.

Arragon5xpwm commented 3 months ago

I noticed that heavy I/O seems to cause authentik's connection to Redis to time out, and authentik can't recover from that on its own. As a workaround I simply set restart-policy: unless-stopped.

rama31244 commented 3 months ago

Ok thanks. I might wait to see if it crashes again and then try your workaround. I wonder why this problem isn't reported by more people

fayak commented 2 months ago

I also face this issue:

{"error":"dial tcp: lookup authentik-redis-master: i/o timeout","event":"failed to connect to redis","level":"panic","logger":"authentik.outpost.proxyv2.application","name":"loki","timestamp":"2024-05-28T15:48:09Z"}
panic: (*logrus.Entry) 0xc000237340

goroutine 216 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc0002372d0, 0x0, {0xc000442960, 0x1a})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:260 +0x491
github.com/sirupsen/logrus.(*Entry).Log(0xc0002372d0, 0x0, {0xc000c9a340?, 0x5?, 0x2?})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Panic(...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:342
goauthentik.io/internal/outpost/proxyv2/application.(*Application).getStore(_, {0x2, {0xc00078e21c, 0x4}, 0xc00019aed0, {0xc0004420a0, 0x1a}, 0xc00078e280, 0xc00019aef0, 0xc00019af20, ...}, ...)
    /go/src/goauthentik.io/internal/outpost/proxyv2/application/session.go:76 +0x7e8
goauthentik.io/internal/outpost/proxyv2/application.NewApplication({0x2, {0xc00078e21c, 0x4}, 0xc00019aed0, {0xc0004420a0, 0x1a}, 0xc00078e280, 0xc00019aef0, 0xc00019af20, {{0xc000192810, ...}, ...}, ...}, ...)
    /go/src/goauthentik.io/internal/outpost/proxyv2/application/application.go:140 +0xf4a
goauthentik.io/internal/outpost/proxyv2.(*ProxyServer).Refresh(0xc000178160)
    /go/src/goauthentik.io/internal/outpost/proxyv2/refresh.go:37 +0x567
goauthentik.io/internal/outpost/ak.(*APIController).OnRefresh(0xc000681880)
    /go/src/goauthentik.io/internal/outpost/ak/api.go:178 +0x314
goauthentik.io/internal/outpost/ak.(*APIController).startIntervalUpdater(0xc000681880)
    /go/src/goauthentik.io/internal/outpost/ak/api_ws.go:189 +0x17b
goauthentik.io/internal/outpost/ak.(*APIController).StartBackgroundTasks.func3()
    /go/src/goauthentik.io/internal/outpost/ak/api.go:216 +0x5d
created by goauthentik.io/internal/outpost/ak.(*APIController).StartBackgroundTasks in goroutine 192
    /go/src/goauthentik.io/internal/outpost/ak/api.go:214 +0x38d

Redis being unreachable is a problem in itself, but I was hoping Authentik would be able to retry rather than crash straight away :/ The problem is that the crash invalidates the downstream user sessions, disconnecting users from Grafana/...
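For context, the traces above show the outpost panicking inside getStore (session.go:76) as soon as the Redis lookup fails, with no visible retry. Below is a minimal sketch of the retry-with-backoff behaviour being asked for, written against the generic github.com/redis/go-redis/v9 client as a stand-in; it is not authentik's actual code path, and the function name, address, and timeouts are illustrative only (the address matches the hostname from the log above).

package main

import (
	"context"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

// connectWithRetry keeps trying to reach Redis with a capped exponential
// backoff instead of panicking on the first failed lookup. Names here are
// illustrative; this is not the outpost's real session-store code.
func connectWithRetry(ctx context.Context, addr string) (*redis.Client, error) {
	client := redis.NewClient(&redis.Options{Addr: addr})
	backoff := time.Second
	for {
		err := client.Ping(ctx).Err()
		if err == nil {
			return client, nil // Redis is reachable again
		}
		log.Printf("failed to connect to redis, retrying in %s: %v", backoff, err)
		select {
		case <-time.After(backoff):
		case <-ctx.Done():
			return nil, ctx.Err()
		}
		if backoff < 30*time.Second {
			backoff *= 2 // cap the backoff at roughly 30s
		}
	}
}

func main() {
	// Example usage: only give up entirely after five minutes of retries.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()
	if _, err := connectWithRetry(ctx, "authentik-redis-master:6379"); err != nil {
		log.Fatalf("giving up on redis: %v", err)
	}
}

With something along these lines in front of the store construction, a transient i/o timeout during heavy I/O would show up as retry messages in the logs rather than a panic that restarts the outpost and drops downstream sessions.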

rama31244 commented 2 months ago

Are you also on Unraid? Setting restart-policy: unless-stopped is working for me at the moment, but the container still crashes about once a day.

maxim-mityutko commented 2 months ago

The problem still pops up once in a while; as @Arragon5xpwm mentioned, it happens during heavy I/O. Authentik and Redis are deployed on Kubernetes.

mlhynfield commented 1 week ago

Hey, I'm seeing the same error, seemingly due to I/O when my uptime service pings applications that sit behind an unauthenticated route but still go through Authentik. This is deployed on an RKE2 cluster via the provided Helm chart, with Redis installed from the chart as a dedicated instance for Authentik.