arrikto / oidc-authservice

This is a fork/refactoring of the ajmyyra/ambassador-auth-oidc project
MIT License

Authservice pod "Failed to save state in store: error trying to save session: input/output error" #112

Open psheorangithub opened 1 year ago

psheorangithub commented 1 year ago

We have integrated Kubeflow with an OIDC flow (Heracles + LDAP). We are unable to log in to the Kubeflow UI. The GUI shows the error below.

Access to kubeflow.aiwb-enc-data-cpu1.uscentral-prd-az3.k8s.int was denied. You don't have authorization to view this page. HTTP ERROR 403

While checking the authservice pod logs, I see the errors below. This happens every couple of days.

2023/03/15 14:08:40 boltstore: remove expired sessions error: input/output error
time="2023-03-15T14:06:29Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip= request=/

The issue goes away after restarting the authservice pod, but it reappears every 10-15 days. We have checked the status of the underlying PVC and it looks healthy.

Can someone look into it and suggest what could be the cause?

edwardzjl commented 1 year ago

Are you deploying Kubeflow with kubeflow-manifests? Have you checked the disk usage of the volume? The default volume is 10Gi. Without further information I can only guess, but maybe the disk fills up after 10-15 days?

psheorangithub commented 1 year ago

Yes, I deployed using the Kubeflow manifests, specifically https://github.com/kubeflow/manifests/tree/v1.6.0. And yes, I already checked the disk usage. The PVC used by authservice is 10G, and the only file in it is "data-db", which was only 5 MB at the time of the issue. Overall volume usage also looks fine.
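
One way to follow up on the disk-usage check: the message "input/output error" is the text for the EIO errno, whereas a full volume would normally be reported as "no space left on device" (ENOSPC), so free space alone may not explain it. The sketch below is a hypothetical probe (the name `probe_volume` is an assumption, not part of authservice) that writes, fsyncs and reads back a small temporary file in a directory and reports which errno, if any, the storage returns. It assumes Python is available on a host or pod that mounts the same volume, and the read-back may be served from the page cache, so a passing result is only a weak signal.

```python
import errno
import os
import tempfile


def probe_volume(directory: str, size: int = 4096) -> str:
    """Write, fsync and read back a small temp file in `directory`.

    Returns "ok" on success, "MISMATCH" if the data read back differs,
    otherwise the symbolic errno name (e.g. "EIO", "ENOSPC") of the failure.
    """
    payload = os.urandom(size)
    path = None
    try:
        fd, path = tempfile.mkstemp(prefix="probe-", dir=directory)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # push the write through to the backing storage
        finally:
            os.close(fd)
        with open(path, "rb") as fh:
            if fh.read() != payload:
                return "MISMATCH"
        return "ok"
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one is known.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Best-effort cleanup of the probe file.
        if path is not None:
            try:
                os.remove(path)
            except OSError:
                pass
```

Running `probe_volume(...)` against the directory that backs the PVC at the moment the errors appear, and again after the pod restart, would show whether the storage layer itself is returning EIO or whether the failure is limited to the authservice process.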

MiraiChino commented 1 year ago

We are experiencing the same issue in our environment as well. The "Failed to Save State in Store: Input/Output Error" error keeps showing up for the authservice pod, even though all other components seem to be running fine. Upon launching the Kubeflow environment and adding five users, we have encountered a recurring 403 error on the Kubeflow Dex login page, even when no users were logged in.

Environment:

Pod Information:

Issue Details:

# kubectl top pod authservice-0 -n istio-system
NAME            CPU(cores)   MEMORY(bytes)   
authservice-0   1m           3Mi      
# ls -lh /export/kubernetes/istio-system-authservice-pvc-pvc-3e8dd897-4478-40c5-a007-e1d1aa55f734
total 24K
-rw-r--r-- 1 systemd-network tss 32K Jul 24 05:25 data.db

Error Logs:

# kubectl logs authservice-0 -n istio-system
time="2023-07-24T05:25:20Z" level=info msg="Starting readiness probe at 8081"
time="2023-07-24T05:25:20Z" level=info msg="No  USERID_TOKEN_HEADER  specified, using 'kubeflow-userid-token' as default."
time="2023-07-24T05:25:20Z" level=info msg="No  SERVER_HOSTNAME  specified, using '' as default."
time="2023-07-24T05:25:20Z" level=info msg="No  SERVER_PORT  specified, using '8080' as default."
time="2023-07-24T05:25:20Z" level=info msg="No  SESSION_MAX_AGE  specified, using '86400' as default."
time="2023-07-24T05:25:20Z" level=info msg="Starting web server at :8080"
time="2023-07-24T05:47:51Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:48:29Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:50:10Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:50:25Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:55:03Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:55:04Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/

MiraiChino commented 1 year ago

Additional information: After setting the log level of oidc-authservice to DEBUG, I rechecked the logs when the error occurred again. The first error comes from the boltstore reaper, which removes expired sessions, rather than from the code path that uses boltdb to save sessions.

2023/08/01 03:22:10 boltstore: remove expired sessions error: input/output error
time="2023-08-01T03:22:57Z" level=warning msg="Request doesn't have a valid session." ip=192.168.200.15 request=/logout
time="2023-08-01T03:22:57Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.15 request=/
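
To check on a larger log whether the reaper error always precedes the session-save failures, the log can be summarized by message. The following is a minimal sketch, assuming the two line formats seen in this thread (the Go standard-logger prefix used by boltstore and the `time=... level=... msg=...` format used by authservice). The function name `summarize` and both regular expressions are illustrative assumptions, not part of any tool.

```python
import re
from collections import Counter

# authservice line: time="..." level=error msg="..." ip=... request=/
LOGRUS = re.compile(
    r'time="(?P<time>[^"]+)"\s+level=(?P<level>\w+)\s+msg="(?P<msg>[^"]*)"'
)
# boltstore line: 2023/08/01 03:22:10 boltstore: ...
STDLOG = re.compile(r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<msg>.*)$")


def summarize(lines):
    """Count messages containing "input/output error" and record when each first appears.

    Returns (counts, first_seen): a Counter keyed by message text and a dict
    mapping each message to the timestamp string of its first occurrence.
    """
    counts = Counter()
    first_seen = {}
    for line in lines:
        line = line.strip()
        match = LOGRUS.search(line) or STDLOG.match(line)
        if not match or "input/output error" not in match.group("msg"):
            continue
        msg = match.group("msg")
        counts[msg] += 1
        first_seen.setdefault(msg, match.group("time"))
    return counts, first_seen
```

For example, the output of `kubectl logs authservice-0 -n istio-system` can be saved to a file and passed in as `summarize(open(path))`. Comparing the `first_seen` timestamps of the two messages indicates which failure started first.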