argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Terminal closes with error 'websocket: close 1006' #14271

Open matthiasdeblock opened 1 year ago

matthiasdeblock commented 1 year ago

Checklist:

Describe the bug

Opening a terminal in ArgoCD and letting it sit idle for about 50 seconds results in an unresponsive terminal.

To Reproduce

Open a Pod terminal in ArgoCD and wait for 50 seconds.

Expected behavior

The shell should continue to work even after 50 seconds of inactivity.

Screenshots

Screenshot of the response from a websocket terminal call:

image

Version

argocd: v2.4.28+598f792
  BuildDate: 2023-03-23T14:58:46Z
  GitCommit: 598f79236ae4160325b37342434baef4ff95d61c
  GitTreeState: clean
  GoVersion: go1.18.10
  Compiler: gc
  Platform: linux/amd64

Logs

time="2023-06-29T14:12:55Z" level=error msg="read message err: websocket: close 1006 (abnormal closure): unexpected EOF"
E0629 14:12:55.558269       1 v2.go:105] websocket: close 1006 (abnormal closure): unexpected EOF
ebuildy commented 1 year ago

could be fixed by https://github.com/argoproj/argo-cd/pull/14192

mateuszkozakiewicz commented 1 year ago

I'm also running into a similar issue: my connection is closed immediately. I've tried port-forwarding to the argocd-server pod, so this is not the load balancer's fault.

time="2023-07-17T20:44:42Z" level=info msg="terminal session starting" appNamespace=argocd application=whoami container=whoami namespace=web podName=whoami-web-app-5574fc8558-l9xq5 project=web-portfolio userName=admin                                   
time="2023-07-17T20:44:42Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=Watch grpc.service=application.ApplicationService grpc.start_time="2023-07-17T20:44:39Z" grpc.time_ms=3185.257 span.kind=server system=grpc      
time="2023-07-17T20:44:42Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchResourceTree grpc.service=application.ApplicationService grpc.start_time="2023-07-17T20:44:39Z" grpc.time_ms=3183.268 span.kind=server system=grpc
2023/07/17 20:44:43 http: response.WriteHeader on hijacked connection from github.com/argoproj/argo-cd/v2/server/application.(*terminalHandler).ServeHTTP (terminal.go:245)
2023/07/17 20:44:43 http: response.Write on hijacked connection from fmt.Fprintln (print.go:285)
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
E0717 20:44:43.290930       7 v2.go:105] EOF
E0717 20:44:43.290932       7 v2.go:105] EOF
E0717 20:44:43.290931       7 v2.go:105] EOF
E0717 20:44:43.290935       7 v2.go:105] EOF

ArgoCD 2.7.7, helm-chart 5.38.1 values file:

configs:
    cm:
        exec.enabled: true
    params:
        server.insecure: true
matthiasdeblock commented 1 year ago

could be fixed by #14192

Is it possible to backport this to 2.4 and later?

crenshaw-dev commented 1 year ago

@matthiasdeblock we no longer support anything earlier than 2.6.

bradenwright-opunai commented 12 months ago

I'm running into the same symptoms, but I'm on v2.8.0+804d4b8. Is there any way to know whether the fix should be included?

bradenwright-opunai commented 12 months ago

FWIW, I tried upgrading to v2.8.3+77556d9, but it still seems to hang. Any idea which version got the fix, or whether there is a further problem? Also probably worth mentioning that ArgoCD is currently deployed in GKE using GCE-Ingress.

bradenwright-opunai commented 12 months ago

FWIW, I also tried port-forwarding to the ArgoCD server, and it works without issues, so it feels like something with the load balancer. It doesn't feel like a timeout because the hang happens quickly. I can set up a BackendConfig for session affinity if needed, but I'd expect the docs to say if that were a requirement.

https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-configuration

bradenwright-opunai commented 12 months ago

I'm leaving some comments on https://github.com/argoproj/argo-cd/pull/14192 as well, but from what I can tell the terminal is locking up in under 60 seconds (more like 15-45 seconds), so the keepalive at 60 seconds isn't resolving my issue (best I can tell). I did just see that the timeout for the LB is 30 seconds, so let me try making that value longer.

bradenwright-opunai commented 12 months ago

Alright, so in GCP the default timeout for a backend service is 30 seconds, and with the default settings the terminal was hanging. After increasing that timeout to more than 60 seconds (currently set to 3600 seconds, i.e. 1 hour), I've been able to wait 10+ minutes and return to a working terminal. Everything is now working as expected.

I would recommend that the docs at https://argo-cd.readthedocs.io/en/stable/operator-manual/web_based_terminal/ be updated to call out the 60-second keepalive that now exists, and that for load balancers the idle timeout needs to be > 60 seconds.
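For anyone else on GKE, the backend timeout can be raised declaratively with a BackendConfig attached to the argocd-server Service. A rough sketch, assuming the default argocd namespace and service name (the BackendConfig name is illustrative):

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: argocd-server-backendconfig   # illustrative name
  namespace: argocd
spec:
  timeoutSec: 3600   # must exceed the 60s terminal keepalive interval
---
apiVersion: v1
kind: Service
metadata:
  name: argocd-server
  namespace: argocd
  annotations:
    # attach the BackendConfig to this Service's default port
    cloud.google.com/backend-config: '{"default": "argocd-server-backendconfig"}'
```

If installing via the Helm chart, the equivalent annotation can be set through the chart's server service values instead of patching the Service directly.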

Arulaln-AR commented 10 months ago

@erhudy, is the fix available in ArgoCD v2.7.3? We are facing the same issue, where our terminal closes after around 60 seconds of inactivity.

bravosierrasierra commented 10 months ago

Same problem here. Upgrading to 2.8.4 changed nothing.

The problem appeared after a Kubernetes upgrade from 1.22 with Cilium to 1.25 with Cilium on the same cloud provider. Disabling network policies changed nothing.

Exposing ArgoCD without the cloud NLB changed nothing either: the terminal freezes within 30-60 seconds.

~/git/notes/org-mode $ kubectl port-forward service/argocd-server -n argocd 30000:80
Forwarding from 127.0.0.1:30000 -> 8080
Forwarding from [::1]:30000 -> 8080
Handling connection for 30000
Handling connection for 30000
Handling connection for 30000
Handling connection for 30000
Handling connection for 30000
error: lost connection to pod
~/git/notes/org-mode $ 

Ingress-nginx has extended timeout annotations:

annotations:
    ingress.kubernetes.io/proxy-body-size: 100M
    kubernetes.io/ingress.class: "nginx"
    ingress.kubernetes.io/app-root: "/"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"

In the logs:


level=error msg="read message err: websocket: close 1006 (abnormal closure): unexpected EOF"

level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = no session information" grpc.code=Unauthenticated grpc.method=List grpc.service=application.ApplicationService grpc.start_time="2023-10-19T13:12:29Z" grpc.time_ms=24.735 span.kind=server system=grpc

The argocd-server has OAuth2 integration with Keycloak; other UI elements work as expected.

bravosierrasierra commented 10 months ago

I found the merged PR https://github.com/argoproj/argo-cd/pull/14192 for terminal keepalive. I enabled websocket frame inspection (https://developer.chrome.com/blog/new-in-devtools-74/#binary) and don't see any pings in the websocket messages.

bravosierrasierra commented 10 months ago

My problem was containerd, which was being restarted every minute by a mistaken cron job. Sorry for the noise.