argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
17.72k stars 5.4k forks source link

Problem with Terminal and linux shell different from bash #9641

Closed odedonato closed 2 years ago

odedonato commented 2 years ago

Describe the bug

Tried the new most wanted feature, Terminal, it works with linux container with bash but got some problem with container Alpine based.

In screenshot, tried on busybox:stable container (it's in loop to keep it running) got # after OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "bash": executable file not found in $PATH: unknown but cannot use it (cannot write)

Tried with some other container based on node:lts-alpine and got only black screen with blinking cursor.

Shell is working with same container using Rancher and Lens.

To Reproduce

Click on Terminal on Alpine based container

Expected behavior

Get shell from container and possibility to use it

Screenshots

image

Version

2.4.0+91aefab
mvpwar commented 2 years ago

use nginx image instead, it works

odedonato commented 2 years ago

@mvpwar thanks for your reply

I tried with nginx:1.19.1 and it works but with nginx:1.19.1-alpine not work, black screen with blinking cursor.

mihma commented 2 years ago

Same issue for me when using node:14.16.0-alpine base image

crenshaw-dev commented 2 years ago

Can y'all check the browser's console and/or network dev tools for errors when connecting to the alpine pod?

odedonato commented 2 years ago

Hi @crenshaw-dev

After a while I got a 504 Gateway timeout as you can see on the screenshot

image

With other containers, not alpine, on same cluster all is working like a charm

mihma commented 2 years ago

Hi @crenshaw-dev , I get blinking cursor with never dropping connection to the terminal as well as some weird font download error.

PS. It works fine on non-alpine pods: working terminal and zero font errors.

Screenshot 2022-06-14 at 15 45 11
arthurk commented 2 years ago

It fails for our alpine based images too. It works when bash is installed via apk add bash

The error I get in argocd-server is:

{"level":"error","msg":"read message err: websocket: unexpected reserved bits 0x60","time":"2022-06-15T04:15:27Z"}
djfinnoy commented 2 years ago

Same issue here, works fine with containers that have bash, weird websocket issues with containers that only have sh. Black screen for containers with neither.

crenshaw-dev commented 2 years ago

From #9425 :

Looks like ArgoCD is having issues spawning a terminal with the alpine ash shell.

It works after adding bash package:

apk add bash

Is it possible we just need to add ash as a shell to try here?

https://github.com/argoproj/argo-cd/blob/56eb09507e917de9c962a3249435d05f4dbf7e4b/server/application/terminal.go#L219

odedonato commented 2 years ago

@crenshaw-dev I was wondering if is useful and possible to put validShells in a ConfigMap, so we can add our shell, like pwsh, ash, etc. In that way is not binded in the code.

crenshaw-dev commented 2 years ago

@odedonato making it configurable is probably a good idea. But I'm not yet convinced that the (only) problem is not having enough shells in that array. If someone has time, a quick test to see if adding ash fixes Alpine would be helpful.

odedonato commented 2 years ago

@crenshaw-dev I downloaded the code and added ash then builded the container and tried but didn't work, same black screen with blinking cursor.

arthurk commented 2 years ago

I don't think adding more shells to the list is going to fix the problem. I often use k9s to launch shells in containers and it works well. Their code looks like this:

    shellCheck = `command -v bash >/dev/null && exec bash || exec sh`

https://github.com/derailed/k9s/blob/fc8ffe5d37d068f14e86aed7e8b847d016c99b19/internal/view/exec.go#L30

I can also run an alpine image and exec sh in the CLI which works:

kubectl run --rm -i -t --image=alpine mytest sh

However using the same image it doesn't work in the web ui.

I would be interested to see if putting sh before bash would show a different result in Argo CD

odedonato commented 2 years ago

@crenshaw-dev turns out that the black screen it's only on local k8s cluster where is located argocd and it's because it was a lack of permission on k8s RBAC, in ClusterRole:argocd-server added:

- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create

After that I got the error OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "bash": executable file not found in $PATH: unknown same error in the screenshot in the first post, got # but not usable, so I tried what suggested by @arthurk.

I built argo with sh first and it works, but it opens only sh on all containers. It's just a workaroud imho.

I found this: https://github.com/argoproj/argo-cd/blob/56eb09507e917de9c962a3249435d05f4dbf7e4b/server/application/terminal.go#L225

It's exactly what happen!

elmariofredo commented 2 years ago

@yeya24 can we maybe switch to sh as a first shell until this is resolved? This issue make this feature pretty much unusable for most of containers.

yeya24 commented 2 years ago

I am aware of this issue but there might be other issues more than the shell order. For k8s dashboard code, it uses the same shell order but it can enable exec for alpine images using sh. There might be some bugs related to the session close step. If the error can be detected then ideally sh will be tried after we found bash is not working.

elmariofredo commented 2 years ago

It feels like this issue will be bit harder to figure out, as there are multiple libraries involved. My suggestion is to quick fix this issue, by temporary removing bash shell from list which would resolve this issue for most containers. When problem shell test is resolved we can put bash back. WDYT?

yeya24 commented 2 years ago

It feels like this issue will be bit harder to figure out, as there are multiple libraries involved. My suggestion is to quick fix this issue, by temporary removing bash shell from list which would resolve this issue for most containers. When problem shell test is resolved we can put bash back. WDYT?

I don't have any preference. Feel free to open a pr to change the order and ArgoCD maintainers can decide whether they are willing to accept this change or not. Again maybe a configmap for shells so that users can reorder the shells as they want would be more flexible.

crenshaw-dev commented 2 years ago

I’m in favor of that change, and I agree that a configurable list makes sense. I think the config can go in ‘argocd-cmd-params-cm’.

m-yosefpor commented 2 years ago

all the following 3 issues are the same:

I think we can follow in one thread.

m-yosefpor commented 2 years ago

I’m in favor of that change, and I agree that a configurable list makes sense. I think the config can go in ‘argocd-cmd-params-cm’.

I'm not against this feature, but I feel this feature might not be useful for most of the use-cases, as logical order of shells for a simple pod terminal is what currently listed in the code (as bash and sh are the most common shells which could be found in container images). Also it doesn't really fixes the retrying next shell issue (which is the actual problem here), since as I wrote in here, we might still want bash to be the first shell and retry sh in case it is missing. Having a global config for it doesn't really solve this exact problem (of course it could be a tmp workaround).

crenshaw-dev commented 2 years ago

of course it could be a tmp workaround

I'm satisfied with a temporary workaround for now. This will solve a headache for 90% of use cases and will buy us time to figure out the 10%.

Making the list configurable is a win regardless, because people might want to add options that are not currently in the hardcoded list (e.g. ash).

crenshaw-dev commented 2 years ago

I think I actually found the real issue. Can someone test the PR to confirm? https://github.com/argoproj/argo-cd/pull/9895

elmariofredo commented 2 years ago

I think I actually found the real issue. Can someone test the PR to confirm? #9895

I can confirm that your PR is solving the issue. Even that bash will fail, switch to sh is working and terminal is usable 🎉

Thanks!

image
michalziobro commented 2 years ago

I'm still having this issue with 2.4.4 version: argocd-server log:

E0712 10:17:06.186996       1 v3.go:79] EOF
9
E0712 10:17:06.429485       1 v2.go:105] EOF
8
time="2022-07-12T10:17:07Z" level=error msg="read message err: websocket: unexpected reserved bits 0x40"
7
E0712 10:17:07.727456       1 v2.go:105] websocket: unexpected reserved bits 0x40

blinking cursor in terminal tab, cannot type anything: image

Terminal works ok with bash.

arthurk commented 2 years ago

@michalziobro the fix is not in 2.4.4, I hope it will be available in the next release

crenshaw-dev commented 2 years ago

Yep, I'll release the fix in the next few hours. It'll be in 2.4.6, since I've got some other stuff queued up for 2.4.5.

arthurk commented 2 years ago

I've upgraded to 2.4.6 and can confirm that the web terminal feature works well now

S0n98 commented 1 year ago

I'm also facing an issue with web shell, got only black screen with blinking cursor, but the logs I got from argocd-server are quite different:

2022/12/21 08:03:07 http: response.WriteHeader on hijacked connection from github.com/argoproj/argo-cd/v2/server/application.(*terminalHandler).ServeHTTP (terminal.go:237)
71
2022/12/21 08:03:07 http: response.Write on hijacked connection from fmt.Fprintln (print.go:265)

Image used: linuxserver/openvpn-as:version-2.9.0-5c5bd120-Ubuntu18

Argocd :

{
    "Version": "v2.5.2+148d8da",
    "BuildDate": "2022-11-07T16:42:47Z",
    "GitCommit": "148d8da7a996f6c9f4d102fdd8e688c2ff3fd8c7",
    "GitTreeState": "clean",
    "GoVersion": "go1.18.8",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v4.5.7 2022-08-02T16:35:54Z",
    "HelmVersion": "v3.10.1+g9f88ccb",
    "KubectlVersion": "v0.24.2",
    "JsonnetVersion": "v0.18.0"
}