argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Wait container cannot kill distroless sidecars #5256

Closed ludydoo closed 3 years ago

ludydoo commented 3 years ago

Summary

I am using istio-9.1-distroless with automatic sidecar injection. The workflow runs the main container, then the wait container tries to kill the istio-proxy sidecar using exec:

https://github.com/argoproj/argo-workflows/blob/e6fa41a1b91be2e56884ca16427aaaae4558fa00/workflow/executor/k8sapi/client.go#L99

2021-03-02T08:16:34.500261732Z wait time="2021-03-02T08:16:34.500Z" level=info msg="https://10.255.240.1:443/api/v1/namespaces/cicd/pods/REDACTED-tszxw-603544277/exec?command=%2Fbin%2Fsh&command=-c&command=kill+-15+1&container=istio-proxy&stderr=true&stdout=false&tty=false"

Since the istio-proxy container does not have a sh shell, there is a timeout error.

time="2021-03-02T08:17:04.671Z" level=error msg="executor error: Timeout occurred\ngithub.com/argoproj/argo/v2/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/v2/errors.InternalWrapError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:71\ngithub.com/argoproj/argo/v2/workflow/common.GetExecutorOutput\n\t/go/src/github.com/argoproj/argo/workflow/common/util.go:240\ngithub.com/argoproj/argo/v2/workflow/executor/k8sapi.(*k8sAPIClient).KillContainer\n\t/go/src/github.com/argoproj/argo/workflow/executor/k8sapi/client.go:87\ngithub.com/argoproj/argo/v2/workflow/executor/common.TerminatePodWithContainerID\n\t/go/src/github.com/argoproj/argo/workflow/executor/common/common.go:92\ngithub.com/argoproj/argo/v2/workflow/executor/common.KillGracefully\n\t/go/src/github.com/argoproj/argo/workflow/executor/common/common.go:98\ngithub.com/argoproj/argo/v2/workflow/executor/k8sapi.(*k8sAPIClient).killGracefully\n\t/go/src/github.com/argoproj/argo/workflow/executor/k8sapi/client.go:92\ngithub.com/argoproj/argo/v2/workflow/executor/k8sapi.(*K8sAPIExecutor).Kill\n\t/go/src/github.com/argoproj/argo/workflow/executor/k8sapi/k8sapi.go:71\ngithub.com/argoproj/argo/v2/workflow/executor.(*WorkflowExecutor).KillSidecars\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:1268\ngithub.com/argoproj/argo/v2/cmd/argoexec/commands.waitContainer.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:33\ngithub.com/argoproj/argo/v2/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:87\ngithub.com/argoproj/argo/v2/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:846\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:13\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
time="2021-03-02T08:17:04.671Z" level=info msg="Alloc=6418 TotalAlloc=18948 Sys=70592 NumGC=6 Goroutines=9"
stream closed
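
To make the failing call explicit, the query string of the exec URL logged by the wait container can be decoded into the argv it asks the API server to run inside istio-proxy. This is only a rough sketch: decode_exec_command is a hypothetical helper, and it handles only the two escapes that appear in that log line ("+" and "%2F"), not general URL decoding.

```shell
# Hypothetical helper: print each command= parameter of an exec URL, one per line.
# Only "+" (space) and "%2F" ("/") are decoded; other percent-escapes are left as-is.
decode_exec_command() {
  query=${1#*\?}                      # drop everything up to the first "?"
  printf '%s\n' "$query" |
    tr '&' '\n' |                     # one key=value pair per line
    sed -n 's/^command=//p' |         # keep only the command= values
    sed 's/+/ /g; s|%2[Ff]|/|g'       # decode the two escapes used in the log
}

decode_exec_command 'https://10.255.240.1:443/api/v1/namespaces/cicd/pods/REDACTED-tszxw-603544277/exec?command=%2Fbin%2Fsh&command=-c&command=kill+-15+1&container=istio-proxy&stderr=true&stdout=false&tty=false'
```

The three values it prints are /bin/sh, -c and "kill -15 1", so the kill depends on /bin/sh existing in the sidecar image, which is not the case for a distroless image.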

Workaround

Add a curl sidecar which will trigger the /quitquitquit istio sidecar endpoint.

      sidecars:
        - name: istio-cleanup
          image: curlimages/curl:latest
          command: ["sh", "-c"]
          args:
            - istio(){ curl -X POST --silent --fail http://localhost:15020/quitquitquit; rc=$?; echo Sidecar stopped; exit $rc; }; trap istio SIGINT; trap istio SIGTERM; while true; do sleep 1; done

As a bonus, this is the script I use to wait for the istio proxy. Sometimes neither curl nor wget is available, so it tries netcat first:

sidecarReady()
{
  set +e
  if command -v nc >/dev/null 2>&1 ; then
    # Send a minimal HTTP/1.1 request with CRLF line endings and check the status line.
    printf 'GET /healthz/ready HTTP/1.1\r\nHost: 127.0.0.1\r\nConnection: close\r\n\r\n' |
    nc -i 1 127.0.0.1 15021 |
    head -1 |
    grep "200 OK"
  elif command -v curl >/dev/null 2>&1 ; then
    curl --silent --fail http://127.0.0.1:15021/healthz/ready
  elif command -v wget >/dev/null 2>&1 ; then
    wget -O /dev/null --quiet http://127.0.0.1:15021/healthz/ready
  else
    echo "Could not find nc, curl or wget, please install one." >&2
    exit 1
  fi
}

echo Waiting for sidecar
until sidecarReady
do
  echo "Waiting for Sidecar..."
  sleep 3
done
echo Sidecar ready

This is the function that stops the sidecar. It tries netcat first because busybox wget cannot POST:


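sidecarCleanup()
{
  echo Stopping sidecar
  set +e
  if command -v nc >/dev/null 2>&1 ; then
    # Send a minimal HTTP/1.1 POST with CRLF line endings and check the status line.
    printf 'POST /quitquitquit HTTP/1.1\r\nHost: 127.0.0.1\r\nConnection: close\r\nContent-Length: 0\r\n\r\n' |
    nc -i 1 127.0.0.1 15020 |
    head -1 |
    grep "200 OK"
    rc=$?
  elif command -v curl >/dev/null 2>&1 ; then
    curl -X POST --silent --fail http://localhost:15020/quitquitquit
    rc=$?
    echo Sidecar stopped
  elif command -v wget >/dev/null 2>&1 ; then
    # --method=POST needs GNU wget; busybox wget does not support it.
    wget -O /dev/null --quiet --method=POST http://localhost:15020/quitquitquit
    rc=$?
    echo Sidecar stopped
  else
    echo "Could not find nc, curl or wget, please install one." >&2
    rc=1
  fi
  # Exit with the status of the request, not of the last echo.
  exit $rc
}

# SIGKILL cannot be trapped, so only SIGTERM is handled.
trap sidecarCleanup TERM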
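
One possible way to combine the readiness check and the shutdown call around a main command is sketched below. The run_with_sidecar name, its arguments and the SIDECAR_POLL_INTERVAL variable are hypothetical, and the stop command is assumed to return rather than call exit, so it is not a drop-in replacement for the trap-based scripts in this comment.

```shell
# Hypothetical wrapper: wait for the sidecar, run the main command, then stop the sidecar.
# $1: readiness check command, $2: stop command, remaining arguments: main command.
run_with_sidecar() {
  ready_cmd=$1
  stop_cmd=$2
  shift 2
  # Block until the readiness check succeeds.
  until "$ready_cmd"; do
    sleep "${SIDECAR_POLL_INTERVAL:-3}"
  done
  "$@"
  rc=$?
  # Stop the sidecar whatever the main command returned; ignore stop failures.
  "$stop_cmd" || true
  # Propagate the main command's exit status, not the stop command's.
  return "$rc"
}
```

A call could look like `run_with_sidecar my_ready_check my_stop ./job.sh`, where both helper names and ./job.sh are placeholders.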

Workaround drawbacks

It seems that the Argo wait container does not recognize that the sidecar was already stopped via /quitquitquit and waits for the exec timeout anyway, which adds delay to every task.

Diagnostics

What Kubernetes provider are you using?

1.20.2

What version of Argo Workflows are you running?

2.12.9

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec commented 3 years ago

Nice to see distroless in the wild. You cannot kill sidecars with either the k8sapi or kubelet executors. It is not possible to fix. Instead, use PNS, Docker or Emissary executors.

ludydoo commented 3 years ago

Hi @alexec

The /quitquitquit endpoint works well, though it introduces an additional delay for each task.

A possible improvement would be for

func (c *k8sAPIClient) KillContainer(pod *corev1.Pod, container *corev1.ContainerStatus, sig syscall.Signal) error {

to poll the sidecar container status to see whether it has already terminated, instead of waiting for the exec command to return an error. But that's probably not a priority.
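
The polling idea can be sketched in shell as a bounded retry loop. This is not the Go change itself, only an illustration of the control flow. The poll_until_terminated name and its arguments are hypothetical, and the commented kubectl check assumes kubectl access to the pod and a POD variable set by the caller.

```shell
# Hypothetical helper: poll a check command until it succeeds or attempts run out.
# $1: check command (returns 0 once the sidecar has terminated)
# $2: maximum number of attempts, $3: seconds between attempts (default 1)
poll_until_terminated() {
  check_cmd=$1
  max_attempts=$2
  interval_s=${3:-1}
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    if "$check_cmd"; then
      return 0          # sidecar already gone, no need to wait for a timeout
    fi
    attempt=$((attempt + 1))
    sleep "$interval_s"
  done
  return 1              # still running after all attempts
}

# Example check (an assumption, not taken from Argo): succeeds once the
# istio-proxy container status reports a terminated state.
# sidecar_terminated() {
#   [ -n "$(kubectl get pod "$POD" -o jsonpath='{.status.containerStatuses[?(@.name=="istio-proxy")].state.terminated.exitCode}')" ]
# }
# poll_until_terminated sidecar_terminated 30 1
```

With a check like this, a sidecar that already exited after /quitquitquit would be detected on the first attempt instead of after the exec timeout.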

nkitajim commented 3 years ago

Workaround1

Pattern1. Mount busybox from a hostPath volume

  sidecars:
    - image: pause
      name: pause
      volumeMounts:
      - mountPath: /bin
        name: busybox
  volumes:
    - hostPath:
        path: /opt/busybox
        type: DirectoryOrCreate
      name: busybox

Pattern2. Copy busybox into a shared volume with an initContainer

    volumes:
    - name: busybox
      emptyDir: {}
    initContainers:
    - image: busybox
      name: busybox
      command:
        - sh
        - -c
        - |
          cp /bin/busybox /opt/busybox/
          cd /opt/busybox
          busybox --list | xargs -I{} ln -s busybox {}
      volumeMounts:
      - mountPath: /opt/busybox
        name: busybox

Workaround1 drawbacks

Weakens the security benefit of using a distroless container.

Workaround2

EphemeralContainers

kill $(pgrep <sidecar command>)

Workaround2 drawbacks

slow

alexec commented 3 years ago

I think we need something better here.

Can you confirm whether the Istio image contains a /bin/kill file?

no-response[bot] commented 3 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.