elixir-cloud-aai / tesk-core

Python code that is launched as images into the Kubernetes cluster by tesk-api.
Apache License 2.0

Netrc support is broken for OpenShift #31

Closed lvarin closed 4 years ago

lvarin commented 4 years ago

Hello,

The core of the problem is this:

  Warning  FailedPostStartHook  17m   kubelet, c03-ssdnode-2  Exec lifecycle hook ([/bin/sh -c cp /tmp/user/.netrc $HOME]) for Container "filer" in Pod "task-722427ff-outputs-filer-ff6tv_csc-tesk(d2baf5d3-e5e5-11ea-baa6-fa163e564e82)" failed - error: command '/bin/sh -c cp /tmp/user/.netrc $HOME' exited with 1: cp: can't create '/.netrc': Permission denied
, message: "cp: can't create '/.netrc': Permission denied\n"

Only root can write to /, and OpenShift does not run the containers as root.

The log of the pod that fails:

$ oc logs task-722427ff-outputs-filer-ff6tv                                
08/24/2020 08:43:06 ERROR: [Errno 2] No such file or directory: '/.netrc'
Traceback (most recent call last):
  File "/usr/bin/filer", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3.7/site-packages/tesk_core/filer.py", line 545, in main
    if process_file(args.transputtype, afile):
  File "/usr/lib/python3.7/site-packages/tesk_core/filer.py", line 498, in process_file
    with trans(filedata['path'], filedata['url'], Type(filedata['type'])) as transfer:
  File "/usr/lib/python3.7/site-packages/tesk_core/filer.py", line 219, in __enter__
    ftp_login(self.ftp_connection, self.netloc, self.netrc_file)
  File "/usr/lib/python3.7/site-packages/tesk_core/filer.py", line 323, in ftp_login
    ftp_connection.login()
  File "/usr/lib/python3.7/ftplib.py", line 420, in login
    resp = self.sendcmd('PASS ' + passwd)
  File "/usr/lib/python3.7/ftplib.py", line 273, in sendcmd
    return self.getresp()
  File "/usr/lib/python3.7/ftplib.py", line 246, in getresp
    raise error_perm(resp)
ftplib.error_perm: 530 Login authentication failed

and the describe of the pod:

$ oc describe pod task-722427ff-outputs-filer-ff6tv                                   
Name:               task-722427ff-outputs-filer-ff6tv
Namespace:          csc-tesk
Priority:           0
PriorityClassName:  <none>
Node:               c03-ssdnode-2/192.168.6.2
Start Time:         Mon, 24 Aug 2020 11:43:03 +0300
Labels:             controller-uid=c090f54c-e5e5-11ea-8de7-fa163e9334d8
                    job-name=task-722427ff-outputs-filer
Annotations:        kubernetes.io/limit-ranger=LimitRanger plugin set: cpu, memory request for container filer; cpu, memory limit for container filer
                    openshift.io/scc=restricted
Status:             Failed
IP:                 10.130.4.4
Controlled By:      Job/task-722427ff-outputs-filer
Containers:
  filer:
    Container ID:  docker://ba00c972522f7008ec47bd05f310965e54bcc05ce4580f75a9c2946fe4444b35
    Image:         eu.gcr.io/tes-wes/filer:v0.8.3
    Image ID:      docker-pullable://eu.gcr.io/tes-wes/filer@sha256:8ac6cd44db8345581fa4ddd3f289a0ccf734f838fcb5ab631e664d2ec7ca0391
    Port:          <none>
    Host Port:     <none>
    Args:
      outputs
      $(JSON_INPUT)
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 24 Aug 2020 11:43:06 +0300
      Finished:     Mon, 24 Aug 2020 11:43:12 +0300
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     16
      memory:  64Gi
    Requests:
      cpu:     200m
      memory:  200Mi
    Environment:
      JSON_INPUT:           {"outputs": [{"name": "stdout", "url": "ftp://vm1976.kaj.pouta.csc.fi/dc7fa8bc-d396-454d-804f-f425580b015b/output_9389f073-5e6e-4f5d-a122-631e7993832e/whirlpool", "path": "/aOzfcH/whirlpool", "type": "FILE"}, {"name": "workdir", "url": "ftp://vm1976.kaj.pouta.csc.fi/dc7fa8bc-d396-454d-804f-f425580b015b/output_9389f073-5e6e-4f5d-a122-631e7993832e/", "path": "/aOzfcH", "type": "DIRECTORY"}], "inputs": [{"name": "input", "description": "cwl_input:input", "url": "https://github.com/uniqueg/cwltool/blob/master/requirements.txt", "path": "/var/lib/cwl/stg5a0a7974-a142-4118-bf77-f9e43afcfafd/requirements.txt", "type": "FILE"}], "volumes": [], "executors": [{"apiVersion": "batch/v1", "kind": "Job", "metadata": {"annotations": {"tes-task-name": "whirlpool"}, "labels": {"job-type": "executor", "taskmaster-name": "task-722427ff", "executor-no": "0", "creator-user-id": "6d8af480e6175672caacd41cecda822bdc403d61_elixir-europe.org"}, "name": "task-722427ff-ex-00"}, "spec": {"template": {"metadata": {"name": "task-722427ff-ex-00"}, "spec": {"containers": [{"command": ["/bin/sh", "-c", "openssl dgst -sha512 /var/lib/cwl/stg5a0a7974-a142-4118-bf77-f9e43afcfafd/requirements.txt > /aOzfcH/whirlpool"], "env": [{"name": "HOME", "value": "/aOzfcH"}, {"name": "TMPDIR", "value": "/tmp"}], "image": "kubler/openssl:20190330", "name": "task-722427ff-ex-00", "resources": {"requests": {"memory": "1.074G", "cpu": "1"}}, "workingDir": "/aOzfcH"}], "restartPolicy": "Never"}}}}], "resources": {"disk_gb": 2.1474843604837712}}
      HOST_BASE_PATH:       
      CONTAINER_BASE_PATH:  
    Mounts:
      /aOzfcH from task-volume (rw)
      /tmp/user/.netrc from netrc (rw)
      /var/lib/cwl/stg5a0a7974-a142-4118-bf77-f9e43afcfafd from task-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xk7kk (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  task-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  task-722427ff-pvc
    ReadOnly:   false
  netrc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  netrc
    Optional:    false
  default-token-xk7kk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xk7kk
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  default_run=allow
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason               Age   From                    Message
  ----     ------               ----  ----                    -------
  Normal   Scheduled            17m   default-scheduler       Successfully assigned csc-tesk/task-722427ff-outputs-filer-ff6tv to c03-ssdnode-2
  Normal   Pulled               17m   kubelet, c03-ssdnode-2  Container image "eu.gcr.io/tes-wes/filer:v0.8.3" already present on machine
  Normal   Created              17m   kubelet, c03-ssdnode-2  Created container
  Normal   Started              17m   kubelet, c03-ssdnode-2  Started container
  Warning  FailedPostStartHook  17m   kubelet, c03-ssdnode-2  Exec lifecycle hook ([/bin/sh -c cp /tmp/user/.netrc $HOME]) for Container "filer" in Pod "task-722427ff-outputs-filer-ff6tv_csc-tesk(d2baf5d3-e5e5-11ea-baa6-fa163e564e82)" failed - error: command '/bin/sh -c cp /tmp/user/.netrc $HOME' exited with 1: cp: can't create '/.netrc': Permission denied
, message: "cp: can't create '/.netrc': Permission denied\n"
  Normal  Killing  17m  kubelet, c03-ssdnode-2  Killing container with id docker://filer:FailedPostStartHook

aniewielska commented 4 years ago

Right. I wondered how the SCC that OpenShift applies affects the HOME directory. It seems that in your case it is / (the root directory) for all containers, and that is why your PR worked on OpenShift (in my case it is /home/taskmaster for the taskmaster and /root for the filer). I wonder if it is a rule that runAsUser moves HOME to /. If so, I can just mount the file there and keep the hook for clusters without SCCs. Or move HOME elsewhere (I wanted to avoid that, but it might be the best solution).

lvarin commented 4 years ago

Given that $HOME changes between platforms, what if netrc is mounted at /etc/netrc instead? The code in filer.py would then just look for it in /etc/netrc.

Or something like this?

aniewielska commented 4 years ago

That is possible and will work for ftplib, because we explicitly pass the location in the call. There is also an env var (NETRC or similar) that we could additionally use to point to that arbitrary location, and that would work for some tools other than ftplib. Unfortunately, some tools (such as the Python requests library) only seem to check HOME for .netrc.

I think I will go with that solution anyway.
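A minimal sketch of what passing an explicit netrc location to the FTP login could look like, using only the standard library. The path /etc/netrc is the location proposed in this thread, and the names `load_ftp_credentials` and `login_with_netrc` are hypothetical; this is not the actual code in filer.py.

```python
import netrc
from typing import Optional, Tuple

# Assumption: the netrc secret is mounted at this fixed path, as proposed above.
NETRC_PATH = "/etc/netrc"


def load_ftp_credentials(netloc: str,
                         netrc_path: str = NETRC_PATH) -> Optional[Tuple[str, str]]:
    """Return (login, password) for netloc, or None if nothing usable is found."""
    try:
        # Passing the path explicitly means $HOME is never consulted.
        entries = netrc.netrc(netrc_path)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file: treat as "no credentials".
        return None
    auth = entries.authenticators(netloc)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


def login_with_netrc(ftp_connection, netloc: str,
                     netrc_path: str = NETRC_PATH) -> None:
    """Log in with netrc credentials if present, otherwise anonymously."""
    creds = load_ftp_credentials(netloc, netrc_path)
    if creds is not None:
        ftp_connection.login(*creds)
    else:
        ftp_connection.login()
```

Here `ftp_connection` is expected to be an `ftplib.FTP` instance. Falling back to an anonymous login when no entry is found mirrors the behaviour visible in the traceback above, where the missing file led to `ftp_connection.login()` without credentials.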

aniewielska commented 4 years ago

And some tests of where the HOME is:

filer on K8s -->  /root
taskmaster on K8s --> /home/taskmaster
filer with runAsUser  --> / 
taskmaster with runAsUser  --> / 
filer with home env set in pod descriptor to /etc/netrc -->  /etc/netrc
taskmaster with home env set in pod descriptor to /etc/netrc -->  /etc/netrc
filer with runAsUser and home env set in pod descriptor to /etc/netrc -->  /etc/netrc
taskmaster with runAsUser and home env set in pod descriptor to /etc/netrc -->  /etc/netrc

When runAsUser is set, the entire filesystem becomes read-only (I guess except for the mounts, but I have not checked).
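The HOME values listed above matter because tools that rely on the default location typically expand `~/.netrc`, which on POSIX systems follows the HOME environment variable. A small sketch of that lookup (the helper name `default_netrc_path` is hypothetical, and the result assumes a POSIX system):

```python
import os


def default_netrc_path() -> str:
    """Return the path a HOME-based lookup would use for .netrc."""
    # On POSIX, expanduser("~") honours the HOME environment variable.
    return os.path.join(os.path.expanduser("~"), ".netrc")


# With runAsUser, HOME was observed to be "/", so the lookup lands in the
# filesystem root, which a non-root user cannot write to.
os.environ["HOME"] = "/"
print(default_netrc_path())  # /.netrc

# Setting HOME in the pod descriptor to a writable mount moves the lookup there.
os.environ["HOME"] = "/aOzfcH"
print(default_netrc_path())  # /aOzfcH/.netrc
```

The first result matches the `/.netrc` path reported in the filer log above; the second uses the task volume mount path from the pod description purely as an illustration.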

cibinsb commented 4 years ago

I'm not an expert, but if we configure the security context for a pod, it's possible to execute the commands with root privileges.

aniewielska commented 4 years ago

@cibinsb the problem is that the security context has already been defined (runAsUser, group, fsGroup and similar) on OpenShift, and we don't want to run as root on OpenShift unless really necessary. Moving the filer to a different user does not help either, as runAsUser still moves HOME. Setting HOME seems to work though - have a look here: https://github.com/EMBL-EBI-TSI/tesk-core/pull/32 I am waiting for @lvarin to confirm that this time it works for him as well.