glideinWMS / containers

GlideinWMS containers

ssh-agent not working in fnal-wn-sl7 and fnal-wn-sl7-dev #28

Closed mambelli closed 2 months ago

mambelli commented 2 months ago

The following was reported by @vitodb via Slack: Tom tested the SL7 dev container on SBND nodes. Using the DDT debugger distributed through the forge_tools UPS package, he is getting an error related to ssh-agent permissions. In the container distributed via CVMFS, the permissions for this executable are

Apptainer> ls -lh /usr/bin/ssh-agent
-rwx--x--x 1 65534 65534 374K Aug  1  2023 /usr/bin/ssh-agent

This causes permission errors when executing the command. On the host node it has permissions

[vito@sbndbuild03 ~]$ ls -lh /usr/bin/ssh-agent
-rwxr-xr-x 1 root root 281K Mar  5 16:34 /usr/bin/ssh-agent

This works fine. I also checked the local container in /exp/sbnd/data/..., where it has permissions

Apptainer> ls -lh /usr/bin/ssh-agent
---x--s--x 1 vito sbnd 374K Aug  1  2023 /usr/bin/ssh-agent 

This also works. It looks like it has the setgid bit on, which seems to allow it to work.
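
To compare the three listings above numerically, the symbolic modes can be read as octal: `-rwx--x--x` is 0711, `-rwxr-xr-x` is 0755, and `---x--s--x` is 2111 (setgid plus execute-only for everyone). A minimal sketch, assuming GNU coreutils `stat` and using a scratch file rather than the real binary; `show_mode` is a hypothetical helper name, not something from the container:

```shell
#!/bin/bash
# Print octal mode, symbolic mode, and owner:group for a path (GNU stat).
show_mode() {
    stat -c '%a %A %U:%G' "$1"
}

# Demonstrate on a scratch file instead of /usr/bin/ssh-agent.
scratch=$(mktemp)
chmod 2111 "$scratch"   # setgid + execute-only, as in the stock SL7 package
show_mode "$scratch"
chmod 0711 "$scratch"   # the mode seen on the CVMFS copy
show_mode "$scratch"
rm -f "$scratch"
```

Running `show_mode /usr/bin/ssh-agent` on the host, in the CVMFS container, and in the local container gives the three cases side by side, including the owner and group, which differ in every listing.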

mambelli commented 2 months ago

I checked on a stock SL7: the file is in openssh-clients and the permissions have SGID set, apparently to avoid a strace vulnerability

[root@1a401ba6d143 /]# yum install openssh-clients
...
[root@1a401ba6d143 /]# ls -al /usr/bin/ssh-agent
---x--s--x 1 root nobody 382208 Aug  1  2023 /usr/bin/ssh-agent
[root@1a401ba6d143 /]# ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-DCBiHmjYcJFy/agent.52; export SSH_AUTH_SOCK;
SSH_AGENT_PID=53; export SSH_AGENT_PID;
echo Agent pid 53;

The permissions on other RHEL versions are different.

mambelli commented 2 months ago

The fnal-dev-sl7 container that we build has SGID set and works fine (I pulled it off Docker Hub)

[root@fermicloud826 ~]# podman run -it docker.io/fermilab/fnal-dev-sl7:latest /bin/bash
[root@d803b189831e /]# ls -al /usr/bin/ssh-agent
---x--s--x 1 root nobody 382208 Aug  1  2023 /usr/bin/ssh-agent
[root@d803b189831e /]# ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-jdMGuDfFFO41/agent.21; export SSH_AUTH_SOCK;
SSH_AGENT_PID=22; export SSH_AGENT_PID;
echo Agent pid 22;
[root@d803b189831e /]#

Something may have changed in the process that creates the Apptainer image (SIF) or unpacks it into CVMFS, and that step may strip the SGID bit. I will have to check with @DrDaveD or OSG.

DrDaveD commented 2 months ago

Apptainer has the No New Privileges kernel feature set. Even if the SGID bit is set, it is not used.
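
One way to confirm this from inside the container is the `NoNewPrivs` field that Linux exposes in `/proc/<pid>/status` (kernel 4.10 and later); a value of 1 means setuid and setgid bits on executables are ignored at exec time. A minimal sketch, where `no_new_privs` is a hypothetical helper name and the status file is a parameter so it can be pointed at any process:

```shell
#!/bin/bash
# Print the NoNewPrivs value (0 or 1) from a /proc/<pid>/status-style file.
# Defaults to the current process; prints nothing if the field is absent.
no_new_privs() {
    local status_file="${1:-/proc/self/status}"
    awk '$1 == "NoNewPrivs:" { print $2 }' "$status_file"
}

# Report only when the status file is available (Linux).
if [ -r /proc/self/status ]; then
    echo "NoNewPrivs: $(no_new_privs)"
fi
```

Comparing the value in a host shell with the value in a shell started by `apptainer exec` shows whether the flag is in effect for the session being debugged.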

mambelli commented 2 months ago

Vito dumped the container onto Ceph with apptainer build --sandbox and the resulting container works, so this is possibly a CVMFS feature that needs to handle permissions in a specific way.

vitodb commented 2 months ago

When the apptainer container is started from CVMFS, for example with /cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/bin/apptainer exec /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-dev-sl7:latest /bin/bash, and an application or user tries to run ssh-agent, it reports permission denied:

$ /cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/bin/apptainer exec /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-dev-sl7:latest /bin/bash
Apptainer> ssh-agent 
bash: /usr/bin/ssh-agent: Permission denied
Apptainer> ls -lh /usr/bin/ssh-agent
-rwx--x--x 1 65534 65534 374K Aug  1  2023 /usr/bin/ssh-agent
Apptainer> 

This seems to happen with all the SL7 containers I tested from CVMFS, whereas, as mentioned in the previous post, if the container is dumped onto a Ceph volume it has permissions that allow users to run it.

On the other hand, with EL8/EL9 or even SL6 containers, ssh-agent works:

$ /cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/bin/apptainer exec /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-wn-el9:latest /bin/bash
Apptainer> ls -lh /usr/bin/ssh-agent
-rwxr-xr-x 1 nobody nobody 281K Mar  5 16:34 /usr/bin/ssh-agent
Apptainer> ssh-agent 
SSH_AUTH_SOCK=/tmp/ssh-XXXXXXl3DkwY/agent.167678; export SSH_AUTH_SOCK;
SSH_AGENT_PID=167687; export SSH_AGENT_PID;
echo Agent pid 167687;
Apptainer> 

Somehow this seems to be an issue with the combination of an SL7 container deployed on CVMFS.

DrDaveD commented 2 months ago

I'm skeptical that it works with the SL7 container inside apptainer from a sandbox. It didn't work for me. Can you please double-check that, @vitodb?

I did

$ apptainer build --sandbox /scratch/tmp/fnal-dev-sl7 docker://fermilab/fnal-dev-sl7
...
$ apptainer exec /scratch/tmp/fnal-dev-sl7 /bin/bash
Apptainer> ls -l /usr/bin/ssh-agent
---x--s--x 1 dwd fnalgrid 382208 Aug  1  2023 /usr/bin/ssh-agent
Apptainer> ssh-agent
mkdtemp: private socket dir: No such file or directory

On the other hand, if I change those permissions to 711 and execute outside of apptainer it works. So I'm not exactly sure why the creation of a private socket dir is failing.

DrDaveD commented 2 months ago

It also seems like not a good security model to run ssh-agent inside an apptainer container. Maybe the user has an alternative. Normally ssh-agent is run on a desktop or laptop and forwarded everywhere through ssh.

DrDaveD commented 2 months ago

OK, I have done more investigation. The error I was seeing was because I had set TMPDIR=/scratch/tmp but I didn't bind that in from the host. If I add -B /scratch then I see the same symptoms as Vito.
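
The `mkdtemp` failure is what one would expect when `TMPDIR` names a directory that does not exist inside the container, since ssh-agent creates its socket directory under `$TMPDIR`. A minimal sketch of a pre-flight check to run inside the container before starting ssh-agent; `check_tmpdir` is a hypothetical helper name:

```shell
#!/bin/bash
# Verify that the directory ssh-agent would use for its socket exists
# and is writable. Falls back to /tmp when TMPDIR is unset or empty.
check_tmpdir() {
    local dir="${TMPDIR:-/tmp}"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing or not writable: $dir" >&2
        return 1
    fi
}
```

If the check fails, binding the host path into the container, as with `-B /scratch`, makes the directory visible again.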

The problem is not a missing SGID bit, however. ssh-agent works fine without it; I don't know what it is for. It has something to do with the file being mode 711 while the user is not the owner of the file. If I build my own sandbox I am the owner, so it doesn't matter. If I change the owner and group of the file in my sandbox to someone else, it also fails with Permission denied. If I then change the mode to 755, it works OK. So a workaround is to do chmod 755 /usr/bin/ssh-agent in the Dockerfile of that container.
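
The workaround can be sketched as a small shell step for the image build. This is an illustration under stated assumptions rather than the change that was actually committed: `ensure_mode_755` is a hypothetical helper, it assumes GNU `stat`, and in a Dockerfile the same effect would come from a single `RUN chmod 755 /usr/bin/ssh-agent` line.

```shell
#!/bin/bash
# Set mode 755 on a file unless it already has it, and print what was done.
ensure_mode_755() {
    local path="$1" mode
    mode=$(stat -c '%a' "$path") || return 1
    if [ "$mode" = "755" ]; then
        echo "unchanged: $path is already 755"
    else
        chmod 755 "$path" && echo "changed: $path $mode -> 755"
    fi
}

# Act only on a path given explicitly, for example:
#   ./fix-mode.sh /usr/bin/ssh-agent   (script name is hypothetical)
if [ "$#" -gt 0 ]; then
    ensure_mode_755 "$1"
fi
```

Note that this drops the SGID bit from the stock 2111 mode, which matches the observation that ssh-agent works without it.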