NERSC / podman-hpc


SSH authentication during build #93

Closed vhewes closed 8 months ago

vhewes commented 8 months ago

hi! i'm trying to build an image on Perlmutter from a Containerfile, and the process includes cloning a private SSH repository. i'm running into issues using the provided mechanism to expose SSH keys during the build stage – in my Containerfile, i'm using the syntax RUN --mount=type=ssh. i have an SSH agent in my interactive session, and i'm trying to forward that by passing --ssh default to podman-hpc build, but my build fails with the following error:

time="2023-10-25T15:15:19-07:00" level=error msg="runc create failed: unable to start container process: error during container init: error mounting \"/tmp/buildah1301058818/mnt/buildah-bind-target-11\" to rootfs at \"/run/buildkit/ssh_agent.0\": mount /tmp/buildah1301058818/mnt/buildah-bind-target-11:/run/buildkit/ssh_agent.0 (via /proc/self/fd/8), flags: 0x1021: operation not permitted"
: exit status 1

as far as i can tell from the documentation, i think i should be able to forward my SSH agent this way. am i doing something wrong, or is this an issue on the podman-hpc side? any help would be greatly appreciated!

lastephey commented 8 months ago

Hi @vhewes,

Thanks for the issue. Since you're a NERSC user, it might make more sense to handle this via our help.nersc.gov ticket system. If you submit a ticket, it will come to us (the podman-hpc development team). We can update this issue with any more widely useful info depending on the outcome.

Can you please open a ticket at NERSC via help.nersc.gov and include the steps we need to reproduce your error (location of your Containerfile, syntax of build command, etc)?

Thank you, Laurie

vhewes commented 8 months ago

thank you Laurie! you can find the ticket at INC0211776

lastephey commented 8 months ago

For others who may come across this issue, the problem appears to be a result of using runc rather than crun. Switching to crun on our test system fixes the error. Here's my test which clones a private repo that I can access.

On our system with runc:

stephey@perlmutter:login31:/pscratch/sd/s/stephey/containerfiles/ssh> ssh-add ~/.ssh/id_ed25519 
Identity added: /global/homes/s/stephey/.ssh/id_ed25519 (laurie.stephey@gmail.com)
stephey@perlmutter:login31:/pscratch/sd/s/stephey/containerfiles/ssh> cat Containerfile
FROM python:3.8.1

RUN mkdir -m 0600 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts

RUN --mount=type=ssh git clone --recursive -b unstable git@github.com:gafusion/OMFIT-source.git 
stephey@perlmutter:login31:/pscratch/sd/s/stephey/containerfiles/ssh> podman-hpc build -t ssh:test --ssh=default .
STEP 1/4: FROM python:3.8.1
Resolved "python" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/python:3.8.1...
Getting image source signatures
Copying blob dc65f448a2e2 done  
Copying blob 9253bd2ee3f6 done  
Copying blob 8ac92ddf84b3 done  
Copying blob a3ca60abc08a done  
Copying blob dea4ecac934f done  
Copying blob 346ffb2b67d7 done  
Copying blob fad96c8dce44 done  
Copying blob ec0f51d2752d done  
Copying blob 1fa0065c6287 done  
Copying config efdecc2e37 done  
Writing manifest to image destination
Storing signatures
STEP 2/4: ENV "PODMANHPC_MODULES_DIR"="/etc/podman_hpc/modules.d"
--> 55651000d4a
STEP 3/4: RUN mkdir -m 0600 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# github.com:22 SSH-2.0-babeld-f8b1fc6c
# github.com:22 SSH-2.0-babeld-f8b1fc6c
# github.com:22 SSH-2.0-babeld-f8b1fc6c
--> 2d3d5557b91
STEP 4/4: RUN --mount=type=ssh git clone --recursive -b unstable git@github.com:gafusion/OMFIT-source.git 
error running container: from /usr/bin/runc creating container for [/bin/sh -c git clone --recursive -b unstable git@github.com:gafusion/OMFIT-source.git]: time="2023-11-02T16:07:47-07:00" level=error msg="runc create failed: unable to start container process: error during container init: error mounting \"/tmp/buildah3639458703/mnt/buildah-bind-target-10\" to rootfs at \"/run/buildkit/ssh_agent.0\": mount /tmp/buildah3639458703/mnt/buildah-bind-target-10:/run/buildkit/ssh_agent.0 (via /proc/self/fd/8), flags: 0x1021: operation not permitted"
: exit status 1
Error: building at STEP "RUN --mount=type=ssh git clone --recursive -b unstable git@github.com:gafusion/OMFIT-source.git": while running runtime: exit status 1

On our system with crun:

stephey@muller:login01:/mscratch/sd/s/stephey/containerfiles/ssh> ssh-add ~/.ssh/id_ed25519 
Identity added: /global/homes/s/stephey/.ssh/id_ed25519 (laurie.stephey@gmail.com)
stephey@muller:login01:/mscratch/sd/s/stephey/containerfiles/ssh> cat Containerfile 
FROM python:3.8.1

RUN mkdir -m 0600 ~/.ssh  && ssh-keyscan github.com >> ~/.ssh/known_hosts

RUN --mount=type=ssh git clone --recursive -b unstable git@github.com:gafusion/OMFIT-source.git 
stephey@muller:login01:/mscratch/sd/s/stephey/containerfiles/ssh> podman-hpc build -t ssh:test --ssh=default .
STEP 1/4: FROM python:3.8.1
STEP 2/4: ENV "PODMANHPC_MODULES_DIR"="/etc/podman_hpc/modules.d"
--> Using cache 819d6a9e785dd91c39b29c48858bf37209b8febf46d810c61fb01daa2f848f9c
--> 819d6a9e785
STEP 3/4: RUN mkdir -m 0600 ~/.ssh  && ssh-keyscan github.com >> ~/.ssh/known_hosts
--> Using cache b58ceb6ecb9a12e2c3d23c8fa2797fdc16968c34f28c60e19c1dbcffab07af0a
--> b58ceb6ecb9
STEP 4/4: RUN --mount=type=ssh git clone --recursive -b unstable git@github.com:gafusion/OMFIT-source.git 
--> Using cache 28dab9a89c6be93d25e28f917e22dee7fb278234328a44d91fd3fe5d90e526ce
COMMIT ssh:test
--> 28dab9a89c6
Successfully tagged localhost/ssh:test
28dab9a89c6be93d25e28f917e22dee7fb278234328a44d91fd3fe5d90e526ce
stephey@muller:login01:/mscratch/sd/s/stephey/containerfiles/ssh> 

I'm not sure if this is expected or not, but since we're already planning to switch to crun, I don't think we'll dig into this.

I'll go ahead and close for now; we can reopen if needed.
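
For anyone hitting the same error elsewhere, here is a rough sketch of how one might check whether a failed build matches the failure shown above, and which OCI runtime is in use. The function names and the build.log file are made up for illustration. The runtime query assumes podman-hpc forwards `info --format` to podman unchanged, which I have not verified.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of podman-hpc.

# Return 0 if the build log on stdin matches the failure from this issue:
# runc refusing to mount the forwarded SSH agent socket.
is_runc_ssh_mount_failure() {
    log=$(cat)
    case "$log" in
        *"runc create failed"*"/run/buildkit/ssh_agent"*"operation not permitted"*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Print the OCI runtime name (for example runc or crun).
# Assumption: podman-hpc passes `info --format` through to podman as-is.
current_runtime() {
    podman-hpc info --format '{{.Host.OCIRuntime.Name}}'
}

# Example use (not run automatically):
#   podman-hpc build -t ssh:test --ssh=default . 2>&1 | tee build.log
#   if is_runc_ssh_mount_failure < build.log; then
#       echo "Looks like the runc SSH mount failure; runtime: $(current_runtime)"
#   fi
```

If the check matches and the reported runtime is runc, the workaround in this thread was to build on a system configured with crun.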