apptainer / singularity

Singularity has been renamed to Apptainer as part of us moving the project to the Linux Foundation. This repo has been persisted as a snapshot right before the changes.
https://github.com/apptainer/apptainer

Runscript doesn't work when container called in a script #3728

Closed annaprins closed 4 years ago

annaprins commented 5 years ago

Version of Singularity:

3.1.1

Expected behavior

I am trying to run a container from within a bash script that is submitted to a job scheduler (Slurm). In theory, it should behave the same as it does on the command line.

Actual behavior

I am getting this error when I run my script:

/.singularity.d/runscript: line 3: /usr/local/g16/bsd/g16.profile: Permission denied

Steps to reproduce behavior

In my script, I have the line:

singularity run $container_path g16 < $jobname.inp > $jobname.out

(The variables are defined above in the script.)

The command isn't necessarily that important to know, because the problem is not there. I've included it just in case. This is what my definition file looks like:

Bootstrap: docker
From: centos:7

%files
        /home/software/g16-B.01-x86_64_legacy-E6L-103X.tar.bz2 /usr/local/g16.tar.bz2

%environment
        export g16root=/usr/local
        export GAUSS_SCRDIR=/tmp

%post
        yum install -y bzip2
        cd /etc
        cp -p group group.bak
        cp -p gshadow gshadow.bak
        groupadd -g 499 gaussian
        cd /usr/local
        tar xvf g16.tar.bz2
        chown -R :gaussian g16
        chmod -R o-rwx g16

%runscript
        . /usr/local/g16/bsd/g16.profile
        exec "$@"

As I mentioned before, when I type the same command directly on the command line, the container runs as intended. In the script, however, it fails.

jscook2345 commented 5 years ago

What are the permissions for the file /usr/local/g16/bsd/g16.profile? Maybe do an ls -al /usr/local/g16/bsd in your %post and rebuild the container?

annaprins commented 5 years ago

@jscook2345 When I ran an ls -al in the %post I got this:

-rwxr-x--- 1 350 gaussian 2684 Dec 28 2017 /usr/local/g16/bsd/g16.profile
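For reference, `-rwxr-x---` with owner 350 and group gaussian means the owner has read/write/execute, members of gaussian have read/execute, and everyone else has no access at all. The kernel picks exactly one class in order (owner, then group, then other), and the group class applies when the file's group matches the process's effective GID or any of its supplementary GIDs. So sourcing `g16.profile` fails with `Permission denied` for any process that is not uid 350 and does not carry gaussian in its group list (ignoring ACLs and root). The POSIX sh sketch below reports which class applies to the current process; the function name `perm_class` is hypothetical, and it assumes GNU coreutils `stat -c`.

```shell
# Hypothetical helper: print which permission class (owner, group or other)
# the kernel would apply to the current process for FILE, and whether that
# class grants read access.
# Assumes GNU coreutils stat (-c); ignores ACLs and root's DAC override.
perm_class() {
    file=$1
    owner_uid=$(stat -c %u "$file")
    file_gid=$(stat -c %g "$file")
    mode=$(stat -c %A "$file")   # symbolic mode, e.g. -rwxr-x---

    if [ "$(id -u)" = "$owner_uid" ]; then
        # Owner bits apply exclusively, even if group/other bits are more permissive.
        class=owner
        bits=$(printf '%s' "$mode" | cut -c2-4)
    else
        class=other
        bits=$(printf '%s' "$mode" | cut -c8-10)
        # id -G lists the process's primary GID and its supplementary GIDs.
        for g in $(id -G); do
            if [ "$g" = "$file_gid" ]; then
                class=group
                bits=$(printf '%s' "$mode" | cut -c5-7)
                break
            fi
        done
    fi

    case $bits in
        r*) echo "$class: readable" ;;
        *)  echo "$class: not readable" ;;
    esac
}
```

Calling `perm_class /usr/local/g16/bsd/g16.profile` from within the job's environment would show whether the group class is matched there or whether the process falls through to "other".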

annaprins commented 5 years ago

I've been digging deeper over the weekend and found something else that is strange. I have three users in my test environment, all members of the gaussian group: my admin user (sudo, gaussian), aprins (gaussian), and webmo (sudo, gaussian). Two of them are sudoers and one is a regular user. However, my web user (webmo) can't run the container from a script, while the other two can. I can't see any difference between my admin user and the webmo user, since the admin user has no problem running Singularity containers from a script. Here is the Slurm script I've been using to submit jobs that run containers:

#!/bin/sh
#SBATCH --job-name=hockerg16
#SBATCH --output=output.txt
#SBATCH -e errors.txt

singularity run /home/singularity_containers/gaussian16_centos7.sif g16 < test0001.com > test0001.log.linux

aprins and admin can run this, but webmo is getting the error. It's worth noting that webmo can run the same singularity run command that is in the script on the command line without issue.

jmstover commented 5 years ago

I believe only the primary group is added to the container... Try doing:

newgrp gaussian
sbatch [jobs script]

As the webmo user

Edit: I'm wrong here... 3.1.1 should include secondary groups. What would be useful to see in this case is the output of the following, run as the webmo user:

singularity exec /home/singularity_containers/gaussian16_centos7.sif id

Ideally, run it the same way you ran the previous command, i.e. from a Slurm job script.

annaprins commented 5 years ago

I changed webmo's primary group to gaussian and it worked, which is odd because I'm definitely running version 3.1.1. I'm not sure whether to mark this issue as solved: my problem is fixed, but secondary groups not being included in 3.1.1 would be a problem, possibly a bug.

jmstover commented 5 years ago

Hi @annaprins,

3.1.1 does include secondary groups; at least my build from the tag does. What I'm wondering is whether the Slurm job only picks up the primary group initially, so that singularity is called with only the UID and primary GID...

-J
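One way to test this hypothesis from inside the job is to check, before launching the container, whether gaussian is among the job's groups at all. A minimal POSIX sh sketch; the `in_group` function is hypothetical (not part of Singularity or Slurm) and relies only on `id -Gn`, which lists the names of the process's primary and supplementary groups.

```shell
# Hypothetical helper: succeed if the current process has GROUP (by name)
# as its primary or one of its supplementary groups.
in_group() {
    for g in $(id -Gn); do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}
```

Adding `in_group gaussian || echo "gaussian missing from job groups: $(id -Gn)" >&2` near the top of the submit script would make a dropped supplementary group show up in errors.txt before singularity is even called.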

annaprins commented 5 years ago

I've done some more testing: gaussian is not the primary group of my aprins user, yet aprins can run jobs just fine. For some reason, webmo needs gaussian as its primary group to run the same jobs. Any idea what's going on here?

rherban commented 5 years ago

Can I ask why you're adding the gaussian group directly to the container? Singularity will bring your groups into the container automatically. I don't think that's causing the problem here, but wanted to ask.

Can you add a call to id inside your Slurm submit script, and inside the container's runscript as well?
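A sketch of the submit-script side of this diagnostic: a function that prints the identity on the host and, when possible, inside the container, so the two group lists can be compared in the job's output. The name `show_ids` is hypothetical; the default image path is the one from the Slurm script earlier in this thread, and the container check is skipped when `singularity` or the image is unavailable. For the runscript side, a plain `id` line placed just before `. /usr/local/g16/bsd/g16.profile` would show what the runscript itself sees.

```shell
# Hypothetical helper: print the identity on the host and inside the container.
# Defaults to the SIF path used in this thread; pass another path to override.
show_ids() {
    image=${1:-/home/singularity_containers/gaussian16_centos7.sif}
    echo "host: $(id)"
    if command -v singularity >/dev/null 2>&1 && [ -f "$image" ]; then
        echo "container: $(singularity exec "$image" id)"
    else
        echo "container: skipped (singularity or $image not available)"
    fi
}
```

If the host line lists gaussian under `groups=` but the container line does not (or neither does, for webmo), that narrows the problem to how the job's groups are set up rather than to the runscript.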

dtrudg commented 4 years ago

Closing as stale without follow-up