Hello. We are still stuck with this problem. Is singularity v3 supported on CRAY systems? Is there any other information I can provide to help troubleshoot this?
I did notice one thing: the env variables that go missing are stored in /.singularity.d/env/10-docker2singularity.sh, and this file is loaded by /.singularity.d/actions/exec, /.singularity.d/actions/run, etc.:
#!/bin/sh
for script in /.singularity.d/env/*.sh; do
    if [ -f "$script" ]; then
        . "$script"
    fi
done
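For reference, the script's presence in the image can be confirmed directly (this works on the login node, where things behave correctly):

singularity exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 cat /.singularity.d/env/10-docker2singularity.sh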
When I run singularity via srun, I see the following warning.
hayashis@bigred3(elogin2):~(disabled) $ srun singularity exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 ls /.singularity.d/actions
..
WARNING: container does not have /.singularity.d/actions/exec, calling ls directly
/bin/ls: cannot access '/.singularity.d/actions': Srmount error
..
So, it looks like the /.singularity.d/actions directory somehow goes missing when I run singularity via srun, which prevents it from loading our /.singularity.d/env scripts.
My question is: why does /.singularity.d/actions go missing when I run singularity on our cluster's compute nodes (CE) via srun?
The /.singularity.d/actions directory is bind mounted in from ${sysconfdir}/singularity/actions. Does that directory exist on all run nodes?
That's showing a mount error when you're trying to do an ls ... what does the SYSCONFDIR location look like on the host?
eval $(singularity buildcfg | grep ^SYSCONFDIR)
df -h $SYSCONFDIR
Others may have other ideas as well ...
@cclerget @dctrud
So ... is there any way you can think of, building singularity with just --prefix, that ${sysconfdir}/singularity/actions would not be present on every node when installed to a shared location? I mean, you could explicitly set --sysconfdir to a different value, but in that case we wouldn't be running at all, because the singularity.conf may not exist.
@soichih - If this is a CLE6 environment, then a patch on the release-3.5 branch (not yet in a versioned release) would allow you to try the 3.5 series - https://github.com/sylabs/singularity/pull/4880 - instead of 3.4.2, which is no longer supported.
We really need to see the output of srun singularity -d exec ... here. The -d will give us debug output that will hopefully show more of what is going wrong. As suggested above, the output of singularity buildcfg and a listing of the SYSCONFDIR would be useful too.
@jmstover - I suspect something is not being bind mounted properly, rather than missing outright. If this is the same system as the other CLE 6 bug reports we have GPFS filesystems, and the Cray overlay stuff in play.
@dctrud @jmstover The actions directory bind looks OK, otherwise singularity would fail due to the missing bind source. Besides, this behavior is reproducible by removing execute permission from exec:
chmod 444 $(singularity buildcfg|grep SYSCONFDIR|cut -d "=" -f2)/singularity/actions/exec
singularity exec docker://alpine id
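(To undo the test afterwards - assuming the action scripts were installed with the default 0755 mode:)

chmod 755 $(singularity buildcfg|grep SYSCONFDIR|cut -d "=" -f2)/singularity/actions/exec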
EDIT: my bad, the actions bind is ignored if the source doesn't exist. But the bind mount appears to be OK, because most images have a /.singularity.d/actions with default scripts, and the srmount error suggests the directory was mounted but further access to files inside it ends with an srmount error.
After a private discussion with a user on Slack: the Srmount error seems to appear when singularity is installed on a DVS (Cray Data Virtualization Service) mount point. DVS looks pretty similar to 9p; it redirects POSIX VFS calls in the Linux kernel to a server (or multiple servers) serving the underlying filesystem.
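If that is what's happening here, a quick check on a compute node would be something like the following (assuming DVS mounts show up in /proc/mounts with filesystem type dvs):

# any DVS mounts on the node?
srun grep dvs /proc/mounts
# which filesystem is the singularity install actually on?
srun df -T $(singularity buildcfg | grep ^PREFIX | cut -d "=" -f2)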
@jmstover Here is what I am seeing.
$ singularity buildcfg | grep ^SYSCONFDIR
SYSCONFDIR=/N/soft/cle6/singularity/3.4.2//etc
$ df -h /N/soft/cle6/singularity/3.4.2//etc
Filesystem Size Used Avail Use% Mounted on
g2-soft 13T 8.9T 3.5T 72% /geode2/soft
The actions directory does exist on run nodes.
$ srun ls /N/soft/cle6/singularity/3.4.2/etc/singularity/actions
srun: job 146972 queued and waiting for resources
srun: job 146972 has been allocated resources
exec
run
shell
start
test
.. but it's not getting properly mounted for some reason.
@dctrud I believe we are running on CLE6 (guessing it from "/N/soft/cle6") .. and /N/soft is on GPFS.
hayashis@bigred3(elogin1):/N(disabled) $ ls -lrt
..
lrwxrwxrwx 1 root root 17 Jan 6 09:04 soft -> /geode2/soft/hps/
...
$ df -T
Filesystem Type 1K-blocks Used Available Use% Mounted on
...
g2-soft gpfs 13170548736 9455104000 3715444736 72% /geode2/soft
Here is the output with the -d exec option.
$ singularity exec -e docker://busybox ls /.singularity.d/actions
exec run shell start test
$ srun singularity -d exec -e docker://busybox ls /.singularity.d/actions
srun: job 146976 queued and waiting for resources
srun: job 146976 has been allocated resources
DEBUG [U=740536,P=41075] createConfDir() /N/u/hayashis/BigRed3/.singularity already exists. Not creating.
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/library
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/net
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/shub
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oras
DEBUG [U=740536,P=41075] parseURI() Parsing docker://busybox into reference
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp/6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
DEBUG [U=740536,P=41075] updateCacheSubdir() Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp/6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
DEBUG [U=740536,P=41075] execStarter() Use starter binary /N/soft/cle6/singularity/3.4.2/libexec/singularity/bin/starter-suid
DEBUG [U=740536,P=41075] execStarter() Checking for encrypted system partition
DEBUG [U=740536,P=41075] Init() Image format detection
DEBUG [U=740536,P=41075] Init() Check for sandbox image format
DEBUG [U=740536,P=41075] Init() sandbox format initializer returned: not a directory image
DEBUG [U=740536,P=41075] Init() Check for sif image format
DEBUG [U=740536,P=41075] Init() sif image format detected
VERBOSE [U=740536,P=41075] SetContainerEnv() Not forwarding SINGULARITY_CACHEDIR from user to container environment
VERBOSE [U=740536,P=41075] SetContainerEnv() HOME = /N/u/hayashis/BigRed3
VERBOSE [U=0,P=41075] print() Set messagelevel to: 5
VERBOSE [U=0,P=41075] init() Starter initialization
DEBUG [U=0,P=41075] get_pipe_exec_fd() PIPE_EXEC_FD value: 8
VERBOSE [U=0,P=41075] is_suid() Check if we are running as setuid
VERBOSE [U=0,P=41075] priv_drop() Drop root privileges
DEBUG [U=740536,P=41075] init() Read engine configuration
DEBUG [U=740536,P=41075] init() Wait completion of stage1
VERBOSE [U=740536,P=41086] priv_drop() Drop root privileges permanently
DEBUG [U=740536,P=41086] set_parent_death_signal() Set parent death signal to 9
VERBOSE [U=740536,P=41086] init() Spawn stage 1
DEBUG [U=740536,P=41086] startup() singularity runtime engine selected
VERBOSE [U=740536,P=41086] startup() Execute stage 1
DEBUG [U=740536,P=41086] StageOne() Entering stage 1
DEBUG [U=740536,P=41086] prepareFd() Open file descriptor for /N/home
DEBUG [U=740536,P=41086] prepareFd() Open file descriptor for /N/u
DEBUG [U=740536,P=41086] prepareFd() Open file descriptor for /N/dc2
DEBUG [U=740536,P=41086] prepareFd() Open file descriptor for /N/soft
DEBUG [U=740536,P=41086] prepareFd() Open file descriptor for /N/dcwan
DEBUG [U=740536,P=41086] prepareFd() Open file descriptor for /N/slate
DEBUG [U=740536,P=41086] prepareFd() Open file descriptor for /N/project
DEBUG [U=740536,P=41086] Init() Image format detection
DEBUG [U=740536,P=41086] Init() Check for sandbox image format
DEBUG [U=740536,P=41086] Init() sandbox format initializer returned: not a directory image
DEBUG [U=740536,P=41086] Init() Check for sif image format
DEBUG [U=740536,P=41086] Init() sif image format detected
VERBOSE [U=740536,P=41075] wait_child() stage 1 exited with status 0
DEBUG [U=740536,P=41075] cleanup_fd() Close file descriptor 4
DEBUG [U=740536,P=41075] cleanup_fd() Close file descriptor 6
DEBUG [U=740536,P=41075] cleanup_fd() Close file descriptor 7
DEBUG [U=740536,P=41075] init() Set child signal mask
DEBUG [U=740536,P=41075] init() Create socketpair for master communication channel
DEBUG [U=740536,P=41075] init() Create RPC socketpair for communication between stage 2 and RPC server
VERBOSE [U=740536,P=41075] priv_escalate() Get root privileges
VERBOSE [U=0,P=41075] priv_escalate() Change filesystem uid to 740536
VERBOSE [U=0,P=41075] init() Spawn master process
DEBUG [U=0,P=41092] set_parent_death_signal() Set parent death signal to 9
VERBOSE [U=0,P=41092] create_namespace() Create mount namespace
VERBOSE [U=0,P=41075] enter_namespace() Entering in mount namespace
DEBUG [U=0,P=41075] enter_namespace() Opening namespace file ns/mnt
VERBOSE [U=0,P=41092] create_namespace() Create mount namespace
DEBUG [U=0,P=41093] set_parent_death_signal() Set parent death signal to 9
VERBOSE [U=0,P=41093] init() Spawn RPC server
DEBUG [U=740536,P=41075] startup() singularity runtime engine selected
DEBUG [U=0,P=41093] startup() singularity runtime engine selected
VERBOSE [U=740536,P=41075] startup() Execute master process
VERBOSE [U=0,P=41093] startup() Serve RPC requests
DEBUG [U=740536,P=41075] checkOverlay() Overlay seems supported and allowed by kernel
DEBUG [U=740536,P=41075] setupSessionLayout() Attempting to use overlayfs (enable overlay = try)
DEBUG [U=740536,P=41075] setupOverlayLayout() Creating overlay SESSIONDIR layout
DEBUG [U=740536,P=41075] addRootfsMount() Mount rootfs in read-only mode
DEBUG [U=740536,P=41075] addRootfsMount() Image type is 4096
DEBUG [U=740536,P=41075] addRootfsMount() Mounting block [squashfs] image: /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp/6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a/busybox_latest.sif
DEBUG [U=740536,P=41075] addKernelMount() Checking configuration file for 'mount proc'
DEBUG [U=740536,P=41075] addKernelMount() Adding proc to mount list
VERBOSE [U=740536,P=41075] addKernelMount() Default mount: /proc:/proc
DEBUG [U=740536,P=41075] addKernelMount() Checking configuration file for 'mount sys'
DEBUG [U=740536,P=41075] addKernelMount() Adding sysfs to mount list
VERBOSE [U=740536,P=41075] addKernelMount() Default mount: /sys:/sys
DEBUG [U=740536,P=41075] addDevMount() Checking configuration file for 'mount dev'
DEBUG [U=740536,P=41075] addDevMount() Adding dev to mount list
VERBOSE [U=740536,P=41075] addDevMount() Default mount: /dev:/dev
DEBUG [U=740536,P=41075] addHostMount() Not mounting host file systems per configuration
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /etc/localtime, /etc/localtime
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /etc/hosts, /etc/hosts
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /N/home, /N/home
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /N/u, /N/u
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /N/dc2, /N/dc2
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /N/soft, /N/soft
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /N/dcwan, /N/dcwan
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /N/slate, /N/slate
VERBOSE [U=740536,P=41075] addBindsMount() Found 'bind path' = /N/project, /N/project
DEBUG [U=740536,P=41075] addHomeStagingDir() Staging home directory (/N/u/hayashis/BigRed3) at /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3
DEBUG [U=740536,P=41075] addHomeMount() Adding home directory mount [/geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3:/N/u/hayashis/BigRed3] to list using layer: overlay
DEBUG [U=740536,P=41075] isLayerEnabled() Using Layer system: overlay
DEBUG [U=740536,P=41075] addTmpMount() Checking for 'mount tmp' in configuration file
VERBOSE [U=740536,P=41075] addTmpMount() Default mount: /tmp:/tmp
VERBOSE [U=740536,P=41075] addTmpMount() Default mount: /var/tmp:/var/tmp
DEBUG [U=740536,P=41075] addScratchMount() Not mounting scratch directory: Not requested
DEBUG [U=740536,P=41075] addCwdMount() Using /geode2/home/u030/hayashis/BigRed3 as current working directory
VERBOSE [U=740536,P=41075] addCwdMount() Default mount: /geode2/home/u030/hayashis/BigRed3: to the container
DEBUG [U=740536,P=41075] addLibsMount() Checking for 'user bind control' in configuration file
DEBUG [U=740536,P=41075] addResolvConfMount() Adding /etc/resolv.conf to mount list
VERBOSE [U=740536,P=41075] addResolvConfMount() Default mount: /etc/resolv.conf:/etc/resolv.conf
DEBUG [U=740536,P=41075] addHostnameMount() Skipping hostname mount, not virtualizing UTS namespace on user request
DEBUG [U=740536,P=41075] create() Mount all
DEBUG [U=740536,P=41075] mountGeneric() Mounting tmpfs to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session
DEBUG [U=740536,P=41075] mountImage() Mounting loop device /dev/loop4 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs of type squashfs
DEBUG [U=740536,P=41075] mountGeneric() Mounting overlay to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG [U=740536,P=41075] setPropagationMount() Set RPC mount propagation flag to SLAVE
VERBOSE [U=740536,P=41075] Passwd() Checking for template passwd file: /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs/etc/passwd
VERBOSE [U=740536,P=41075] Passwd() Creating passwd content
VERBOSE [U=740536,P=41075] Passwd() Creating template passwd file and appending user data: /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs/etc/passwd
DEBUG [U=740536,P=41075] addIdentityMount() Adding /etc/passwd to mount list
VERBOSE [U=740536,P=41075] addIdentityMount() Default mount: /etc/passwd:/etc/passwd
VERBOSE [U=740536,P=41075] Group() Checking for template group file: /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs/etc/group
VERBOSE [U=740536,P=41075] Group() Creating group content
DEBUG [U=740536,P=41075] addIdentityMount() Adding /etc/group to mount list
VERBOSE [U=740536,P=41075] addIdentityMount() Default mount: /etc/group:/etc/group
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG [U=740536,P=41075] mountGeneric() Mounting /dev to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/dev
DEBUG [U=740536,P=41075] mountGeneric() Mounting /etc/localtime to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/localtime
DEBUG [U=740536,P=41075] mountGeneric() Mounting /etc/hosts to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/hosts
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/home to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/home
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/u to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/u
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/dc2 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/dc2
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/soft to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/soft
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/dcwan to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/dcwan
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/slate to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/slate
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/project to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/project
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/soft/cle6/singularity/3.4.2/etc/singularity/actions to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/.singularity.d/actions
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/.singularity.d/actions
DEBUG [U=740536,P=41075] mountGeneric() Mounting /proc to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/proc
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/proc
DEBUG [U=740536,P=41075] mountGeneric() Mounting sysfs to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/sys
DEBUG [U=740536,P=41075] mountGeneric() Mounting /N/u/hayashis/BigRed3 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3
DEBUG [U=740536,P=41075] mountGeneric() Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/u/hayashis/BigRed3
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/u/hayashis/BigRed3
DEBUG [U=740536,P=41075] mountGeneric() Mounting /tmp to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/tmp
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/tmp
DEBUG [U=740536,P=41075] mountGeneric() Mounting /var/tmp to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/var/tmp
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/var/tmp
DEBUG [U=740536,P=41075] mountGeneric() Mounting /geode2/home/u030/hayashis/BigRed3 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/geode2/home/u030/hayashis/BigRed3
DEBUG [U=740536,P=41075] mountGeneric() Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/geode2/home/u030/hayashis/BigRed3
DEBUG [U=740536,P=41075] mountGeneric() Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/etc/resolv.conf to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/resolv.conf
DEBUG [U=740536,P=41075] mountGeneric() Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/etc/passwd to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/passwd
DEBUG [U=740536,P=41075] mountGeneric() Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/etc/group to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/group
DEBUG [U=740536,P=41075] create() Chroot into /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG [U=0,P=41093] Chroot() Hold reference to host / directory
DEBUG [U=0,P=41093] Chroot() Called pivot_root on /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG [U=0,P=41093] Chroot() Change current directory to host / directory
DEBUG [U=0,P=41093] Chroot() Apply slave mount propagation for host / directory
DEBUG [U=0,P=41093] Chroot() Called unmount(/, syscall.MNT_DETACH)
DEBUG [U=0,P=41093] Chroot() Changing directory to / to avoid getpwd issues
DEBUG [U=740536,P=41075] create() Chdir into / to avoid errors
VERBOSE [U=0,P=41092] wait_child() rpc server exited with status 0
DEBUG [U=0,P=41092] apply_container_privileges() Set user ID to 740536
DEBUG [U=740536,P=41092] set_parent_death_signal() Set parent death signal to 9
DEBUG [U=740536,P=41092] startup() singularity runtime engine selected
VERBOSE [U=740536,P=41092] startup() Execute stage 2
DEBUG [U=740536,P=41092] StageTwo() Entering stage 2
WARNING [U=740536,P=41092] checkExec() container does not have /.singularity.d/actions/exec, calling ls directly
DEBUG [U=740536,P=41075] PostStartProcess() Post start process
ls: /.singularity.d/actions: Srmount error
DEBUG [U=740536,P=41075] Master() Child exited with exit status 1
srun: error: nid00517: task 0: Exited with exit code 1
srun: Terminating job step 146976.0
Please let me know if there is any other information I can provide.
Edit - sorry, I overlooked the reply from @cclerget already above: https://github.com/sylabs/singularity/issues/4887#issuecomment-572433744
It may be that the state dir being node local isn't enough?
Original message below:
The main thing I'd advise is that Singularity should be installed so that the session directory:
/geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
... is on node local storage. It should not be on a shared filesystem like GPFS. There are a number of things that can occur if it is. For example, overlay mounts may not work correctly on top of certain filesystems. User namespace mapping issues may occur also, if support isn't present on a particular filesystem / version.
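A quick way to see which filesystem the state directory currently lives on (LOCALSTATEDIR should be reported by singularity buildcfg like the other paths used above):

eval $(singularity buildcfg | grep ^LOCALSTATEDIR)
df -T $LOCALSTATEDIR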
We document this in the user guide and the admin guide. The state directory can be set when configuring with mconfig via the --localstatedir option.
--localstatedir: Set the state directory where containers are mounted. This is a particularly important option for administrators installing Singularity on a shared file system. The --localstatedir should be set to a directory that is present on each individual node.
@cclerget may have some more specific thoughts on how this issue might be worked around, but if you can try a Singularity install where --localstatedir is set to not be on the GPFS fs, that would be great.
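For example, something along these lines at configure time - the /scratch/local path is purely illustrative, and stands in for whatever node-local storage exists on your compute nodes:

./mconfig --prefix=/N/soft/cle6/singularity/3.4.2 --localstatedir=/scratch/local/singularity
make -C ./builddir
make -C ./builddir install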
@soichih Could you try to run with this bind added:
-B $(singularity buildcfg|grep SESSIONDIR|cut -d "=" -f2)/rootfs/.singularity.d/actions:/.singularity.d/actions
It will force the use of /.singularity.d/actions from the container image.
Could you also post the output of df -T /N/soft/cle6/singularity/3.4.2/etc/singularity/actions ?
Manually binding SESSIONDIR seems to work around the problem.
$ srun singularity -d exec -e -B $(singularity buildcfg|grep SESSIONDIR|cut -d "=" -f2)/rootfs/.singularity.d/actions:/.singularity.d/actions docker://busybox ls /.singularity.d/actions
srun: job 147062 queued and waiting for resources
srun: job 147062 has been allocated resources
...
exec
run
shell
start
test
DEBUG [U=740536,P=7339] Master() Child exited with exit status 0
Here is the output for df -T
$ df -T /N/soft/cle6/singularity/3.4.2/etc/singularity/actions
Filesystem Type 1K-blocks Used Available Use% Mounted on
g2-soft gpfs 13170548736 9455124480 3715424256 72% /geode2/soft
@soichih Could you test the fix in #4938? Thanks!
@cclerget I will need to ask our sysadmin to test this on IU bigred3. I don't think I can install singularity without root?
@soichih If you can exec singularity shell -u docker://alpine without a user namespace error, you should be able to install singularity without root with ./mconfig --without-suid
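Something like this, from the git checkout on the host (not inside a container - mconfig needs git or a VERSION file, plus the Go toolchain; the prefix is just an example):

cd ~/git/singularity
./mconfig --without-suid --prefix=$HOME/singularity-nosuid
make -C ./builddir
make -C ./builddir install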
I don't get a user namespace error, but..
$ singularity shell -u docker://alpine
INFO: Convert SIF file to sandbox...
Singularity> ./mconfig --without-suid
E: Not inside a git repository and no VERSION file found. Abort.
Singularity> which git
Singularity>
I am inside the git repository .. outside the container shell, that is.
hayashis@bigred3(elogin1):~/git/singularity(disabled) 1 git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
I don't think it's trivial to get this going on my own.
I am using singularity version 3.4.2
When I run the following command on our campus HPC system (Bigred3 at Indiana University), I get the following output.
Please note that I see a lot of "freesurfer" related ENV parameters set when this container was built.
However, when I run the same command via srun, I get the following output instead.
Please note the warning message ("container does not have /.singularity.d/actions/exec, calling /usr/bin/env directly") and that none of the freesurfer related ENVs are present.
I've already checked with our sysadmin to make sure that the same version of singularity / configuration is installed on both the login node and all CEs on our cluster. They suggested contacting the singularity team to further troubleshoot this problem.
I also run into the same problem if I run singularity via a bash script submitted by sbatch.
We are running SLES12 on our cluster.
Singularity is installed by our sysadmin on the shared file system under the following directory.
Here is the content of the singularity.conf
Please help us troubleshoot this problem!