apptainer / singularity

Singularity has been renamed to Apptainer as part of us moving the project to the Linux Foundation. This repo has been persisted as a snapshot right before the changes.
https://github.com/apptainer/apptainer

/.singularity.d/env not loaded when singularity is executed via srun/sbatch. #4887

Closed soichih closed 4 years ago

soichih commented 4 years ago

I am using singularity version 3.4.2

When I run the following command on our campus HPC system (Bigred3 at Indiana University), I get the following output.

singularity  exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 /usr/bin/env
WARNING: skipping mount of /N/project: permission denied
TERM=xterm-256color
PERL5LIB=/usr/local/freesurfer/mni/share/perl5
SINGULARITY_APPNAME=
LOCAL_DIR=/usr/local/freesurfer/local
LD_LIBRARY_PATH=/.singularity.d/libs
FSFAST_HOME=/usr/local/freesurfer/fsfast
MNI_PERL5LIB=/usr/local/freesurfer/mni/share/perl5
FMRI_ANALYSIS_DIR=/usr/local/freesurfer/fsfast
SINGULARITY_NAME=freesurfer_on_mcr_6.0.2.sif
MINC_BIN_DIR=/usr/local/freesurfer/mni/bin
SUBJECTS_DIR=/usr/local/freesurfer/subjects
PATH=/usr/local/freesurfer/bin:/usr/local/freesurfer/fsfast/bin:/usr/local/freesurfer/tktools:/usr/local/freesurfer/mni/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:\/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
PWD=/gpfs/home/b/r/brlife/BigRed3/test/sbatch
FUNCTIONALS_DIR=/usr/local/freesurfer/sessions
LANG=C
MINC_LIB_DIR=/usr/local/freesurfer/mni/lib
PS1=Singularity> 
SHLVL=0
HOME=/N/u/brlife/BigRed3
MNI_DIR=/usr/local/freesurfer/mni
FREESURFER_HOME=/usr/local/freesurfer
SINGULARITY_CONTAINER=/N/dc2/scratch/brlife/singularity-cache-br3/cache/oci-tmp/34f96d505677bb18d831dbc2baae1986de1a8905ec6fadf998c1e2871f0ed741/freesurfer_on_mcr_6.0.2.sif
MNI_DATAPATH=/usr/local/freesurfer/mni/data

Note that the output includes many FreeSurfer-related environment variables, which were set when this container was built.

However, when I run the same command via srun, I get the following output instead.

srun singularity  exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 /usr/bin/env
srun: job 103268 queued and waiting for resources
srun: job 103268 has been allocated resources
time="2019-12-19T16:47:43-05:00" level=warning msg="\"/run/user/1589653\" directory set by $XDG_RUNTIME_DIR does not exist. Either create the directory or unset $XDG_RUNTIME_DIR.: stat /run/user/1589653: no such file or directory: Trying to pull image in the event that it is a public image."
WARNING: skipping mount of /N/project: permission denied
WARNING: container does not have /.singularity.d/actions/exec, calling /usr/bin/env directly
SINGULARITY_CONTAINER=/N/dc2/scratch/brlife/singularity-cache-br3/cache/oci-tmp/34f96d505677bb18d831dbc2baae1986de1a8905ec6fadf998c1e2871f0ed741/freesurfer_on_mcr_6.0.2.sif
SINGULARITY_NAME=freesurfer_on_mcr_6.0.2.sif
TERM=xterm-256color
HOME=/N/u/brlife/BigRed3
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
LANG=C
SINGULARITY_APPNAME=

Please note the warning message ("container does not have /.singularity.d/actions/exec, calling /usr/bin/env directly"), and that none of the FreeSurfer-related environment variables are present.
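To see exactly which variables disappear under srun, the two env dumps can be compared by name. This is only a minimal sketch: `missing_vars` is a hypothetical helper, and it assumes each dump was saved to a file containing plain `env` output (one NAME=value per line). Lines that are not NAME=value, such as warnings, are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print variable names present in the first env dump
# but absent from the second. Assumes both files hold plain `env` output.
missing_vars() {
    direct_names=$(mktemp) || return 1
    srun_names=$(mktemp) || { rm -f "$direct_names"; return 1; }
    # Keep only lines that look like NAME=value, then strip the value.
    grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | LC_ALL=C sort -u > "$direct_names"
    grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$2" | cut -d= -f1 | LC_ALL=C sort -u > "$srun_names"
    # comm -23 prints the lines that appear only in the first sorted file.
    LC_ALL=C comm -23 "$direct_names" "$srun_names"
    rm -f "$direct_names" "$srun_names"
}

# Example usage (file names are placeholders):
#   singularity exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 /usr/bin/env > direct.env
#   srun singularity exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 /usr/bin/env > srun.env
#   missing_vars direct.env srun.env
```

Comparing names rather than whole lines keeps variables such as PATH, which exist in both runs with different values, out of the list.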

I've already checked with our sysadmin to make sure that the same Singularity version and configuration are installed on both the login node and all CEs on our cluster. They suggested contacting the Singularity team to troubleshoot this further.

I also run into the same problem if I run singularity from a bash script submitted via sbatch.

We are running SLES12 on our cluster.

cat /etc/os-release
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12:sp3"

Singularity is installed by our sysadmin on the shared file system under the following directory.

/N/soft/cle6/singularity/3.4.2/bin/singularity

Here is the content of the singularity.conf

# SINGULARITY.CONF
# This is the global configuration file for Singularity. This file controls
# what the container is allowed to do on a particular host, and as a result
# this file must be owned by root.

# ALLOW SETUID: [BOOL]
# DEFAULT: yes
# Should we allow users to utilize the setuid program flow within Singularity?
# note1: This is the default mode, and to utilize all features, this option
# must be enabled.  For example, without this option loop mounts of image 
# files will not work; only sandbox image directories, which do not need loop
# mounts, will work (subject to note 2).
# note2: If this option is disabled, it will rely on unprivileged user
# namespaces which have not been integrated equally between different Linux
# distributions.
allow setuid = yes

# MAX LOOP DEVICES: [INT]
# DEFAULT: 256
# Set the maximum number of loop devices that Singularity should ever attempt
# to utilize.
max loop devices = 256

# ALLOW PID NS: [BOOL]
# DEFAULT: yes
# Should we allow users to request the PID namespace? Note that for some HPC
# resources, the PID namespace may confuse the resource manager and break how
# some MPI implementations utilize shared memory. (note, on some older
# systems, the PID namespace is always used)
allow pid ns = yes

# CONFIG PASSWD: [BOOL]
# DEFAULT: yes
# If /etc/passwd exists within the container, this will automatically append
# an entry for the calling user.
config passwd = yes

# CONFIG GROUP: [BOOL]
# DEFAULT: yes
# If /etc/group exists within the container, this will automatically append
# group entries for the calling user.
config group = yes

# CONFIG RESOLV_CONF: [BOOL]
# DEFAULT: yes
# If there is a bind point within the container, use the host's
# /etc/resolv.conf.
config resolv_conf = yes

# MOUNT PROC: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /proc within the container?
mount proc = yes

# MOUNT SYS: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /sys within the container?
mount sys = yes

# MOUNT DEV: [yes/no/minimal]
# DEFAULT: yes
# Should we automatically bind mount /dev within the container? If 'minimal'
# is chosen, then only 'null', 'zero', 'random', 'urandom', and 'shm' will
# be included (the same effect as the --contain options)
mount dev = yes

# MOUNT DEVPTS: [BOOL]
# DEFAULT: yes
# Should we mount a new instance of devpts if there is a 'minimal'
# /dev, or -C is passed?  Note, this requires that your kernel was
# configured with CONFIG_DEVPTS_MULTIPLE_INSTANCES=y, or that you're
# running kernel 4.7 or newer.
mount devpts = yes

# MOUNT HOME: [BOOL]
# DEFAULT: yes
# Should we automatically determine the calling user's home directory and
# attempt to mount it's base path into the container? If the --contain option
# is used, the home directory will be created within the session directory or
# can be overridden with the SINGULARITY_HOME or SINGULARITY_WORKDIR
# environment variables (or their corresponding command line options).
mount home = yes 

# MOUNT TMP: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /tmp and /var/tmp into the container? If
# the --contain option is used, both tmp locations will be created in the
# session directory or can be specified via the  SINGULARITY_WORKDIR
# environment variable (or the --workingdir command line option).
mount tmp = yes

# MOUNT HOSTFS: [BOOL]
# DEFAULT: no
# Probe for all mounted file systems that are mounted on the host, and bind
# those into the container?
mount hostfs = no

# BIND PATH: [STRING]
# DEFAULT: Undefined
# Define a list of files/directories that should be made available from within
# the container. The file or directory must exist within the container on
# which to attach to. you can specify a different source and destination
# path (respectively) with a colon; otherwise source and dest are the same.
# NOTE: these are ignored if singularity is invoked with --contain.
#bind path = /etc/singularity/default-nsswitch.conf:/etc/nsswitch.conf
#bind path = /opt
#bind path = /scratch
bind path = /etc/localtime
bind path = /etc/hosts
bind path = /N/home
bind path = /N/u
bind path = /N/dc2
bind path = /N/soft
bind path = /N/dcwan
bind path = /N/slate
bind path = /N/project

# USER BIND CONTROL: [BOOL]
# DEFAULT: yes
# Allow users to influence and/or define bind points at runtime? This will allow
# users to specify bind points, scratch and tmp locations. (note: User bind
# control is only allowed if the host also supports PR_SET_NO_NEW_PRIVS)
user bind control = yes

# ENABLE FUSEMOUNT: [BOOL]
# DEFAULT: yes
# Allow users to mount fuse filesystems inside containers with the --fusemount
# command line option.
enable fusemount = yes

# ENABLE OVERLAY: [yes/no/try]
# DEFAULT: try
# Enabling this option will make it possible to specify bind paths to locations
# that do not currently exist within the container.  If 'try' is chosen,
# overlayfs will be tried but if it is unavailable it will be silently ignored.
enable overlay = try

# ENABLE UNDERLAY: [yes/no]
# DEFAULT: yes
# Enabling this option will make it possible to specify bind paths to locations
# that do not currently exist within the container even if overlay is not
# working.  If overlay is available, it will be tried first.
enable underlay = yes

# MOUNT SLAVE: [BOOL]
# DEFAULT: yes
# Should we automatically propagate file-system changes from the host?
# This should be set to 'yes' when autofs mounts in the system should
# show up in the container.
mount slave = yes

# SESSIONDIR MAXSIZE: [STRING]
# DEFAULT: 16
# This specifies how large the default sessiondir should be (in MB) and it will
# only affect users who use the "--contain" options and don't also specify a
# location to do default read/writes to (e.g. "--workdir" or "--home").
sessiondir max size = 16

# LIMIT CONTAINER OWNERS: [STRING]
# DEFAULT: NULL
# Only allow containers to be used that are owned by a given user. If this
# configuration is undefined (commented or set to NULL), all containers are
# allowed to be used. This feature only applies when Singularity is running in
# SUID mode and the user is non-root.
#limit container owners = gmk, singularity, nobody

# LIMIT CONTAINER GROUPS: [STRING]
# DEFAULT: NULL
# Only allow containers to be used that are owned by a given group. If this
# configuration is undefined (commented or set to NULL), all containers are
# allowed to be used. This feature only applies when Singularity is running in
# SUID mode and the user is non-root.
#limit container groups = group1, singularity, nobody

# LIMIT CONTAINER PATHS: [STRING]
# DEFAULT: NULL
# Only allow containers to be used that are located within an allowed path
# prefix. If this configuration is undefined (commented or set to NULL),
# containers will be allowed to run from anywhere on the file system. This
# feature only applies when Singularity is running in SUID mode and the user is
# non-root.
#limit container paths = /scratch, /tmp, /global

# ALLOW CONTAINER ${TYPE}: [BOOL]
# DEFAULT: yes
# This feature limits what kind of containers that Singularity will allow
# users to use (note this does not apply for root).
allow container squashfs = yes
allow container extfs = yes
allow container dir = yes

# AUTOFS BUG PATH: [STRING]
# DEFAULT: Undefined
# Define list of autofs directories which produces "Too many levels of symbolink links"
# errors when accessed from container (typically bind mounts)
#autofs bug path = /nfs
#autofs bug path = /cifs-share

# ALWAYS USE NV ${TYPE}: [BOOL]
# DEFAULT: no
# This feature allows an administrator to determine that every action command
# should be executed implicitly with the --nv option (useful for GPU only 
# environments). 
always use nv = no

# ROOT DEFAULT CAPABILITIES: [full/file/no]
# DEFAULT: full
# Define default root capability set kept during runtime
# - full: keep all capabilities (same as --keep-privs)
# - file: keep capabilities configured in ${prefix}/etc/singularity/capabilities/user.root
# - no: no capabilities (same as --no-privs)
root default capabilities = no

# MEMORY FS TYPE: [tmpfs/ramfs]
# DEFAULT: tmpfs
# This feature allow to choose temporary filesystem type used by Singularity.
# Cray CLE 5 and 6 up to CLE 6.0.UP05 there is an issue (kernel panic) when Singularity
# use tmpfs, so on affected version it's recommended to set this value to ramfs to avoid
# kernel panic
memory fs type = tmpfs

# CNI CONFIGURATION PATH: [STRING]
# DEFAULT: Undefined
# Defines path from where CNI configuration files are stored
#cni configuration path =

# CNI PLUGIN PATH: [STRING]
# DEFAULT: Undefined
# Defines path from where CNI executable plugins are stored
#cni plugin path =

# MKSQUASHFS PATH: [STRING]
# DEFAULT: Undefined
# This allows the administrator to specify the location for mksquashfs if it is not
# installed in a standard system location
# mksquashfs path =

# CRYPTSETUP PATH: [STRING]
# DEFAULT: Undefined
# This allows the administrator to specify the location of cryptsetup if
# they wish to use custom location for this installation. If this value
# is undefined, at runtime singularity falls back to the value that was
# recorded at build time.
# cryptsetup path =

# SHARED LOOP DEVICES: [BOOL]
# DEFAULT: no
# Allow to share same images associated with loop devices to minimize loop
# usage and optimize kernel cache (useful for MPI)
shared loop devices = no

Please help us troubleshoot this problem!

soichih commented 4 years ago

Hello. We are still stuck on this problem. Is Singularity v3 supported on Cray systems? Is there any other information I can provide to help troubleshoot it?

soichih commented 4 years ago

I did notice one thing.

The environment variables that go missing are set in /.singularity.d/env/10-docker2singularity.sh, which is loaded by /.singularity.d/actions/exec, /.singularity.d/actions/run, etc.

#!/bin/sh

for script in /.singularity.d/env/*.sh; do
    if [ -f "$script" ]; then
        . "$script"
    fi
done

When I run singularity via srun, I see the following warning.

hayashis@bigred3(elogin2):~(disabled) $ srun singularity exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 ls /.singularity.d/actions
..
WARNING: container does not have /.singularity.d/actions/exec, calling ls directly
/bin/ls: cannot access '/.singularity.d/actions': Srmount error
..

So it looks like the /.singularity.d/actions directory somehow goes missing when I run singularity via srun, and as a result our /.singularity.d/env scripts are never loaded.
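Since the env scripts are only sourced by the action scripts, one possible stopgap is to source them by hand. The following is an untested sketch, not something confirmed in this thread: `load_env_scripts` is a hypothetical helper that repeats the same loop as the action script, with the directory passed as a parameter so it can be tried outside a container.

```shell
#!/bin/sh
# Hypothetical helper: source every *.sh file in the given directory,
# mirroring the loop used by the action scripts.
# Defaults to the container's env directory when no argument is given.
load_env_scripts() {
    env_dir=${1:-/.singularity.d/env}
    for script in "$env_dir"/*.sh; do
        # Skip the literal pattern if the glob matched nothing.
        if [ -f "$script" ]; then
            . "$script"
        fi
    done
}

# Possible use inside the container (assumption: /bin/sh exists in the image):
#   srun singularity exec -e docker://brainlife/freesurfer_on_mcr:6.0.2 /bin/sh -c \
#     'for s in /.singularity.d/env/*.sh; do [ -f "$s" ] && . "$s"; done; exec /usr/bin/env'
```

This only works around the symptom; it does not explain why the actions directory is inaccessible.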

My question is, why does /.singularity.d/actions go missing if I run it on our cluster CE via srun?

jmstover commented 4 years ago

The /.singularity.d/actions directory is bind mounted in from ${sysconfdir}/singularity/actions. Does that directory exist on all run nodes?

That's showing a mount error when you're trying to do a ls... what does the SYSCONFDIR location look like on the host?

eval $(singularity buildcfg | grep ^SYSCONFDIR)
df -h $SYSCONFDIR
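If you would rather not eval the buildcfg output, the value can be extracted with sed instead. This is just a sketch: `sysconfdir_from_buildcfg` is a hypothetical helper name, and it assumes `singularity buildcfg` prints one NAME=value pair per line, as in the grep above.

```shell
#!/bin/sh
# Hypothetical helper: read `singularity buildcfg` output on stdin and
# print the SYSCONFDIR value (first match only), without using eval.
sysconfdir_from_buildcfg() {
    sed -n 's/^SYSCONFDIR=//p' | head -n 1
}

# Example usage on the login node:
#   SYSCONFDIR=$(singularity buildcfg | sysconfdir_from_buildcfg)
#   df -h "$SYSCONFDIR"
#   ls -l "$SYSCONFDIR/singularity/actions"
# Running the same ls through srun would show what a compute node sees:
#   srun ls -l "$SYSCONFDIR/singularity/actions"
```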

Others may have other ideas as well ...

jmstover commented 4 years ago

@cclerget @dctrud So ... can you think of any way that Singularity, built with just --prefix and installed to a shared location, could end up without ${sysconfdir}/singularity/actions on every node? I mean, it could happen if --sysconfdir were explicitly set to a different value, but in that case Singularity wouldn't run at all, because singularity.conf may not exist either.

dtrudg commented 4 years ago

@soichih - If this is a CLE6 environment, then a patch on the release-3.5 branch (not yet in a versioned release) would allow you to try the 3.5 series - https://github.com/sylabs/singularity/pull/4880 instead of 3.4.2 which is not supported now.

We really need to see the output of srun singularity -d exec .... here. The -d flag will give us debug output that will hopefully show more of what is going wrong. As suggested above, the output of singularity buildcfg and a listing of the SYSCONFDIR would be useful too.

@jmstover - I suspect something is not being bind mounted properly, rather than missing outright. If this is the same system as in the other CLE 6 bug reports, then we have GPFS filesystems and the Cray overlay stuff in play.

cclerget commented 4 years ago

@dctrud @jmstover The actions directory bind looks OK, otherwise singularity would fail due to the missing bind source. By the way, this behavior is reproducible by removing execute permission from exec:

chmod 444 $(singularity buildcfg|grep SYSCONFDIR|cut -d "=" -f2)/singularity/actions/exec
singularity exec docker://alpine id
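To rule out a permission problem of this kind, the action scripts can be checked directly on the host. A minimal sketch, assuming the five script names that appear in this thread (exec, run, shell, start, test); `check_actions` is a hypothetical helper, not part of Singularity.

```shell
#!/bin/sh
# Hypothetical helper: report, for each expected action script in the given
# directory, whether it is ok, not executable, or missing.
# Returns non-zero if any script is missing or not executable.
check_actions() {
    actions_dir=$1
    status=0
    for name in exec run shell start test; do
        path="$actions_dir/$name"
        if [ ! -e "$path" ]; then
            echo "$name: missing"
            status=1
        elif [ ! -x "$path" ]; then
            echo "$name: not executable"
            status=1
        else
            echo "$name: ok"
        fi
    done
    return "$status"
}

# Example usage:
#   check_actions "$(singularity buildcfg | sed -n 's/^SYSCONFDIR=//p')/singularity/actions"
```

Note that this checks the host-side source of the bind mount only; it says nothing about whether the directory is readable from inside the container.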

EDIT: My bad, the actions bind is ignored if the source doesn't exist. Still, the bind mount itself appears to be OK: most images ship a /.singularity.d/actions with default scripts, and the Srmount error suggests that the directory was mounted but that any further access to files in it fails with an Srmount error.

After a private discussion with a user on Slack, the Srmount error seems to appear when Singularity is installed on a DVS (Cray Data Virtualization Service) mount point. DVS looks pretty similar to 9p: it redirects POSIX VFS calls in the Linux kernel to a server (or multiple servers) serving the underlying filesystem.

soichih commented 4 years ago

@jmstover Here is what I am seeing.

$ singularity buildcfg | grep ^SYSCONFDIR
SYSCONFDIR=/N/soft/cle6/singularity/3.4.2//etc
$ df -h /N/soft/cle6/singularity/3.4.2//etc
Filesystem      Size  Used Avail Use% Mounted on
g2-soft          13T  8.9T  3.5T  72% /geode2/soft

The actions directory does exist on run nodes.

$ srun ls /N/soft/cle6/singularity/3.4.2/etc/singularity/actions
srun: job 146972 queued and waiting for resources
srun: job 146972 has been allocated resources
exec
run
shell
start
test

.. but it's not getting properly mounted for some reason.

@dctrud

I believe we are running CLE6 (I'm guessing from "/N/soft/cle6"), and /N/soft is on GPFS.

hayashis@bigred3(elogin1):/N(disabled) $ ls -lrt
..
lrwxrwxrwx   1 root    root       17 Jan  6 09:04 soft -> /geode2/soft/hps/
...
$ df -T
Filesystem                               Type         1K-blocks          Used     Available Use% Mounted on
...
g2-soft                                  gpfs       13170548736    9455104000    3715444736  72% /geode2/soft

Here is the output with the -d option.

$ singularity exec -e docker://busybox ls /.singularity.d/actions
exec   run    shell  start  test
$ srun singularity -d exec -e docker://busybox ls /.singularity.d/actions
srun: job 146976 queued and waiting for resources
srun: job 146976 has been allocated resources
DEBUG   [U=740536,P=41075] createConfDir()               /N/u/hayashis/BigRed3/.singularity already exists. Not creating.
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/library
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/net
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/shub
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oras
DEBUG   [U=740536,P=41075] parseURI()                    Parsing docker://busybox into reference
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp/6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
DEBUG   [U=740536,P=41075] updateCacheSubdir()           Caching directory set to /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp/6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
DEBUG   [U=740536,P=41075] execStarter()                 Use starter binary /N/soft/cle6/singularity/3.4.2/libexec/singularity/bin/starter-suid
DEBUG   [U=740536,P=41075] execStarter()                 Checking for encrypted system partition
DEBUG   [U=740536,P=41075] Init()                        Image format detection
DEBUG   [U=740536,P=41075] Init()                        Check for sandbox image format
DEBUG   [U=740536,P=41075] Init()                        sandbox format initializer returned: not a directory image
DEBUG   [U=740536,P=41075] Init()                        Check for sif image format
DEBUG   [U=740536,P=41075] Init()                        sif image format detected
VERBOSE [U=740536,P=41075] SetContainerEnv()             Not forwarding SINGULARITY_CACHEDIR from user to container environment
VERBOSE [U=740536,P=41075] SetContainerEnv()             HOME = /N/u/hayashis/BigRed3
VERBOSE [U=0,P=41075]      print()                       Set messagelevel to: 5
VERBOSE [U=0,P=41075]      init()                        Starter initialization
DEBUG   [U=0,P=41075]      get_pipe_exec_fd()            PIPE_EXEC_FD value: 8
VERBOSE [U=0,P=41075]      is_suid()                     Check if we are running as setuid
VERBOSE [U=0,P=41075]      priv_drop()                   Drop root privileges
DEBUG   [U=740536,P=41075]  init()                        Read engine configuration
DEBUG   [U=740536,P=41075]  init()                        Wait completion of stage1
VERBOSE [U=740536,P=41086]  priv_drop()                   Drop root privileges permanently
DEBUG   [U=740536,P=41086]  set_parent_death_signal()     Set parent death signal to 9
VERBOSE [U=740536,P=41086]  init()                        Spawn stage 1
DEBUG   [U=740536,P=41086] startup()                     singularity runtime engine selected
VERBOSE [U=740536,P=41086] startup()                     Execute stage 1
DEBUG   [U=740536,P=41086] StageOne()                    Entering stage 1
DEBUG   [U=740536,P=41086] prepareFd()                   Open file descriptor for /N/home
DEBUG   [U=740536,P=41086] prepareFd()                   Open file descriptor for /N/u
DEBUG   [U=740536,P=41086] prepareFd()                   Open file descriptor for /N/dc2
DEBUG   [U=740536,P=41086] prepareFd()                   Open file descriptor for /N/soft
DEBUG   [U=740536,P=41086] prepareFd()                   Open file descriptor for /N/dcwan
DEBUG   [U=740536,P=41086] prepareFd()                   Open file descriptor for /N/slate
DEBUG   [U=740536,P=41086] prepareFd()                   Open file descriptor for /N/project
DEBUG   [U=740536,P=41086] Init()                        Image format detection
DEBUG   [U=740536,P=41086] Init()                        Check for sandbox image format
DEBUG   [U=740536,P=41086] Init()                        sandbox format initializer returned: not a directory image
DEBUG   [U=740536,P=41086] Init()                        Check for sif image format
DEBUG   [U=740536,P=41086] Init()                        sif image format detected
VERBOSE [U=740536,P=41075]  wait_child()                  stage 1 exited with status 0
DEBUG   [U=740536,P=41075]  cleanup_fd()                  Close file descriptor 4
DEBUG   [U=740536,P=41075]  cleanup_fd()                  Close file descriptor 6
DEBUG   [U=740536,P=41075]  cleanup_fd()                  Close file descriptor 7
DEBUG   [U=740536,P=41075]  init()                        Set child signal mask
DEBUG   [U=740536,P=41075]  init()                        Create socketpair for master communication channel
DEBUG   [U=740536,P=41075]  init()                        Create RPC socketpair for communication between stage 2 and RPC server
VERBOSE [U=740536,P=41075]  priv_escalate()               Get root privileges
VERBOSE [U=0,P=41075]      priv_escalate()               Change filesystem uid to 740536
VERBOSE [U=0,P=41075]      init()                        Spawn master process
DEBUG   [U=0,P=41092]      set_parent_death_signal()     Set parent death signal to 9
VERBOSE [U=0,P=41092]      create_namespace()            Create mount namespace
VERBOSE [U=0,P=41075]      enter_namespace()             Entering in mount namespace
DEBUG   [U=0,P=41075]      enter_namespace()             Opening namespace file ns/mnt
VERBOSE [U=0,P=41092]      create_namespace()            Create mount namespace
DEBUG   [U=0,P=41093]      set_parent_death_signal()     Set parent death signal to 9
VERBOSE [U=0,P=41093]      init()                        Spawn RPC server
DEBUG   [U=740536,P=41075] startup()                     singularity runtime engine selected
DEBUG   [U=0,P=41093]      startup()                     singularity runtime engine selected
VERBOSE [U=740536,P=41075] startup()                     Execute master process
VERBOSE [U=0,P=41093]      startup()                     Serve RPC requests
DEBUG   [U=740536,P=41075] checkOverlay()                Overlay seems supported and allowed by kernel
DEBUG   [U=740536,P=41075] setupSessionLayout()          Attempting to use overlayfs (enable overlay = try)
DEBUG   [U=740536,P=41075] setupOverlayLayout()          Creating overlay SESSIONDIR layout
DEBUG   [U=740536,P=41075] addRootfsMount()              Mount rootfs in read-only mode
DEBUG   [U=740536,P=41075] addRootfsMount()              Image type is 4096
DEBUG   [U=740536,P=41075] addRootfsMount()              Mounting block [squashfs] image: /N/dc2/scratch/hayashis/singularity-cache-br3/cache/oci-tmp/6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a/busybox_latest.sif
DEBUG   [U=740536,P=41075] addKernelMount()              Checking configuration file for 'mount proc'
DEBUG   [U=740536,P=41075] addKernelMount()              Adding proc to mount list
VERBOSE [U=740536,P=41075] addKernelMount()              Default mount: /proc:/proc
DEBUG   [U=740536,P=41075] addKernelMount()              Checking configuration file for 'mount sys'
DEBUG   [U=740536,P=41075] addKernelMount()              Adding sysfs to mount list
VERBOSE [U=740536,P=41075] addKernelMount()              Default mount: /sys:/sys
DEBUG   [U=740536,P=41075] addDevMount()                 Checking configuration file for 'mount dev'
DEBUG   [U=740536,P=41075] addDevMount()                 Adding dev to mount list
VERBOSE [U=740536,P=41075] addDevMount()                 Default mount: /dev:/dev
DEBUG   [U=740536,P=41075] addHostMount()                Not mounting host file systems per configuration
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /etc/localtime, /etc/localtime
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /etc/hosts, /etc/hosts
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /N/home, /N/home
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /N/u, /N/u
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /N/dc2, /N/dc2
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /N/soft, /N/soft
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /N/dcwan, /N/dcwan
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /N/slate, /N/slate
VERBOSE [U=740536,P=41075] addBindsMount()               Found 'bind path' = /N/project, /N/project
DEBUG   [U=740536,P=41075] addHomeStagingDir()           Staging home directory (/N/u/hayashis/BigRed3) at /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3
DEBUG   [U=740536,P=41075] addHomeMount()                Adding home directory mount [/geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3:/N/u/hayashis/BigRed3] to list using layer: overlay
DEBUG   [U=740536,P=41075] isLayerEnabled()              Using Layer system: overlay
DEBUG   [U=740536,P=41075] addTmpMount()                 Checking for 'mount tmp' in configuration file
VERBOSE [U=740536,P=41075] addTmpMount()                 Default mount: /tmp:/tmp
VERBOSE [U=740536,P=41075] addTmpMount()                 Default mount: /var/tmp:/var/tmp
DEBUG   [U=740536,P=41075] addScratchMount()             Not mounting scratch directory: Not requested
DEBUG   [U=740536,P=41075] addCwdMount()                 Using /geode2/home/u030/hayashis/BigRed3 as current working directory
VERBOSE [U=740536,P=41075] addCwdMount()                 Default mount: /geode2/home/u030/hayashis/BigRed3: to the container
DEBUG   [U=740536,P=41075] addLibsMount()                Checking for 'user bind control' in configuration file
DEBUG   [U=740536,P=41075] addResolvConfMount()          Adding /etc/resolv.conf to mount list
VERBOSE [U=740536,P=41075] addResolvConfMount()          Default mount: /etc/resolv.conf:/etc/resolv.conf
DEBUG   [U=740536,P=41075] addHostnameMount()            Skipping hostname mount, not virtualizing UTS namespace on user request
DEBUG   [U=740536,P=41075] create()                      Mount all
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting tmpfs to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session
DEBUG   [U=740536,P=41075] mountImage()                  Mounting loop device /dev/loop4 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs of type squashfs
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting overlay to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG   [U=740536,P=41075] setPropagationMount()         Set RPC mount propagation flag to SLAVE
VERBOSE [U=740536,P=41075] Passwd()                      Checking for template passwd file: /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs/etc/passwd
VERBOSE [U=740536,P=41075] Passwd()                      Creating passwd content
VERBOSE [U=740536,P=41075] Passwd()                      Creating template passwd file and appending user data: /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs/etc/passwd
DEBUG   [U=740536,P=41075] addIdentityMount()            Adding /etc/passwd to mount list
VERBOSE [U=740536,P=41075] addIdentityMount()            Default mount: /etc/passwd:/etc/passwd
VERBOSE [U=740536,P=41075] Group()                       Checking for template group file: /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/rootfs/etc/group
VERBOSE [U=740536,P=41075] Group()                       Creating group content
DEBUG   [U=740536,P=41075] addIdentityMount()            Adding /etc/group to mount list
VERBOSE [U=740536,P=41075] addIdentityMount()            Default mount: /etc/group:/etc/group
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /dev to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/dev
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /etc/localtime to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/localtime
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /etc/hosts to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/hosts
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/home to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/home
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/u to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/u
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/dc2 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/dc2
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/soft to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/soft
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/dcwan to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/dcwan
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/slate to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/slate
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/project to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/project
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/soft/cle6/singularity/3.4.2/etc/singularity/actions to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/.singularity.d/actions
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/.singularity.d/actions
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /proc to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/proc
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/proc
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting sysfs to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/sys
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /N/u/hayashis/BigRed3 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/N/u/hayashis/BigRed3 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/u/hayashis/BigRed3
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/N/u/hayashis/BigRed3
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /tmp to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/tmp
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/tmp
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /var/tmp to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/var/tmp
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/var/tmp
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /geode2/home/u030/hayashis/BigRed3 to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/geode2/home/u030/hayashis/BigRed3
DEBUG   [U=740536,P=41075] mountGeneric()                Remounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/geode2/home/u030/hayashis/BigRed3
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/etc/resolv.conf to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/resolv.conf
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/etc/passwd to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/passwd
DEBUG   [U=740536,P=41075] mountGeneric()                Mounting /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/etc/group to /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final/etc/group
DEBUG   [U=740536,P=41075] create()                      Chroot into /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG   [U=0,P=41093]      Chroot()                      Hold reference to host / directory
DEBUG   [U=0,P=41093]      Chroot()                      Called pivot_root on /geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final
DEBUG   [U=0,P=41093]      Chroot()                      Change current directory to host / directory
DEBUG   [U=0,P=41093]      Chroot()                      Apply slave mount propagation for host / directory
DEBUG   [U=0,P=41093]      Chroot()                      Called unmount(/, syscall.MNT_DETACH)
DEBUG   [U=0,P=41093]      Chroot()                      Changing directory to / to avoid getpwd issues
DEBUG   [U=740536,P=41075] create()                      Chdir into / to avoid errors
VERBOSE [U=0,P=41092]      wait_child()                  rpc server exited with status 0
DEBUG   [U=0,P=41092]      apply_container_privileges()  Set user ID to 740536
DEBUG   [U=740536,P=41092]  set_parent_death_signal()     Set parent death signal to 9
DEBUG   [U=740536,P=41092] startup()                     singularity runtime engine selected
VERBOSE [U=740536,P=41092] startup()                     Execute stage 2
DEBUG   [U=740536,P=41092] StageTwo()                    Entering stage 2
WARNING [U=740536,P=41092] checkExec()                   container does not have /.singularity.d/actions/exec, calling ls directly
DEBUG   [U=740536,P=41075] PostStartProcess()            Post start process
ls: /.singularity.d/actions: Srmount error
DEBUG   [U=740536,P=41075] Master()                      Child exited with exit status 1
srun: error: nid00517: task 0: Exited with exit code 1
srun: Terminating job step 146976.0

Please let me know if there is any other information I can provide.

dtrudg commented 4 years ago

Edit - sorry, I overlooked the reply from @cclerget already above: https://github.com/sylabs/singularity/issues/4887#issuecomment-572433744

It may be that the state dir being node local isn't enough?


Original message below:

The main thing I'd advise is that Singularity should be installed so that the session directory:

/geode2/soft/hps/cle6/singularity/3.4.2/var/singularity/mnt/session/final

... is on node-local storage. It should not be on a shared filesystem like GPFS. A number of problems can occur if it is. For example, overlay mounts may not work correctly on top of certain filesystems. User namespace mapping issues may also occur if support isn't present on a particular filesystem or version.

We document this in the user guide and the admin guide. The state directory can be set when configuring with mconfig via the --localstatedir option.

--localstatedir: Set the state directory where containers are mounted. This is a particularly important option for administrators installing Singularity on a shared file system. The --localstatedir should be set to a directory that is present on each individual node.
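As a rough illustration of the advice above, the sketch below classifies a filesystem type (the Type column of df -T) as shared or node-local. The helper names fs_kind and fs_type_of, and the list of shared filesystem types, are assumptions made for this example. They are not part of Singularity, and the list is not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: classify a filesystem type string as "shared" or "local".
# The set of shared types below is an assumption and is not exhaustive.
fs_kind() {
    case "$1" in
        gpfs|lustre|nfs|nfs4|beegfs|ceph) echo shared ;;
        *) echo local ;;
    esac
}

# Hypothetical helper: print the filesystem type of a path.
# Relies on GNU df accepting -P (one line per filesystem) together with -T.
fs_type_of() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Example usage (commented out because it needs a Singularity install):
# session_dir=$(singularity buildcfg | grep SESSIONDIR | cut -d "=" -f2)
# fs_kind "$(fs_type_of "$session_dir")"
```

Here fs_kind gpfs prints shared, which is the case --localstatedir is meant to avoid; the option would then be pointed at a directory that exists on each node's local disk.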

@cclerget may have some more specific thoughts on how this issue might be worked around, but if you can try a Singularity install where --localstatedir is set to not be on the GPFS fs, that would be great.

cclerget commented 4 years ago

@soichih Could you try running with the following bind added:

-B $(singularity buildcfg|grep SESSIONDIR|cut -d "=" -f2)/rootfs/.singularity.d/actions:/.singularity.d/actions

This forces Singularity to use /.singularity.d/actions from the container image.
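Broken into steps, the bind argument above can be sketched as follows. The helper names sessiondir_from_buildcfg and actions_bind_spec are hypothetical and exist only for illustration. The parsing mirrors the grep SESSIONDIR | cut -d "=" -f2 pipeline and assumes that singularity buildcfg prints one NAME=value pair per line.

```shell
#!/bin/sh
# Hypothetical helper: read `singularity buildcfg` output on stdin and print
# the value of the SESSIONDIR entry (assumes NAME=value lines).
sessiondir_from_buildcfg() {
    grep '^SESSIONDIR=' | cut -d "=" -f2
}

# Hypothetical helper: build the source:destination bind spec that maps the
# image's own actions directory onto /.singularity.d/actions.
actions_bind_spec() {
    printf '%s/rootfs/.singularity.d/actions:/.singularity.d/actions\n' "$1"
}

# Example usage (commented out because it needs a Singularity install):
# session_dir=$(singularity buildcfg | sessiondir_from_buildcfg)
# singularity exec -e -B "$(actions_bind_spec "$session_dir")" \
#     docker://busybox ls /.singularity.d/actions
```

Keeping the two steps separate makes it easy to echo the computed bind spec and check it before passing it to -B.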

Could you also post the output of df -T /N/soft/cle6/singularity/3.4.2/etc/singularity/actions

soichih commented 4 years ago

Manually binding SESSIONDIR seems to work around the problem.

$ srun singularity -d exec -e -B $(singularity buildcfg|grep SESSIONDIR|cut -d "=" -f2)/rootfs/.singularity.d/actions:/.singularity.d/actions docker://busybox ls /.singularity.d/actions
srun: job 147062 queued and waiting for resources
srun: job 147062 has been allocated resources
...
exec
run
shell
start
test
DEBUG   [U=740536,P=7339]  Master()                      Child exited with exit status 0

Here is the output for df -T

$ df -T /N/soft/cle6/singularity/3.4.2/etc/singularity/actions
Filesystem     Type   1K-blocks       Used  Available Use% Mounted on
g2-soft        gpfs 13170548736 9455124480 3715424256  72% /geode2/soft

cclerget commented 4 years ago

@soichih Could you test the fix in #4938? Thanks!

soichih commented 4 years ago

@cclerget I will need to ask our sysadmin to test this on IU Bigred3. I don't think I can install Singularity without root?

cclerget commented 4 years ago

@soichih If you can run singularity shell -u docker://alpine without a user namespace error, you should be able to install Singularity without root using ./mconfig --without-suid

soichih commented 4 years ago

I don't get a user namespace error, but...

$ singularity shell -u docker://alpine
INFO:    Convert SIF file to sandbox...
Singularity> ./mconfig --without-suid
E: Not inside a git repository and no VERSION file found. Abort.
Singularity> which git
Singularity> 

Outside the container shell, I am inside the git repository.

hayashis@bigred3(elogin1):~/git/singularity(disabled) 1 git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

I don't think it's trivial to get this going on my own.
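For anyone hitting the same mconfig error, the two conditions named in its message can be checked up front with a small sketch like the one below. The function name can_determine_version is hypothetical, and the logic is inferred only from the error text ("Not inside a git repository and no VERSION file found"), so how mconfig actually determines the version is an assumption here.

```shell
#!/bin/sh
# Hypothetical pre-check mirroring the two conditions in the mconfig error:
# a VERSION file in the source tree, or a git checkout with git on the PATH.
can_determine_version() {
    src_dir=$1
    if [ -f "$src_dir/VERSION" ]; then
        echo "VERSION file found"
    elif command -v git >/dev/null 2>&1 && [ -e "$src_dir/.git" ]; then
        echo "git checkout with git available"
    else
        echo "no VERSION file and no usable git"
        return 1
    fi
}

# Example usage from the top of the source tree:
# can_determine_version . || echo "mconfig would likely abort here"
```

In the session above, which git printed nothing inside the alpine container, so the git branch of this check would fail there even though the same directory is a clean checkout on the host.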