apptainer / singularity

Singularity has been renamed to Apptainer as part of us moving the project to the Linux Foundation. This repo has been persisted as a snapshot right before the changes.
https://github.com/apptainer/apptainer

Set timeout for bind paths #5560

Closed mbhall88 closed 3 years ago

mbhall88 commented 4 years ago

Version of Singularity:

What version of Singularity are you using? Run:

$ singularity version
singularity version 3.5.0

I use Singularity frequently on an LSF cluster. Occasionally I submit a job that runs a Singularity container and come back a day later to find it still running. On further inspection, Singularity appears to be stuck trying to bind a path. For example:

DEBUG   [U=7196,P=144171]  prepareAutofs()               Could not keep file descriptor for bind path /scratch: no mount point

Is there a way of specifying some kind of timeout to wait for a bind? It would be nice to say "if you can't bind a path within N seconds, fail/continue".
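This thread does not identify a built-in option for timing out a single bind. As a workaround sketch only, the whole launch can be wrapped in a wall-clock limit from outside. The helper name `run_with_timeout` and the 600-second figure are illustrative assumptions, and the limit covers the entire run, not just the bind step.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; return its exit code, or None if it exceeded `seconds`.

    Hypothetical helper: it bounds the whole command, not one bind mount.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        try:
            # Reap the child, but do not block forever: a process stuck in
            # uninterruptible I/O may not exit until the mount recovers.
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            pass
        return None


# Example call (command taken from this thread, limit chosen arbitrarily):
#   code = run_with_timeout(
#       ["singularity", "exec", "docker://mbhall88/rasusa", "rasusa", "-h"], 600)
#   if code is None:
#       sys.exit("container launch exceeded the time limit")
```

A `None` result only says the limit was hit; it does not say which bind path was responsible.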

dtrudg commented 4 years ago

Could you provide the full debug output, not just this message? Singularity is not actually hanging at this message; it is informational and will simply be printed and passed over. We need a better look at what is being set up to identify the exact point after this message where it gets stuck.

To assist in identifying the issue / a resolution we need to know:

Thanks.

mbhall88 commented 4 years ago

Singularity command: singularity --debug exec docker://mbhall88/rasusa rasusa -h

Full debug output:

DEBUG   [U=7196,P=419569]  persistentPreRunE()           Singularity version: 3.5.0
DEBUG   [U=7196,P=419569]  handleConfDir()               /homes/mbhall88/.singularity already exists. Not creating.
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/library
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/oci-tmp
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/oci
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/net
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/shub
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/oras
DEBUG   [U=7196,P=419569]  parseURI()                    Parsing docker://mbhall88/rasusa into reference
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/oci-tmp/03eb47b86fc2e2eb1ac577721985fd89cccb60f4ea145135bfe0ee706452e506
DEBUG   [U=7196,P=419569]  updateCacheSubdir()           Caching directory set to /nfs/research1/zi/mbhall/Software/Singularity_images/cache/oci-tmp/03eb47b86fc2e2eb1ac577721985fd89cccb60f4ea145135bfe0ee706452e506
DEBUG   [U=7196,P=419569]  execStarter()                 Checking for encrypted system partition
DEBUG   [U=7196,P=419569]  Init()                        Image format detection
DEBUG   [U=7196,P=419569]  Init()                        Check for sandbox image format
DEBUG   [U=7196,P=419569]  Init()                        sandbox format initializer returned: not a directory image
DEBUG   [U=7196,P=419569]  Init()                        Check for sif image format
DEBUG   [U=7196,P=419569]  Init()                        sif image format detected
VERBOSE [U=7196,P=419569]  SetContainerEnv()             Not forwarding SINGULARITY_CACHEDIR from user to container environment
VERBOSE [U=7196,P=419569]  SetContainerEnv()             Not forwarding SINGULARITY_LOCALCACHEDIR from user to container environment
VERBOSE [U=7196,P=419569]  SetContainerEnv()             HOME=/homes/mbhall88
DEBUG   [U=7196,P=419569]  init()                        Use starter binary /nfs/software/singularity/3.5.0/libexec/singularity/bin/starter-suid
VERBOSE [U=0,P=419569]     print()                       Set messagelevel to: 5
VERBOSE [U=0,P=419569]     init()                        Starter initialization
DEBUG   [U=0,P=419569]     load_overlay_module()         Trying to load overlay kernel module
DEBUG   [U=0,P=419569]     load_overlay_module()         Overlay seems supported by the kernel
DEBUG   [U=0,P=419569]     get_pipe_exec_fd()            PIPE_EXEC_FD value: 9
VERBOSE [U=0,P=419569]     is_suid()                     Check if we are running as setuid
VERBOSE [U=0,P=419569]     priv_drop()                   Drop root privileges
DEBUG   [U=7196,P=419569]  init()                        Read engine configuration
DEBUG   [U=7196,P=419569]  init()                        Wait completion of stage1
VERBOSE [U=7196,P=419593]  priv_drop()                   Drop root privileges permanently
DEBUG   [U=7196,P=419593]  set_parent_death_signal()     Set parent death signal to 9
VERBOSE [U=7196,P=419593]  init()                        Spawn stage 1
DEBUG   [U=7196,P=419593]  startup()                     singularity runtime engine selected
VERBOSE [U=7196,P=419593]  startup()                     Execute stage 1
DEBUG   [U=7196,P=419593]  StageOne()                    Entering stage 1
DEBUG   [U=7196,P=419593]  prepareAutofs()               Found "/proc/sys/fs/binfmt_misc" as autofs mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /scratch: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /etc/localtime: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /etc/hosts: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /gpfs/nobackup: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /hps/nobackup: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /hps/nobackup2: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /nfs/dbtools: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /nfs/ensembl: no mount point
DEBUG   [U=7196,P=419593]  prepareAutofs()               Could not keep file descriptor for bind path /nfs/ensemblftp: no mount point

Mounts / autofs in singularity.conf:

# MOUNT PROC: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /proc within the container?
mount proc = yes

# MOUNT SYS: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /sys within the container?
mount sys = yes

# MOUNT DEV: [yes/no/minimal]
# DEFAULT: yes
# Should we automatically bind mount /dev within the container? If 'minimal'
# is chosen, then only 'null', 'zero', 'random', 'urandom', and 'shm' will
# be included (the same effect as the --contain options)
mount dev = yes

# MOUNT DEVPTS: [BOOL]
# DEFAULT: yes
# Should we mount a new instance of devpts if there is a 'minimal'
# /dev, or -C is passed?  Note, this requires that your kernel was
# configured with CONFIG_DEVPTS_MULTIPLE_INSTANCES=y, or that you're
# running kernel 4.7 or newer.
mount devpts = yes

# MOUNT HOME: [BOOL]
# DEFAULT: yes
# Should we automatically determine the calling user's home directory and
# attempt to mount it's base path into the container? If the --contain option
# is used, the home directory will be created within the session directory or
# can be overridden with the SINGULARITY_HOME or SINGULARITY_WORKDIR
# environment variables (or their corresponding command line options).
mount home = yes

# MOUNT TMP: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /tmp and /var/tmp into the container? If
# the --contain option is used, both tmp locations will be created in the
# session directory or can be specified via the  SINGULARITY_WORKDIR
# environment variable (or the --workingdir command line option).
mount tmp = yes

# MOUNT HOSTFS: [BOOL]
# DEFAULT: no
# Probe for all mounted file systems that are mounted on the host, and bind
# those into the container?
mount hostfs = no

# BIND PATH: [STRING]
# DEFAULT: Undefined
# Define a list of files/directories that should be made available from within
# the container. The file or directory must exist within the container on
# which to attach to. you can specify a different source and destination
# path (respectively) with a colon; otherwise source and dest are the same.
# NOTE: these are ignored if singularity is invoked with --contain.
#bind path = /etc/singularity/default-nsswitch.conf:/etc/nsswitch.conf
#bind path = /opt
#bind path = /scratch
#bind path = /etc/localtime
#bind path = /etc/hosts
bind path = /scratch
bind path = /etc/localtime
bind path = /etc/hosts
bind path = /gpfs/nobackup
bind path = /hps/nobackup
bind path = /hps/nobackup2
bind path = /nfs/dbtools
bind path = /nfs/ensembl
bind path = /nfs/ensemblftp
bind path = /nfs/ensemblgenomes/ftp
bind path = /nfs/extsrv-dynamic/vol1
bind path = /nfs/extsrv-dynamic/vol2
bind path = /nfs/extsrv-dynamic/vol3
bind path = /ebi/fasp
bind path = /nfs/ensemblarch
bind path = /nfs/ftp/ensemblorg
bind path = /homes
bind path = /nfs/hxgeneral
bind path = /nfs/javadb/uniprot01
bind path = /ebi/jitterbug
bind path = /nfs/gns/literature
bind path = /nfs/misc/misc01
bind path = /nfs/gns/misc/misc03
bind path = /nfs/gns/misc/misc04
bind path = /nfs/misc/chembl
bind path = /nfs/misc/encode
bind path = /nfs/misc/hgnc
bind path = /nfs/misc/misc02
bind path = /nfs/msd
bind path = /nfs/nobackup
bind path = /nfs/oracle/client
bind path = /nfs/pdbe_staging
bind path = /nfs/production
bind path = /nfs/production3
bind path = /nfs/research1
bind path = /nfs/software
bind path = /nfs/systems
bind path = /nfs/ftp/pub
bind path = /nfs/web-hx
bind path = /nfs/www-prod/web_hx2
bind path = /ebi/www-static
bind path = /ebi/lsf/ebi
bind path = /ebi/lsf/ebi-spool
bind path = /hps/research1/
bind path = /nfs/public/uniprot
bind path = /nfs/public/ro
bind path = /nfs/public/rw
bind path = /nfs/bioimage
bind path = /nfs/biostudies
bind path = /ebi/ftp/private

# MOUNT SLAVE: [BOOL]
# DEFAULT: yes
# Should we automatically propagate file-system changes from the host?
# This should be set to 'yes' when autofs mounts in the system should
# show up in the container.
mount slave = yes

What details do you want about the filesystems involved?

dtrudg commented 4 years ago

First, does this reproduce in the latest version of Singularity, 3.6.2?

With the bind path entries being processed in order, this indicates we are probably stuck on bind path = /nfs/ensemblgenomes/ftp then.
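The reasoning above can be sketched as a small script: list the active `bind path` entries in file order, list the paths that `prepareAutofs()` already mentioned in the debug output, and report the first configured path with no log line. This is a heuristic under stated assumptions: entries are processed in file order (as noted above), every processed path produces a `for bind path <path>:` message like the ones in this log, and for `source:destination` entries the source is what gets logged. The function names are hypothetical.

```python
def parse_bind_paths(conf_text):
    """Return the sources of active `bind path = ...` entries, in file order."""
    paths = []
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key = value
        key, _, value = line.partition("=")
        if key.strip() == "bind path":
            # Assumption: for "source:destination" entries, compare on source.
            paths.append(value.strip().split(":", 1)[0])
    return paths


def logged_bind_paths(debug_text):
    """Return bind paths named in prepareAutofs() debug lines, in log order."""
    marker = "for bind path "
    seen = []
    for line in debug_text.splitlines():
        if "prepareAutofs()" in line and marker in line:
            rest = line.split(marker, 1)[1]
            seen.append(rest.split(":", 1)[0].strip())
    return seen


def first_unlogged(conf_text, debug_text):
    """Return the first configured bind path with no log line, or None."""
    seen = set(logged_bind_paths(debug_text))
    for path in parse_bind_paths(conf_text):
        if path not in seen:
            return path
    return None
```

A missing log line is only suggestive of where the hang occurs; it does not confirm it.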

How is this mounted / what is the underlying fs? Is it automounted, are there any symlinks involved pointing to somewhere else?
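Part of this can be checked from an unprivileged account. The sketch below probes a path in a child process, so that a hung filesystem stalls the child rather than the caller, and reports whether the path is a symlink and where it resolves. The helper name `probe_path` and the 10-second default are assumptions for illustration; it is not part of Singularity.

```python
import subprocess
import sys

# Code run in the child: print the symlink flag and resolved path, then stat.
_PROBE = (
    "import os, sys; p = sys.argv[1]; "
    "print(int(os.path.islink(p))); print(os.path.realpath(p)); os.stat(p)"
)


def probe_path(path, seconds=10):
    """Probe `path` in a child process; report ok, error, or timeout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _PROBE, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Do not wait again: a child blocked on a dead mount may never exit.
        proc.kill()
        return {"path": path, "status": "timeout"}
    if proc.returncode != 0:
        return {"path": path, "status": "error"}  # e.g. path does not exist
    is_link, resolved = out.splitlines()[:2]
    return {
        "path": path,
        "status": "ok",
        "symlink": is_link == "1",
        "resolved": resolved,
    }


# Example: probe_path("/nfs/ensemblgenomes/ftp")
```

Running this over each `bind path` entry on the affected node would show which paths respond, which are symlinks, and which time out.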

When you run the singularity container and it gets stuck are there any messages in dmesg output?

The output of mount and cat /proc/self/mountinfo on the host might be useful. If there is anything sensitive, you can censor it before attaching it here.
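For reading that output, here is a minimal sketch that parses `/proc/self/mountinfo` text and reports the filesystem type covering a given path, which is one way to spot autofs-managed bind paths. It assumes the standard mountinfo layout (mount point in the fifth field, filesystem type right after the `-` separator); the function names are hypothetical.

```python
def parse_mountinfo(text):
    """Parse mountinfo text into dicts with mount_point, fs_type, and source."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # Six fixed fields come first, then optional fields, then "-".
            sep = fields.index("-", 6)
        except ValueError:
            continue  # not a well-formed mountinfo line
        if len(fields) < sep + 3:
            continue
        mounts.append({
            "mount_point": fields[4],  # spaces appear escaped as \040
            "fs_type": fields[sep + 1],
            "source": fields[sep + 2],
        })
    return mounts


def covering_mount(mounts, path):
    """Return the mount whose mount point is the longest prefix of `path`."""
    best = None
    for m in mounts:
        mp = m["mount_point"]
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            # ">=" lets a later mount stacked on the same point win.
            if best is None or len(mp) >= len(best["mount_point"]):
                best = m
    return best


# Example:
#   with open("/proc/self/mountinfo") as fh:
#       mounts = parse_mountinfo(fh.read())
#   print(covering_mount(mounts, "/nfs/ensemblgenomes/ftp"))
```

An `fs_type` of `autofs` for the covering mount would suggest the path is automounted and not yet triggered.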

mbhall88 commented 4 years ago

> First, does this reproduce in the latest version of Singularity, 3.6.2?

We don't have that version on our cluster, sorry, and I don't have the privileges to install it.

As for the rest of the questions, I will probably have to ask the team that manages the cluster at my institute to answer them, as I have no idea.

dtrudg commented 3 years ago

Closing this as we cannot reproduce it and there hasn't been any further information. Please feel free to reopen if you can try with the latest Singularity and provide the other information requested above. Thanks.