zyndagj closed this issue 6 years ago.
@zyndagj Are you using the following patch? https://github.com/singularityware/singularity/issues/1214#issuecomment-361381310
Based on previous issues, I ran:
for tFile in overlayfs.c dev.c sessiondir.c; do
fPath=$(find . -type f -name $tFile)
sed -i "s/tmpfs/ramfs/g" $fPath
done
Though I don't really understand the difference between the two, beyond the size limit that can be imposed on a tmpfs.
Also, my error did not lead to a kernel panic. Singularity just threw the error and then exited.
Sed is not enough. Your problem comes from the permissions on the ramfs mount point. tmpfs automatically sets the mount point's permissions to 1777, like a /tmp folder, so that case is fine; but when you switch to ramfs, the mount point is owned by root and is not world-writable, which is why you got "permission denied". Note this line in the patch: if (chmod(sessiondir, S_IRWXU|S_IRWXG|S_IRWXO) < 0). That code is required for ramfs to work as expected.
By the way, you don't need to patch dev.c and overlayfs.c; the sessiondir being mounted as tmpfs is the root cause of the kernel panic on those systems.
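To see the permission difference described above on a given system, one could stat(2) the sessiondir after mounting it. The following is a minimal sketch; the helper and macro names are hypothetical and not part of Singularity. It also makes one detail explicit: S_IRWXU|S_IRWXG|S_IRWXO, as used in the patch, evaluates to 0777, so the chmod makes the ramfs sessiondir world-writable but, unlike a default tmpfs mount point (1777), without the sticky bit.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations such as S_ISVTX and mkdtemp() */
#endif
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Mode the patch passes to chmod(): rwx for user, group and other (0777).
 * It does not include the sticky bit S_ISVTX. */
#define PATCH_SESSIONDIR_MODE (S_IRWXU | S_IRWXG | S_IRWXO)

/* Mode a /tmp-like directory normally has: 0777 plus the sticky bit (01777). */
#define TMP_LIKE_MODE (PATCH_SESSIONDIR_MODE | S_ISVTX)

/* Hypothetical helper: true if `path` is a directory in which any user
 * may create entries (write and search permission for "other"). */
static bool dir_is_world_writable(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return false;
    return (st.st_mode & (S_IWOTH | S_IXOTH)) == (S_IWOTH | S_IXOTH);
}

/* Hypothetical helper: true if the directory's permission bits are exactly 01777. */
static bool dir_is_tmp_like(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return false;
    return (st.st_mode & 07777) == TMP_LIKE_MODE;
}
```

Calling dir_is_tmp_like() on a freshly mounted tmpfs sessiondir, and dir_is_world_writable() on a ramfs one after the chmod, would be one way to check each case.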
I applied this patch:
diff --git a/src/util/sessiondir.c b/src/util/sessiondir.c
index 4253fef..c6cfb00 100644
--- a/src/util/sessiondir.c
+++ b/src/util/sessiondir.c
@@ -89,9 +89,13 @@ int singularity_sessiondir(void) {
ABORT(255);
}
- singularity_message(DEBUG, "Mounting sessiondir tmpfs: %s\n", sessiondir);
- if ( singularity_mount("tmpfs", sessiondir, "tmpfs", MS_NOSUID, sessiondir_size_str) < 0 ){
- singularity_message(ERROR, "Failed to mount sessiondir tmpfs %s: %s\n", sessiondir, strerror(errno));
+ singularity_message(DEBUG, "Mounting sessiondir ramfs: %s\n", sessiondir);
+ if ( singularity_mount("ramfs", sessiondir, "ramfs", MS_NOSUID, sessiondir_size_str) < 0 ){
+ singularity_message(ERROR, "Failed to mount sessiondir ramfs %s: %s\n", sessiondir, strerror(errno));
+ ABORT(255);
+ }
+ if (chmod(sessiondir, S_IRWXU|S_IRWXG|S_IRWXO) < 0){
+ singularity_message(ERROR, "Failed to set permission %s: %s\n", sessiondir, strerror(errno)); ABORT(255);
}
and received a new error:
Enabling debugging
Ending argument loop
Singularity version: 2.4.5-dist
Exec'ing: /home1/03076/gzynda/apps/singularity/2.4.5/lib64/singularity/cli/shell.exec
Evaluating args: '/home1/03076/gzynda/singularity-ls5.img'
VERBOSE [U=0,P=13138] message_init() Set messagelevel to: 5
DEBUG [U=0,P=13138] fd_cleanup() Cleanup file descriptor table
VERBOSE [U=0,P=13138] singularity_config_parse() Initialize configuration file: /home1/03076/gzynda/apps/singularity/2.4.5/etc/singularity/singularity.conf
DEBUG [U=0,P=13138] singularity_config_parse() Starting parse of configuration file /home1/03076/gzynda/apps/singularity/2.4.5/etc/singularity/singularity.conf
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key allow setuid = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key max loop devices = '256'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key allow pid ns = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key config passwd = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key config group = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key config resolv_conf = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount proc = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount sys = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount dev = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount devpts = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount home = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount tmp = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount hostfs = 'no'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key bind path = '/etc/localtime'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key bind path = '/etc/hosts'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key bind path = '/gpfs'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key bind path = '/scratch'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key bind path = '/work'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key bind path = '/home1'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key user bind control = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key enable overlay = 'try'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key mount slave = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key sessiondir max size = '16'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key allow container squashfs = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key allow container extfs = 'yes'
VERBOSE [U=0,P=13138] singularity_config_parse() Got config key allow container dir = 'yes'
DEBUG [U=0,P=13138] singularity_config_parse() Finished parsing configuration file '/home1/03076/gzynda/apps/singularity/2.4.5/etc/singularity/singularity.conf'
VERBOSE [U=0,P=13138] singularity_registry_init() Initializing Singularity Registry
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'LIBEXECDIR' = '/home1/03076/gzynda/apps/singularity/2.4.5/lib64'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(libexecdir, /home1/03076/gzynda/apps/singularity/2.4.5/lib64) = 0
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'COMMAND' = 'shell'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(COMMAND, shell) = 0
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'MESSAGELEVEL' = '5'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(MESSAGELEVEL, 5) = 0
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'VERSION' = '2.4.5-dist'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(version, 2.4.5-dist) = 0
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'LOCALSTATEDIR' = '/var'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(localstatedir, /var) = 0
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'SYSCONFDIR' = '/home1/03076/gzynda/apps/singularity/2.4.5/etc'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(sysconfdir, /home1/03076/gzynda/apps/singularity/2.4.5/etc) = 0
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'BINDIR' = '/home1/03076/gzynda/apps/singularity/2.4.5/bin'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(bindir, /home1/03076/gzynda/apps/singularity/2.4.5/bin) = 0
VERBOSE [U=0,P=13138] singularity_registry_set() Adding value to registry: 'IMAGE' = '/home1/03076/gzynda/singularity-ls5.img'
DEBUG [U=0,P=13138] singularity_registry_set() Returning singularity_registry_set(IMAGE, /home1/03076/gzynda/singularity-ls5.img) = 0
DEBUG [U=0,P=13138] singularity_registry_get() Returning NULL on 'HOME'
DEBUG [U=0,P=13138] singularity_registry_get() Returning NULL on 'TARGET_UID'
DEBUG [U=0,P=13138] singularity_registry_get() Returning NULL on 'TARGET_GID'
DEBUG [U=0,P=13138] singularity_priv_init() Initializing user info
DEBUG [U=0,P=13138] singularity_priv_init() Set the calling user's username to: gzynda
DEBUG [U=0,P=13138] singularity_priv_init() Marking uinfo structure as ready
DEBUG [U=0,P=13138] singularity_priv_init() Obtaining home directory
VERBOSE [U=0,P=13138] singularity_priv_init() Set home (via getpwuid()) to: /home1/03076/gzynda
VERBOSE [U=0,P=13138] singularity_suid_init() Running SUID program workflow
VERBOSE [U=0,P=13138] singularity_suid_init() Checking program has appropriate permissions
VERBOSE [U=0,P=13138] singularity_suid_init() Checking configuration file is properly owned by root
VERBOSE [U=0,P=13138] singularity_suid_init() Checking if singularity.conf allows us to run as suid
DEBUG [U=0,P=13138] singularity_config_get_bool_char_impl() Called singularity_config_get_bool(allow setuid, yes)
DEBUG [U=0,P=13138] singularity_config_get_value_impl() Returning configuration value allow setuid='yes'
DEBUG [U=0,P=13138] singularity_config_get_bool_char_impl() Return singularity_config_get_bool(allow setuid, yes) = 1
DEBUG [U=0,P=13138] singularity_registry_get() Returning NULL on 'NOSUID'
VERBOSE [U=0,P=13138] singularity_priv_userns() Invoking the user namespace
DEBUG [U=0,P=13138] singularity_config_get_bool_char_impl() Called singularity_config_get_bool(allow user ns, yes)
DEBUG [U=0,P=13138] singularity_config_get_value_impl() No configuration entry found for 'allow user ns'; returning default value 'yes'
DEBUG [U=0,P=13138] singularity_config_get_bool_char_impl() Return singularity_config_get_bool(allow user ns, yes) = 1
VERBOSE [U=0,P=13138] singularity_priv_userns() Not virtualizing USER namespace: running as SUID
DEBUG [U=0,P=13138] singularity_priv_userns() Returning singularity_priv_init(void)
DEBUG [U=0,P=13138] singularity_priv_drop() Dropping privileges to UID=823749, GID=815499 (22 supplementary GIDs)
DEBUG [U=0,P=13138] singularity_priv_drop() Restoring supplementary groups
DEBUG [U=823749,P=13138] singularity_priv_drop() Confirming we have correct UID/GID
DEBUG [U=823749,P=13138] singularity_config_get_value_multi_impl() No configuration entry found for 'autofs bug path'; returning default value ''
VERBOSE [U=823749,P=13138] singularity_runtime_autofs() No autofs bug path in configuration, skipping
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'DAEMON_START'
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'DAEMON_JOIN'
DEBUG [U=823749,P=13138] singularity_daemon_init() Not joining a daemon, daemon join not set
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'WRITABLE'
VERBOSE [U=823749,P=13138] main() Instantiating read only container image object
DEBUG [U=823749,P=13138] singularity_registry_get() Returning value from registry: 'IMAGE' = '/home1/03076/gzynda/singularity-ls5.img'
DEBUG [U=823749,P=13138] singularity_image_init() Calling image_init for each file system module
DEBUG [U=823749,P=13138] singularity_image_dir_init() Opening file descriptor to directory: /home1/03076/gzynda/singularity-ls5.img
DEBUG [U=823749,P=13138] singularity_image_dir_init() This is not a directory based image
DEBUG [U=823749,P=13138] singularity_image_squashfs_init() Checking if writable image requested
DEBUG [U=823749,P=13138] singularity_image_squashfs_init() Opening file descriptor to image: /home1/03076/gzynda/singularity-ls5.img
VERBOSE [U=823749,P=13138] singularity_image_squashfs_init() Checking that file pointer is a Singularity image
DEBUG [U=823749,P=13138] singularity_image_squashfs_init() Checking for magic in the top of the file
VERBOSE [U=823749,P=13138] singularity_image_squashfs_init() File is not a valid SquashFS image
DEBUG [U=823749,P=13138] singularity_image_ext3_init() Opening file descriptor to image: /home1/03076/gzynda/singularity-ls5.img
VERBOSE [U=823749,P=13138] singularity_image_ext3_init() Checking that file pointer is a Singularity image
DEBUG [U=823749,P=13138] singularity_image_init() got image_init type for ext3
DEBUG [U=823749,P=13138] singularity_config_get_bool_char_impl() Called singularity_config_get_bool(allow container extfs, yes)
DEBUG [U=823749,P=13138] singularity_config_get_value_impl() Returning configuration value allow container extfs='yes'
DEBUG [U=823749,P=13138] singularity_config_get_bool_char_impl() Return singularity_config_get_bool(allow container extfs, yes) = 1
DEBUG [U=823749,P=13138] singularity_config_get_value_impl() No configuration entry found for 'limit container paths'; returning default value 'NULL'
DEBUG [U=823749,P=13138] singularity_config_get_value_impl() No configuration entry found for 'limit container owners'; returning default value 'NULL'
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'DAEMON_JOIN'
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'CLEANUPDIR'
VERBOSE [U=823749,P=13138] singularity_registry_set() Adding value to registry: 'CLEANUPD_FD' = '-1'
DEBUG [U=823749,P=13138] singularity_registry_set() Returning singularity_registry_set(CLEANUPD_FD, -1) = 0
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'DAEMON_JOIN'
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'NOSESSIONCLEANUP'
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'NOCLEANUP'
DEBUG [U=823749,P=13138] singularity_cleanupd() Not running a cleanup thread, no 'SINGULARITY_CLEANUPDIR' defined
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'DAEMON_JOIN'
DEBUG [U=823749,P=13138] singularity_runtime_ns() Calling: _singularity_runtime_ns_ipc()
DEBUG [U=823749,P=13138] singularity_config_get_bool_char_impl() Called singularity_config_get_bool(allow ipc ns, yes)
DEBUG [U=823749,P=13138] singularity_config_get_value_impl() No configuration entry found for 'allow ipc ns'; returning default value 'yes'
DEBUG [U=823749,P=13138] singularity_config_get_bool_char_impl() Return singularity_config_get_bool(allow ipc ns, yes) = 1
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'UNSHARE_IPC'
VERBOSE [U=823749,P=13138] singularity_runtime_ns_ipc() Not virtualizing IPC namespace on user request
DEBUG [U=823749,P=13138] singularity_runtime_ns() Calling: _singularity_runtime_ns_pid()
WARNING [U=823749,P=13138] singularity_runtime_ns_pid() Skipping PID namespace creation, support not available on host
DEBUG [U=823749,P=13138] singularity_runtime_ns() Calling: _singularity_runtime_ns_net()
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'UNSHARE_NET'
VERBOSE [U=823749,P=13138] singularity_runtime_ns_net() Not virtualizing network namespace on user request
DEBUG [U=823749,P=13138] singularity_runtime_ns() Calling: _singularity_runtime_ns_mnt()
DEBUG [U=823749,P=13138] singularity_config_get_bool_char_impl() Called singularity_config_get_bool(mount slave, yes)
DEBUG [U=823749,P=13138] singularity_config_get_value_impl() Returning configuration value mount slave='yes'
DEBUG [U=823749,P=13138] singularity_config_get_bool_char_impl() Return singularity_config_get_bool(mount slave, yes) = 1
DEBUG [U=823749,P=13138] singularity_priv_escalate() Temporarily escalating privileges (U=823749)
DEBUG [U=0,P=13138] singularity_priv_escalate() Clearing supplementary GIDs.
DEBUG [U=0,P=13138] singularity_runtime_ns_mnt() Virtualizing FS namespace
DEBUG [U=0,P=13138] singularity_runtime_ns_mnt() Virtualizing mount namespace
DEBUG [U=0,P=13138] singularity_priv_drop() Dropping privileges to UID=823749, GID=815499 (22 supplementary GIDs)
DEBUG [U=0,P=13138] singularity_priv_drop() Restoring supplementary groups
DEBUG [U=823749,P=13138] singularity_priv_drop() Confirming we have correct UID/GID
DEBUG [U=823749,P=13138] singularity_runtime_ns_mnt() Making mounts slave
DEBUG [U=823749,P=13138] singularity_registry_get() Returning NULL on 'DAEMON_JOIN'
DEBUG [U=823749,P=13138] singularity_sessiondir() Setting sessiondir
VERBOSE [U=823749,P=13138] singularity_sessiondir() Using session directory: /var/singularity/mnt/session
DEBUG [U=823749,P=13138] singularity_sessiondir() Checking for session directory: /var/singularity/mnt/session
DEBUG [U=823749,P=13138] singularity_sessiondir() Obtaining the default sessiondir size
DEBUG [U=823749,P=13138] singularity_config_get_value_impl() Returning configuration value sessiondir max size='16'
DEBUG [U=823749,P=13138] singularity_sessiondir() Converted sessiondir size to: 16
DEBUG [U=823749,P=13138] singularity_sessiondir() Creating the sessiondir size mount option length
DEBUG [U=823749,P=13138] singularity_sessiondir() Got size length of: 9
DEBUG [U=823749,P=13138] singularity_sessiondir() Creating the sessiondir size mount option string
DEBUG [U=823749,P=13138] singularity_sessiondir() Checking to make sure the string was allocated correctly
DEBUG [U=823749,P=13138] singularity_sessiondir() Mounting sessiondir ramfs: /var/singularity/mnt/session
ERROR [U=823749,P=13138] singularity_sessiondir() Failed to set permission /var/singularity/mnt/session: Operation not permitted
ABORT [U=823749,P=13138] singularity_sessiondir() Retval = 255
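In the log, sessiondir turns sessiondir max size = 16 into a size mount option: it first computes the option's length, then allocates and fills the string. A common C idiom for that is a two-pass snprintf(), sketched below. The size=<N>m format and the name make_size_option are assumptions for illustration; for N = 16 that string is 8 characters, or 9 bytes including the terminating NUL. Also note that, to my knowledge, ramfs has no size limit and ignores mount options such as size=, so with the ramfs patch this setting presumably no longer caps the sessiondir.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a "size=<N>m" mount option string.
 * Returns a malloc()ed string the caller must free(), or NULL on error. */
static char *make_size_option(int size_mb)
{
    /* First pass: with a NULL buffer, snprintf() returns the length the
     * formatted string would have, not counting the terminating NUL. */
    int len = snprintf(NULL, 0, "size=%dm", size_mb);
    if (len < 0)
        return NULL;

    char *opt = malloc((size_t)len + 1);   /* +1 for the NUL */
    if (opt == NULL)
        return NULL;

    /* Second pass: format into the exactly sized buffer. */
    if (snprintf(opt, (size_t)len + 1, "size=%dm", size_mb) != len) {
        free(opt);
        return NULL;
    }
    return opt;
}
```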
I also noticed two things:
sessiondir.c
Let me know if there is anything else I can try.
@zyndagj Yes, that's because singularity_mount() escalates privileges only inside the function and drops them again before returning, so you must escalate privileges yourself around the chmod call. The following snippet should work:
if ( singularity_mount("ramfs", sessiondir, "ramfs", MS_NOSUID, sessiondir_size_str) < 0 ){
    singularity_message(ERROR, "Failed to mount sessiondir ramfs %s: %s\n", sessiondir, strerror(errno));
    ABORT(255);
}
/* singularity_mount() has already dropped privileges again, so re-escalate for chmod() */
singularity_priv_escalate();
/* A ramfs mount point is root-owned and not world-writable: open it up to 0777 */
if (chmod(sessiondir, S_IRWXU|S_IRWXG|S_IRWXO) < 0){
    singularity_message(ERROR, "Failed to set permission %s: %s\n", sessiondir, strerror(errno));
    ABORT(255);
}
singularity_priv_drop();
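The general rule behind this fix is that a root-only call made after singularity_mount() needs its own escalate/drop pair. Below is a minimal sketch of that pattern as a reusable helper. chmod_escalated, fake_escalate and fake_drop are hypothetical names; the callbacks stand in for singularity_priv_escalate() and singularity_priv_drop(), assumed here to take no arguments and return nothing, as they are called in the snippet above. The helper also saves errno across the drop, so a caller's strerror(errno) still reports the chmod() failure rather than anything the drop did.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose POSIX declarations such as mkdtemp() */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: run chmod() with privileges temporarily escalated.
 * `escalate` and `drop` stand in for singularity_priv_escalate() and
 * singularity_priv_drop(). Privileges are always dropped again, and the
 * errno set by a failed chmod() survives the drop() call. */
static int chmod_escalated(const char *path, mode_t mode,
                           void (*escalate)(void), void (*drop)(void))
{
    escalate();
    int rc = chmod(path, mode);
    int saved_errno = errno;    /* drop() may overwrite errno */
    drop();
    errno = saved_errno;
    return rc;
}

/* Stand-in callbacks so the sketch runs without setuid privileges.
 * fake_drop() clobbers errno on purpose to show why it is saved above. */
static int escalations = 0;
static int drops = 0;

static void fake_escalate(void) { escalations++; }
static void fake_drop(void) { drops++; errno = 0; }
```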
Invoking a shell on an old Singularity 2.3.1 image worked!
I am going to run it through a gauntlet of tests tomorrow and then report back.
@cclerget, @zyndagj we are running Singularity 2.4.5 with your last patch on our Cray XC50 (CLE 6.0), and it seems to work well. Is this something we can change via a configuration file (now or in a future version)?
Thanks! Miguel
@miguelgila To be sure the fix is correct: for CLE 6, did you apply the patch to src/util/sessiondir.c only, or did you also use sed, as in https://github.com/singularityware/singularity/issues/1417#issuecomment-375990586, to patch the other files that use tmpfs?
@zyndagj Can you also confirm that only the sessiondir patch is required for CLE 5?
@miguelgila The current release doesn't allow switching between tmpfs and ramfs. I will submit a patch after you and @zyndagj confirm that the issue lies only in sessiondir for CLE 5 and 6. A configuration directive such as sessiondir type = tmpfs/ramfs will be added to switch between tmpfs and ramfs.
@cclerget I didn't apply it everywhere, which could explain why nodes crashed badly whenever we used overlayfs... I'll apply it right away and report whether it works.
@cclerget I've tested it, and we can now have the following in singularity.conf without nodes crashing:
mount dev = yes
enable overlay = try
The changes have been applied, so newly starting jobs should pick them up. If we see an increase in the job crash rate, I'll report it here.
Many thanks in advance for the sessiondir type patch 👍
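singularity.conf uses one key = value setting per line, which is what the "Got config key ... = '...'" lines in the debug log above reflect. As a rough illustration of that format only, here is a sketch of splitting one such line in C; trim and parse_conf_line are hypothetical names, and this is not Singularity's actual configuration parser.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Hypothetical helper: split a "key = value" line in place.
 * Blank lines, '#' comments and lines without '=' are rejected.
 * On success *key and *value point into `line`, trimmed of whitespace. */
static bool parse_conf_line(char *line, char **key, char **value)
{
    char *s = trim(line);
    if (*s == '\0' || *s == '#')
        return false;

    char *eq = strchr(s, '=');
    if (eq == NULL)
        return false;

    *eq = '\0';                 /* terminate the key part */
    *key = trim(s);
    *value = trim(eq + 1);
    return **key != '\0';
}
```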
@miguelgila @zyndagj If you have a chance to test patch #1517 and give feedback, that would be greatly appreciated, thanks! To get it:
$ git clone https://github.com/singularityware/singularity
$ cd singularity
$ git checkout -b cclerget-development-2.x-cray-panic-fix origin/development-2.x
$ git pull https://github.com/cclerget/singularity.git development-2.x-cray-panic-fix
After installation, edit singularity.conf to set memory fs type to ramfs.
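Conceptually, a directive like memory fs type only changes which filesystem type the sessiondir is mounted with, plus whether the extra chmod discussed earlier in this thread is needed afterwards. The sketch below shows just that decision logic. The enum, the function names, and defaulting to tmpfs when the directive is unset are assumptions for illustration, not the actual code in patch #1517.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of the "memory fs type" setting. */
enum memory_fs { MEMORY_FS_INVALID = -1, MEMORY_FS_TMPFS, MEMORY_FS_RAMFS };

/* Map the configuration value to the enum; NULL (directive unset) is
 * assumed to mean tmpfs, the pre-patch behaviour. */
static enum memory_fs memory_fs_from_conf(const char *value)
{
    if (value == NULL || strcmp(value, "tmpfs") == 0)
        return MEMORY_FS_TMPFS;
    if (strcmp(value, "ramfs") == 0)
        return MEMORY_FS_RAMFS;
    return MEMORY_FS_INVALID;
}

/* Filesystem type string to hand to the mount call, or NULL if invalid. */
static const char *memory_fs_name(enum memory_fs fs)
{
    switch (fs) {
    case MEMORY_FS_TMPFS: return "tmpfs";
    case MEMORY_FS_RAMFS: return "ramfs";
    default:              return NULL;
    }
}

/* Per this thread: a fresh tmpfs mount point is already world-writable,
 * a fresh ramfs one is not, so only ramfs needs a chmod after mounting. */
static bool memory_fs_needs_chmod(enum memory_fs fs)
{
    return fs == MEMORY_FS_RAMFS;
}
```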
@cclerget I should be able to test on Thursday after our Cray comes back from maintenance.
@cclerget I just applied the patch file for 1517 and everything seems to have worked well. I could run our tests without any issues.
Just an update: I have to wait for the next maintenance window, because this development branch needs squashfs-tools in a system path to work.
Hi @zyndagj,
It should only be a warning and not an error during configure. Are you seeing it error somewhere?
When you try to do a build and it can't find mksquashfs, then it should error.
Whenever I try to pull a new image, Singularity is unable to create it, because my squashfs install lives inside a traditional HPC module. It looks like the patch to support custom search paths for squashfs was merged into the 3.x development branch.
Yeah, I was editing my previous comment. I forgot we reset the PATH in the singularity script.
Yup, it took me a little while to figure that out, because I sourced functions and singularity_which mksquashfs worked! After printing $PATH from inside build.exec, I noticed it was only looking in system locations.
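The underlying behaviour is easy to reproduce: a which-style lookup searches only the directories listed in PATH, so once the singularity script resets PATH to system locations, an mksquashfs that lives in an HPC module directory can no longer be found. A minimal C sketch of such a lookup follows; find_in_path is a hypothetical name and not the implementation of singularity_which.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose strdup(), strtok_r(), mkdtemp() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: look for an executable named `name` in each
 * ':'-separated directory of `path_list`, in order. On success the full
 * path is written to `out` and 0 is returned; otherwise -1.
 * Empty PATH entries (which a shell treats as ".") are skipped here. */
static int find_in_path(const char *name, const char *path_list,
                        char *out, size_t outlen)
{
    if (name == NULL || path_list == NULL || out == NULL || outlen == 0)
        return -1;

    char *copy = strdup(path_list);   /* strtok_r() modifies its input */
    if (copy == NULL)
        return -1;

    int rc = -1;
    char *saveptr = NULL;
    for (char *dir = strtok_r(copy, ":", &saveptr); dir != NULL;
         dir = strtok_r(NULL, ":", &saveptr)) {
        int n = snprintf(out, outlen, "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= outlen)
            continue;                 /* candidate did not fit in `out` */
        if (access(out, X_OK) == 0) { /* exists and has execute permission */
            rc = 0;
            break;
        }
    }
    free(copy);
    return rc;
}
```

Passing getenv("PATH") from a shell with the squashfs module loaded would find the module's mksquashfs; passing a PATH that contains only system directories would not.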
My cluster is using Singularity 2.5.1. There are a couple of Docker images that I am trying to run in Singularity. When I run them, Singularity fails with the following message:
ERROR : Failed creating home directory in container /opt/singularity/singularity-2.5.1/var/singularity/mnt/final/home/username: Operation not supported
ABORT : Retval = 255
I have tested running a few Singularity containers from Singularity Hub and haven't had this error. Do you think it is likely that there is something wrong with the Singularity install, or is it more likely that the Docker images simply aren't fully compatible?
Reporting that creating and running images on the development-2.x branch does work on Cray systems. The only problem is that pulling layers fails due to the ancient SSL library that Cray ships.
@zyndagj Good to hear! If you have any other problems or require help please open another issue and we'll work from there. Thanks!
Version of Singularity
2.4.5 + patch tmpfs -> ramfs mounts to support Cray systems
Expected behavior
Enter a new shell inside the container environment
Actual behavior
Steps to reproduce behavior
Additional comments
I am able to make those directories in userspace