Closed fnaum closed 1 week ago
There is more info in this Slack thread, but here are some findings:
The exact same repro works on CentOS-7.8 with singularity version 3.8.0-1.el7. We only discovered this because we are moving to rocky-9.2.
Initially I thought it could be related to something I found in the release notes:
Disable the usage of cgroup in instance creation when hidepid mount option on /proc is set.
I built the previous version 1.2.4 from source (plain ./mconfig -p /usr/local) and it worked. But as @DrDaveD pointed out, building 1.2.5 from source also works.
I tried installing 1.2.4 using dnf and it exhibits the same issue.
I just recently discovered that if I ssh into the same box it also works:
[19:00 federico@va1wse01XX: ~] 1008 $ apptainer exec oras://artifactory.XXXXXX:XXX/singularity-toolchain/centos-devel:7.8.2003 python -c "import os; print(os.getlogin())"
Traceback (most recent call last):
File "<string>", line 1, in <module>
OSError: [Errno 2] No such file or directory
But if I ssh to the same machine it works
[19:00 federico@va1wse01XX: ~] 1009 $ ssh federicon@va1wse01XX
Last login: Mon Jul 8 18:58:25 2024 from va1wse01XX.van.animallogic.ca
[19:01 federicon@va1wse01XX: ~] 1000 $ apptainer exec oras://artifactory.XXXXX:XXX/singularity-toolchain/centos-devel:7.8.2003 python -c "import os; print(os.getlogin())"
INFO: Using cached SIF image
federico
In other words, on rocky-9, apptainer-1.2.5 (or 1.3.2) works in an interactive login shell but does not work in interactive non-login shells.
On centos-7.8 with singularity version 3.8.0-1.el7, it works in both cases.
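For anyone hitting the same os.getlogin() failure, here is a minimal sketch of a fallback. It is not a fix for the underlying behavior. The helper name resolve_login and its two parameters are hypothetical, introduced only so the fallback can be exercised without a container. It assumes that getpass.getuser(), which checks environment variables such as LOGNAME and USER before consulting the password database, is an acceptable substitute for the login name.

```python
import getpass
import os


def resolve_login(primary=os.getlogin, fallback=getpass.getuser):
    """Return the login name, using the fallback when the primary lookup fails."""
    try:
        return primary()
    except OSError:
        # os.getlogin() raised, as in the "[Errno 2]" traceback above.
        # getpass.getuser() reads LOGNAME/USER/LNAME/USERNAME first and only
        # then falls back to the password database.
        return fallback()
```

Calling resolve_login() with no arguments inside the container would try os.getlogin() first and only use the environment-based lookup if that raises.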
I will try to dig into our ~/.bash_profile, ~/.bash_login, or ~/.profile to see if I can find something else, but as @DrDaveD was able to reproduce this, I would appreciate someone shedding some light on it. I would love it if there is just a configuration switch that I can flip to get the same behavior we had on CentOS.
I tracked down the difference between the version installed with dnf and the one built from source: it was something we changed a while ago in the configuration.
# CONFIG PASSWD: [BOOL]
# DEFAULT: yes
# If /etc/passwd exists within the container, this will automatically append
# an entry for the calling user.
config passwd = no
Changing this back to the default config passwd = yes makes it work for us in both interactive login and interactive non-login shells, and we do not observe the reported issue anymore.
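Since the difference came down to a single directive, a quick way to compare two installs is to read that directive out of each configuration file. The following is a minimal sketch. The helper name read_directive is hypothetical, and taking the last assignment when a directive appears more than once is an assumption of this sketch, not documented Apptainer behavior.

```python
def read_directive(conf_text, name):
    """Return the last value assigned to `name` in apptainer.conf-style text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not "key = value".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip()  # assumption: a later assignment overrides an earlier one
    return value


sample = """\
# CONFIG PASSWD: [BOOL]
# DEFAULT: yes
config passwd = no
"""
print(read_directive(sample, "config passwd"))  # prints: no
```

To use it on a real install, pass in the contents of that install's apptainer.conf. The file's location depends on how Apptainer was installed and on the prefix it was built with, so it is not hard-coded here.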
We will revert the change, because we only made it for a test case that checked that /etc/passwd was NOT writable. I will leave this issue open to understand it better, though, as the behavior in singularity 3.8.0-1.el7 is different: there, everything works with config passwd = no.
I just built singularity-3.8.0 from source and get the same error when I set config passwd = no.
We found a similar issue on singularity 3.8.0-1.el7 with config passwd = no.
Repro
> touch $HOME/bla
> singularity exec oras://artifactory.XXXXX:XXXX/singularity-toolchain/centos-devel:7.8.2003 python -c"import os;import pwd; print(pwd.getpwuid(os.stat(os.path.expandvars('$HOME/bla')).st_uid))"
KeyError: 'getpwuid(): uid not found: 9426'
That also happens in apptainer-1.3.2
In this case the behaviour is consistent between singularity and apptainer
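If the calling user has no entry in the container's /etc/passwd, code that resolves file owners can guard against the KeyError instead of crashing. Below is a minimal sketch. The helper name owner_name and its lookup parameter are hypothetical, and returning the numeric uid as a string is just one possible fallback.

```python
import os
import pwd


def owner_name(path, lookup=pwd.getpwuid):
    """Return the file owner's user name, or the numeric uid as a string if it cannot be resolved."""
    uid = os.stat(path).st_uid
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid() raises KeyError when the uid has no passwd entry,
        # as in the "uid not found" message above.
        return str(uid)
```

With config passwd = no, a call such as owner_name(os.path.expanduser("~/bla")) would then be expected to return the uid as a string rather than raise.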
The behavior really comes down to the kernel and whether or not it makes the relevant info available under /proc. I tried it on an el8 kernel, but perhaps older kernel behaviors are slightly different. There's not really anything that apptainer can do about it. Are you ready to close the issue?
Sorry for the delay in the reply.
We have "solved" our pressing issue by just reverting the config to config passwd = yes.
I'm happy to close the issue but I left it open to see if someone could explain or shed some light on what I observed.
I would like to know what else changes when that config is changed from yes to no, besides the /etc/passwd file becoming not writable.
My observation with config passwd = no was that I could see the file /proc/self/loginuid and its contents inside the container, but when Python tries to access that file, the open fails as if the file did not exist:
>strace python -c "import os; print(os.getlogin())"
...
open("/proc/self/loginuid", O_RDONLY) = -1 ENOENT (No such file or directory)
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
So I'm wondering what changes are made at the /proc level, because it is not intuitive that a config named passwd would affect /proc.
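To compare what a process actually sees in login and non-login shells, the file can be read directly from Python instead of relying on os.getlogin(). This is a minimal diagnostic sketch. The helper name read_loginuid is hypothetical, and treating 4294967295 (that is, (uid_t)-1) as "not set" is my understanding of how the kernel reports an unset audit login uid.

```python
UNSET_LOGINUID = 4294967295  # (uid_t)-1; assumed to mean "login uid not set"


def read_loginuid(path="/proc/self/loginuid"):
    """Return the audit login uid as an int, or None if the file is missing, unreadable, or unset."""
    try:
        with open(path) as handle:
            value = int(handle.read().strip())
    except (OSError, ValueError):
        # ENOENT, as in the strace output above, ends up here.
        return None
    return None if value == UNSET_LOGINUID else value
```

Running print(read_loginuid()) inside the container from both kinds of shell would show whether the value differs between them.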
If you do not have an answer to those questions, I'm also happy to close it.
Whatever's happening in /proc is outside of Apptainer's control, so I don't have an answer for you.
Thanks for your help anyways!
Version of Apptainer
apptainer-1.2.5-1.el9
Expected behavior
Actual behavior
Steps to reproduce this behavior
What OS/distro are you running
How did you install Apptainer
Via the dnf module for Ansible; dnf install apptainer-1.2.5 should be the equivalent.
(On CentOS-7.8 we install singularity using yum.)