easybuilders / easybuild-framework

EasyBuild is a software installation framework in Python that allows you to install software in a structured and robust way.
https://easybuild.io
GNU General Public License v2.0

EB "install" step doesn't seem to propagate group-sticky bit to subdirectories #3847

Open kelseymh opened 2 years ago

kelseymh commented 2 years ago

We install our package on a shared group disk, with our set of users all belonging to a common secondary Unix group (the HPC assigns each user to their own singular primary group, so chmod g+rw is redundant with u+rw). The group area is explicitly set with chmod g+s, and I have confirmed that the system (CentOS7) properly inherits the group sticky bit down when new subdirectories are created.

However, it appears that the EasyBuild "install" step overrides that, such that subdirectories end up without the group-sticky set, and worse, end up owned by the individual user's group, rather than the proper "group group." Here's an example after building and installing ROOT/6.24.00:

[kelsey@grace2 eb]$ ls -l x86_64/sw/ROOT/6.24.00-foss-2020b-Python-3.8.6
total 148
drwxrwsr-x  3 kelsey mitchcomp   4096 Sep 22 18:06 etc
drwxrwxr-x 11 kelsey mitchcomp 102400 Sep 23 14:39 include
drwxrwxr-x  6 kelsey mitchcomp   4096 Sep 23 14:39 js
drwxrwxr-x 10 kelsey mitchcomp  20480 Sep 23 14:39 lib
drwxrwxr-x  3 kelsey mitchcomp   4096 Sep 14 16:12 man
drwxrwxr-x  3 kelsey mitchcomp   4096 Sep 23 14:39 README
drwxrwxr-x 48 kelsey mitchcomp   4096 Sep 23 14:39 tutorials
drwxrwxr-x  8 kelsey mitchcomp   4096 Sep 14 16:12 ui5

Notice that the "etc" directory is sticky, but none of the other directories are. Then, inside one of those directories,

x86_64/sw/ROOT/6.24.00-foss-2020b-Python-3.8.6/js:
total 16
drwxrwxr-x 2 kelsey kelsey 4096 Sep 14 16:12 files
drwxrwxr-x 2 kelsey kelsey 4096 Sep 14 16:12 img
drwxrwxr-x 2 kelsey kelsey 4096 Sep 14 16:12 scripts
drwxrwxr-x 3 kelsey kelsey 4096 Sep 14 16:12 style

you see that the stuff is all owned by group "kelsey", not the group-group "mitchcomp."

I haven't figured out why this is happening, but it makes it very difficult to have multiple group users building and installing modules (which is a feature we want; we're a physics group, not cluster management). I believe that the permission bits are getting overwritten by EB; I wonder if there's a way to make sure the sticky bit isn't getting stomped on?

ocaisa commented 2 years ago

We do this at JSC, I think you want --set-gid-bit. See near https://github.com/easybuilders/JSC/blob/2020/dev_modules/Developers/InstallSoftware.lua#L241 for settings that might help you with this (we configured this a long time ago, I'm honestly not sure which setting does what!).

kelseymh commented 2 years ago

I see it! In the code, it's lines 253-254:

    -- We need to allow people to clean out an installation in Devel
    setenv("EASYBUILD_STICKY_BIT", "0")

So I presume the command line option would be "--sticky_bit=0" (I am always confused by the way umask-like things are inverted: the 0 means "don't not set this" :-( ).

kelseymh commented 2 years ago

I used the --sticky-bit option successfully, but it does not operate comprehensively. Specifically, I'm building ROOT/6.18.02-foss-2020b-Python-3.8.6 (our own local EB file, not in the EB repository). With --sticky-bit set, seven of the top-level subdirectories are properly set "g+rwxs", but the remaining nine are not:

[kelsey@grace2 eb]$ /bin/ls -l x86_64/sw/ROOT/6.18.02-foss-2020b-Python-3.8.6
total 208
drwxrwsr-x  2 kelsey mitchcomp  4096 Sep 23 21:48 bin
drwxrwsr-x  3 kelsey mitchcomp  4096 Sep 23 21:48 cmake
drwxrwsr-x  2 kelsey mitchcomp  4096 Sep 23 21:48 config
drwxrwsr-t  3 kelsey mitchcomp  4096 Sep 23 21:54 easybuild
drwxrwsr-x  3 kelsey mitchcomp  4096 Sep 23 21:48 emacs
drwxrwsr-x  9 kelsey mitchcomp  4096 Sep 23 21:48 etc
drwxrwxr-x  2 kelsey mitchcomp  4096 Sep 23 21:48 fonts
drwxrwsr-x  3 kelsey mitchcomp  4096 Sep 23 21:48 geom
drwxrwxr-x  2 kelsey mitchcomp 36864 Sep 23 21:48 icons
drwxrwxr-x 11 kelsey mitchcomp 94208 Sep 23 21:48 include
drwxrwxr-x  6 kelsey mitchcomp  4096 Sep 23 21:48 js
drwxrwxr-x  6 kelsey mitchcomp 20480 Sep 23 21:48 lib
lrwxrwxrwx  1 kelsey mitchcomp     3 Sep 23 21:48 lib64 -> lib
-rw-rw-r--  1 kelsey mitchcomp   847 Aug 23  2019 LICENSE
drwxrwxr-x  2 kelsey mitchcomp  4096 Sep 23 21:48 macros
drwxrwxr-x  3 kelsey mitchcomp  4096 Sep 23 21:48 man
drwxrwxr-x  3 kelsey mitchcomp  4096 Sep 23 21:48 README
drwxrwxr-x 49 kelsey mitchcomp  4096 Sep 23 21:48 tutorials

Is this something that could be expanded in the EB framework ("make every install directory sticky, not just selected ones")? Is it due to some peculiarity in the root.py EasyBlock? Or should we just amend our local build script to take care of stickifying the install area when EB completes successfully?

ocaisa commented 2 years ago

It was always confusing to me too but you actually need --set-gid-bit, go take a read of https://linuxconfig.org/how-to-use-special-permissions-the-setuid-setgid-and-sticky-bits

Basically, sticky means only the owner can rename or delete an entry, which hurts group installations because you can't reinstall something someone else installed. It's the setgid bit that propagates the group.
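To make the distinction concrete, here is a minimal sketch that decodes the special bits behind mode strings like the `drwxrwsr-x` and `drwxrwsr-t` entries in the listings above. It uses only Python's standard `stat` module; the helper name `special_bits` is made up for illustration and is not part of EasyBuild.

```python
import stat


def special_bits(mode: int) -> dict:
    """Report which of the three special permission bits are set in a mode."""
    return {
        "setuid": bool(mode & stat.S_ISUID),  # octal 4000
        "setgid": bool(mode & stat.S_ISGID),  # octal 2000, shown as 's' in the group triplet
        "sticky": bool(mode & stat.S_ISVTX),  # octal 1000, shown as 't' in the other triplet
    }


# Octal modes corresponding to three kinds of directory entries.
for mode in (0o2775, 0o3775, 0o0775):
    # stat.filemode() renders the same string that `ls -l` prints.
    print(oct(mode), stat.filemode(stat.S_IFDIR | mode), special_bits(mode))
```

On a directory, the `s` in the group position is the setgid bit (new entries take the directory's group), while the trailing `t` is the sticky bit (restricted rename and delete).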

kelseymh commented 2 years ago

:facepalm: Thank you for that link! Long, long ago, I picked up the phrasing "sticky bit" to refer specifically to the g+s setting on directories. It has always made sense to me, as it makes the group ownership "sticky", attaching itself to all the stuff within the directory.

I haven't done enough stuff in "root-ish" areas to realize that the same setting was used for "setgid" executables!

I'll bet this is why my latest test build ended up with a bunch of directories with "t" in the mode string, which I had never seen before. I'm going to fix the option flag and see what happens.

ocaisa commented 2 years ago

Snap on that, "sticky" group always made sense to me too... every time this comes up I have to go re-read that stuff to make sure I have the ideas straight.

ocaisa commented 2 years ago

@kelseymh If this is working for you now, can you close this issue?

kelseymh commented 2 years ago

It doesn't work completely. With the ROOT build, the "--set-gid-bit" option affects only some of the directories, not all of them. I had to use a combination of chmod -R g+ws and chown -R in my EB wrapper to take care of the full set of directories in the install area.
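For readers who want a scripted version of that kind of workaround, the following is a rough Python sketch, not the wrapper actually used here. The function name `fix_group_tree` is hypothetical. Unlike `chmod -R g+ws`, it sets the setgid bit on directories only (on regular files that bit has a different meaning) and it skips symlinks.

```python
import os
import shutil
import stat
from typing import Optional


def fix_group_tree(root: str, group: Optional[str] = None) -> int:
    """Add g+w to everything under root and g+s to directories.

    If group is given, also change group ownership (like chown -R :group).
    Returns the number of entries whose mode was changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in candidates:
            if os.path.islink(path):
                continue  # leave symlinks (e.g. lib64 -> lib) alone
            if group is not None:
                # Change the group first, then fix the mode bits.
                shutil.chown(path, group=group)
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            wanted = mode | stat.S_IWGRP
            if os.path.isdir(path):
                wanted |= stat.S_ISGID
            if wanted != mode:
                os.chmod(path, wanted)
                changed += 1
    return changed
```

With the names from this thread, a call might look like `fix_group_tree("x86_64/sw/ROOT/6.18.02-foss-2020b-Python-3.8.6", group="mitchcomp")`, run by a user who owns the files and belongs to that group.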

ocaisa commented 2 years ago

I took a look at the ROOT easyblock and I don't see anything there that could cause that. It looks to me like it's just using CMake for the install step which I would've imagined would respect those permissions.

ocaisa commented 2 years ago

Having said that, the easyconfig you're using is not in the main repo...maybe you have something unusual in there?

kelseymh commented 2 years ago

> I took a look at the ROOT easyblock and I don't see anything there that could cause that. It looks to me like it's just using CMake for the install step which I would've imagined would respect those permissions.

Yup, that's what I see as well; I had hoped it was doing something special that I could fix, but there you go :-/

> Having said that, the easyconfig you're using is not in the main repo...maybe you have something unusual in there?

You're right that the .eb file isn't in the repository. We've created several of our own group-level easyconfigs in order to have the correct combination of version, toolchain, and in some cases CMake options to support our group's software needs. In this case, our local ROOT-6.18.02-foss-2020b-Python-3.8.6.eb contains only dependencies and configopts with a long list of "-Dblah=blah" CMake options. [Edited: specifically, just CMake options to choose different package supports, like FFTW, MySQL, OpenGL, etc.]

I've got a workaround in place so we can have multiple users doing builds, but I will try to see if I can find where this is hiccuping.

ocaisa commented 2 years ago

So I looked a bit into this on the EB side. What is happening (I think) is that EB is using the set-gid-bit setting for any directory that it creates. Any installation therefore will pick up the group at least at the top level of the installation, but the bits will not be set to propagate that to the next level of folders.

I could imagine that setting --umask="2002" might help with that since this would include the sticky group setting...but I'm not sure if that only gets used when the permissions get adjusted at the end of an installation which is too late as the group will already have been set.

ocaisa commented 2 years ago

And I see that won't work, we are not set up to take anything other than a 3 digit umask: https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/tools/options.py#L835

but if we did it might indeed help since the setting happens early: https://github.com/easybuilders/easybuild-framework/blob/develop/easybuild/tools/options.py#L1494

ocaisa commented 2 years ago

Hmm, a little experiment tells me that you can't set the gid bit in your umask:

alanc@~$ umask
0002
alanc@~$ umask 022
alanc@~$ umask
0022
alanc@~$ umask 2002
alanc@~$ umask
0002

which is probably why we only accept 3 digit values.

ocaisa commented 2 years ago

Ok, ROOT itself is the culprit, it is explicitly setting the permissions of the installation: https://github.com/root-project/root/blob/master/main/CMakeLists.txt#L57

Default behaviour would be that the group would indeed propagate as expected/intended (as long as you always use copying and not moving).
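The effect can be reproduced outside of ROOT and CMake. The sketch below is an illustration under assumptions, not a trace of what the ROOT install actually does. It creates a setgid directory in a temporary location, makes a subdirectory (which on Linux normally inherits the bit), and then applies an explicit mode without the setgid bit, the way an installer that sets fixed permissions might. The function name `demo_setgid_inheritance` is hypothetical.

```python
import os
import stat
import tempfile


def has_setgid(path: str) -> bool:
    return bool(os.stat(path).st_mode & stat.S_ISGID)


def demo_setgid_inheritance() -> dict:
    with tempfile.TemporaryDirectory() as top:
        parent = os.path.join(top, "install")
        os.mkdir(parent)
        os.chmod(parent, 0o2775)  # like chmod g+s on the install prefix

        child = os.path.join(parent, "lib")
        os.mkdir(child)  # default behaviour: inheritance is left to the OS
        result = {
            "parent_setgid": has_setgid(parent),
            "child_inherited_setgid": has_setgid(child),
        }

        # An explicit mode that omits the setgid bit replaces what was inherited.
        os.chmod(child, 0o775)
        result["child_setgid_after_explicit_chmod"] = has_setgid(child)
        return result


print(demo_setgid_inheritance())
```

Whether the child inherits the bit depends on the operating system and filesystem. The last value is the important one: once an installer sets the mode explicitly, the inherited bit is gone and anything created below that directory no longer picks up the group.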

akesandgren commented 2 years ago

Don't forget that umask removes bits ... so using umask 2002 wouldn't have done anything here...
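The arithmetic behind that remark fits in a few lines. This is a simplified model of how a umask is applied to a requested mode, not kernel code, and the helper name `effective_mode` is made up for the example.

```python
def effective_mode(requested: int, umask: int) -> int:
    """Simplified model: the umask clears bits from the requested mode."""
    return requested & ~umask & 0o7777


# A umask of 002 removes write permission for "other".
print(oct(effective_mode(0o777, 0o002)))    # 0o775

# Putting 2000 in the mask cannot add the setgid bit to the result.
print(oct(effective_mode(0o777, 0o2002)))   # 0o775

# In this model, such a mask would only remove the bit if it were requested.
print(oct(effective_mode(0o2777, 0o2002)))  # 0o775
```

Since masking only clears bits, no umask value can turn the setgid bit on. It has to be set on the directories themselves, which is what `--set-gid-bit` is for.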

ocaisa commented 2 years ago

Ah, of course, thanks! Detailed explanation at https://stackoverflow.com/questions/19489616/umask-for-extra-permissions-like-set-group-id-on-directories for why this can't work (and the existing approach should work)