Project-OSmOSE / OSEkit

OSEkit is an open source suite of tools written in python and dedicated to the management and analysis of data in underwater passive acoustics.
https://osmose.ifremer.fr
Other
4 stars 2 forks source link

umask is set arbitrarily within the codebase #223

Open Gautzilla opened 1 week ago

Gautzilla commented 1 week ago

The OSmOSE.utils.core_utils.set_umask() function is called at several places within the package, without apparent checks of when it should be set or reset.

def set_umask():
    os.umask(0o002)

To my opinion, correctly setting the umask should be the user's responsibility, and I would be tempted to remove all the calls to set_umask from the codebase. (The user could still execute os.umask(0o002) prior to initializing datasets or whatever to ensure the folders/files have write permissions within the group).

The main reason for this is that when we set the umask, we set it for the entire process, which could interfere with other packages or whatever the user's doing while running OSEkit (particularly since we change the mask under the hood without the user knowing).

We could set the umask right on OSmOSE import (that is in the package __init__). This suffers the same problems as before (we set it under the hood for the whole process duration), but at least we only call it once and for all. It isn't elegant, but it's a small upgrade.

Further on, I tried to modify the set_umask() function to be used as a context manager:

@contextmanager
def set_umask() -> None:
    """Temporarily set the umask to 0o002."""
    old_umask = os.umask(0o002)
    try:
        yield
    finally:
        os.umask(old_umask)

This allows to set the umask to 0o002 only for the lifespan of the called function or method:

@set_umask()
def build()
    ... # Do things with umask set to 0o002

# umask reset to its prior value

or to be used in a with statement:

with set_umask():
    ... # Do things with umask set to 0o002

# umask reset to its prior value

But since OSEkit writes lots of files and folders from almost everywhere , it's a bit messy to do it this way (we'll end up putting the decorator almost everywhere and do multiple redundant calls to the context manager).

@cazaudo @mathieudpnt do you have an opinion on this? I'd recommend getting rid of the umask modifications within OSEkit to let the responsability to the user (and add to the datarmor-toolkit notebooks a line that changes the umask for the session duration, with comments on what it means), but if you find it too complicated, I'm all ears!

ElodieENSTA commented 1 week ago

Just keep in mind APLOSE need public read and execution access on folder and public read on files (pretty sure execution is not needed) to behave properly. So if there is common APLOSE process, it would be nice to have somewhere a method to update these permissions by default (maybe with a user confirmation to avoir mistakes 🤔).

Gautzilla commented 1 week ago

Sure, OSEkit will still create files and folders with the default 0o664 and 0o775 permissions, respectively. That gives what you just said, plus the write permissions for team members also.

However, if for some reason a user wants to use a umask that removes the write permission for their group, it should (to my opinion) be their responsibility to do so:

os.umask(0o022)
dataset.build() # the write permissions will be revoked within the build method
# Setting the umask to 0o002 within the build method (as we do for now) will overrule the user's choice

Since the default Linux umask is 0o022 and that most OSEkit users might want their files and folder to have write permissions within their group, I'd still change the umask in the notebooks (with proper documentation):

os.umask(0o002) # This umask gives writing permissions to users of your group.
# Do OSmOSE stuff

But overall I still think that while the modes with which we write the files and folders is our responsibility, that's not the case for the umasks! 🥸

Gautzilla commented 1 week ago

Just to clarify the current use and how I'd see the change:

What happens in the current state:

import os

old_umask = os.umask(0o002)
os.umask(old_umask) # These lines are only here to print the current umask, they are not part of the OSEkit usage
print(f"umask before running OSmOSE: 0o{old_umask:0>3o}")
# > umask before running OSmOSE: 0o022

from OSmOSE import Dataset

dataset = Dataset(...)
dataset.build(...)

"""
    2024-11-06 09:20:11,585 - INFO 
DONE ! your dataset is on OSmOSE platform !
"""

new_umask = os.umask(0o002)
os.umask(new_umask) # These lines are only here to print the current umask, they are not part of the OSEkit usage
print(f"umask after running OSmOSE: 0o{new_umask:0>3o}")
# > umask after running OSmOSE: 0o002
# umask has been arbitrarily changed with no possible choice from the user

What I think the usage should look like:

import os
from OSmOSE import Dataset

default_umask = os.umask(0o002) # User self-defines the umask (and stores the old one if needed)

dataset = Dataset(...)
dataset.build(...)

"""
    2024-11-06 09:20:11,585 - INFO 
DONE ! your dataset is on OSmOSE platform !
"""

os.umask(default_umask) # User can reset the umask if needed after using OSEkit functions.
# This also offers the possibility to use OSEkit functions with different umasks.