databio / bulker

Manager for multi-container computing environments
https://bulker.io
BSD 2-Clause "Simplified" License

Sharing a bulker config file across users #70

Closed rcorces closed 3 years ago

rcorces commented 3 years ago

Is it a terrible idea to share a bulker config file across all users of a server? I would love to have a centralized standard pipeline for all users where they don't have to set everything up individually.

Currently, the directory containing the config file has to have write permissions because bulker activate creates a lock_ file. This is fine by me but I'm wondering what problems might be caused by multiple users sharing all of these files? It seems like the lock file is super transient.

Same general concern with divvy which also uses the lock_ file.

nsheff commented 3 years ago

Is it a terrible idea to share a bulker config file across all users of a server?

No, that's a great idea and this is an intended use case. In fact this is what we do in my lab, for exactly the reason you state. I have a lab script that people can source in their .bashrc, and it sets environment variables for bulker and divvy, among other things, so that everyone can use the same bulker crates and we can share containers. It replaces the environment modules system on our server, which we don't have to deal with anymore, and has the advantage that it works across servers (so I use the same crates on my local desktop, etc.).

The lock file is in fact exactly how we enabled bulker to work with the config file in a multi-user environment. It prevents one user from reading the file while another user is writing it (by loading a crate, for example), which could yield a broken file if hit at just the wrong time. The lock file only lasts a split second, but it prevents race conditions if two users happen to try to do something at exactly the same time. Two "users" in this case can also be, for example, slurm jobs, so the lock file ensures the fidelity of the system. We sometimes run a hundred jobs, and all of them are trying to look at the file at the same time. What if some lab user was updating a crate right when a hundred jobs were trying to read the file? Some would fail. The lock file prevents that: each job waits a split second to make sure it's reading a file that nobody else is writing.
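To make the mechanism concrete, here is a minimal sketch of a race-free lock file in plain Python. It is not the actual yacman/ubiquerg code: the names `lock_path_for` and `file_lock`, the "lock." prefix convention, and the polling defaults are assumptions chosen for illustration. The only technique it relies on is that `os.open` with `O_CREAT | O_EXCL` fails atomically when the file already exists, so only one process can hold the lock at a time.

```python
import os
import time
from contextlib import contextmanager


def lock_path_for(filepath):
    # Assumed naming convention: "lock." + file name, in the same directory
    folder, name = os.path.split(filepath)
    return os.path.join(folder, "lock." + name)


@contextmanager
def file_lock(filepath, wait_max=10.0, poll=0.05):
    """Hold an exclusive lock file for `filepath` while the block runs."""
    lock = lock_path_for(filepath)
    deadline = time.monotonic() + wait_max
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire lock: {lock}")
            time.sleep(poll)  # someone else holds it; wait a moment and retry
    try:
        yield lock
    finally:
        os.remove(lock)  # release so waiting processes can proceed
```

A reader or writer would wrap its file access in `with file_lock(path): ...`. Note that creating the lock requires write permission on the directory that contains the config file.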

The same system is used for both divvy and bulker (this functionality is actually encoded in yacman, which is used for configuration file management behind bulker, divvy, and a number of our other tools).

So: please do make these things a shared central resource! It's built for that.

nsheff commented 3 years ago

You can also add your own lab manifests to bulker hub if you want.

rcorces commented 3 years ago

Perfect. That's what I was hoping was the case. Thanks!

nsheff commented 3 years ago

@stolarczyk just got me thinking -- bulker shouldn't need to create a lockfile for bulker activate...@rcorces are you sure it's locking the file when you activate something and not when you load it?

stolarczyk commented 3 years ago

Not sure if that's the reason, but YacAttMap locks the file for reading, which I think we introduced in the last round of updates. It prevents reading files that are being written. This behavior can be turned off with the 'skip_read_lock' flag in the constructor.

rcorces commented 3 years ago

@rcorces are you sure it's locking the file when you activate something and not when you load it?

Yes - definitely creating a lock_ file when calling bulker activate in our hands.

in case it is helpful:

bulker activate databio/pepatac:1.0.4

Error log: 
  File "/corces/home/fgrandi/tools/virtual_py/p3.8.5/bin/bulker", line 8, in <module>
    sys.exit(main())
  File "/corces/home/fgrandi/tools/virtual_py/p3.8.5/lib/python3.6/site-packages/bulker/bulker.py", line 914, in main
    bulker_config = yacman.YacAttMap(filepath=bulkercfg, writable=False)
  File "/corces/home/fgrandi/tools/virtual_py/p3.8.5/lib/python3.6/site-packages/yacman/yacman.py", line 76, in __init__
    create_lock(filepath, wait_max)
  File "/corces/home/fgrandi/tools/virtual_py/p3.8.5/lib/python3.6/site-packages/ubiquerg/files.py", line 241, in create_lock
    _create_lock(lock_path, filepath, wait_max)
  File "/corces/home/fgrandi/tools/virtual_py/p3.8.5/lib/python3.6/site-packages/ubiquerg/files.py", line 228, in _create_lock
    raise e
  File "/corces/home/fgrandi/tools/virtual_py/p3.8.5/lib/python3.6/site-packages/ubiquerg/files.py", line 210, in _create_lock
    create_file_racefree(lock_path)
  File "/corces/home/fgrandi/tools/virtual_py/p3.8.5/lib/python3.6/site-packages/ubiquerg/files.py", line 173, in create_file_racefree
    fd = os.open(file, write_lock_flags)
PermissionError: [Errno 13] Permission denied: '/corces/home/shared/tools/bulker/lock.bulker_config.yaml'

nsheff commented 3 years ago

YacAttMap should check the lock to read, but should it create a lock to read? At first I thought no, but then -- I guess it has to create the lock to prevent something from writing it while it's in the middle of reading it.

I guess maybe we'd need to make a bulker configuration option that would allow you to not lock for file reading, which would allow you to use it on a read-only file system by sacrificing the fidelity gain you get by preventing writing while another process is in the middle of reading.
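The trade-off described above can be sketched as follows. This is a hypothetical illustration, not bulker's or yacman's implementation: `read_config` is an invented helper, the parameter name `skip_read_lock` is borrowed from the flag mentioned earlier in the thread, and the "lock." file naming is assumed. With the lock, the read is protected from concurrent writers but needs a writable directory. Without it, the read works on a read-only file system but may see a partially written file.

```python
import os
import time


def read_config(filepath, skip_read_lock=False, wait_max=10.0, poll=0.05):
    """Return the text of `filepath`, optionally guarded by a lock file."""
    if skip_read_lock:
        # No lock file is created, so a read-only directory is fine,
        # but nothing stops a writer from changing the file mid-read.
        with open(filepath) as f:
            return f.read()

    folder, name = os.path.split(filepath)
    lock = os.path.join(folder, "lock." + name)  # assumed naming convention
    deadline = time.monotonic() + wait_max
    while True:
        try:
            # Raises PermissionError if the directory is not writable
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire lock: {lock}")
            time.sleep(poll)
    try:
        with open(filepath) as f:
            return f.read()
    finally:
        os.remove(lock)
```

In this sketch a shared, read-only installation would call `read_config(path, skip_read_lock=True)`, accepting the small risk of reading while someone else is writing.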