LDMX-Software / ldmx-sw

The Light Dark Matter eXperiment simulation and reconstruction framework.
https://ldmx-software.github.io
GNU General Public License v3.0
20 stars 16 forks source link

use (but don't require) denv in ldmx env #1269

Open tomeichlersmith opened 2 months ago

tomeichlersmith commented 2 months ago

I am updating ldmx-sw, here are the details.

What are the issues that this addresses?

This resolves #1232 directly, resolves #1248 directly, and resolves #1252 indirectly. I do this by moving the complicated container interaction scripting into its own project such that it can be more directly tested and version controlled. The environment scripts here are then just using this other shell program to help replicate the current environment.

The long-term benefit of isolating the container interaction into its own program amounts to allowing users to do the same process we do with ldmx-sw with any of their software. Below I have two examples that I do regularly and one idea that I could see becoming commonplace if we adopt denv within LDMX.

Pin ldmx-sw Version for Production

I can use denv to choose an ldmx-sw version and use it pre-compiled. It is pinned and is not cluttered by my developments happening elsewhere on my system.

denv init ldmx/pro:v3.3.3
denv fire my-config.py
# runs with ldmx-sw v3.3.3

Pin Python and Packages

I often want to develop a python analysis to share with someone else and it is very helpful to be using the same python version and the same version of all the packages. pip and venv offer a good way to pin the python package versions, but often times it is difficult to align the python version across systems. Luckily for us, python builds container images that we can use directly with denv.

# choose a python version
denv init python:3.11
# install the requirements of the analysis into this denv's home directory
denv python3 -m pip install -r requirements.txt
# run your analysis or whatever
denv python3 analysis.py

Sharing Environments

In some cases, the workspace that I use with denv (the directory where I ran denv init) is also a git repository. In these cases, I can commit the config written by denv so that others who want to replicate the work I did with that code can simply

git clone <repo>
cd <repo>
denv <command>

Requested Feedback

Note that I am not removing the current ldmx-env.sh script. It has many peculiarities that we have gotten used to and I have not replicated all of them with denv. This was intentional since many of these peculiarities are born out of just trying to get it to work and I have now figured out a better solution. This is a first PR to make using denv an available path for people looking to develop ldmx-sw. I wish to have reviews of this PR focus on the following

  1. Whether this is a good idea. There was good discussion in #1232 but I want to re-open the discussion with this more concrete proposal. I also want to point out that I am not committed to the draft roadmap I wrote in that issue, it was just me dumping my thoughts onto GitHub.
  2. Ease the transition (If this is a good idea). How can we design these files to help ease the transition from the current set of bash functions (a strategy called ldmx here as shorthand) to more simply wrapping another program (strategy called denv here as shorthand). Even if we stay with a bash environment script with bash functions wrapping denv indefinitely, there are still some aspects of the workflow that will change.
  3. Feature proposals (If this is a good idea). If we are already transitioning the workflow to a slightly different form, I think it is a good time to try to accommodate other features that people might enjoy. I'm happy to discuss ideas to see what may be helpful.

To Do

Context

To provide further context. The ldmx-denv-init.sh file just initilizes denv trying to be POSIX-sh compliant so that it could be used in other POSIX shells. The ldmx-denv-transition.sh file is a bash-specific environment file that wraps denv with various bash functions to try to mimic the behavior of the current ldmx command. Below, I have these translations written out in a more human readable form.

ldmx Command denv Equivalent
help help
config config print
use config image <image>
pull config image pull
mount config mounts <dir>
setenv config env copy
run removed
checkout removed
source removed
list removed
clean removed

Explanations for Removals

All of the ldmx commands that do not have a denv equivalent are not included within denv because they do not directly pertain to interfacing with the container runner. I also am proposing removing them entirely from our ecosystem largely because they are either (a) broken, (b) not used widely, or (c) both.

EinarElen commented 2 months ago

Right, I've tried to look at this as if I was a completely new user First thing I tried:

The reason for this is of course that i haven't run denv init yet. I would appreciate if the new scripts would check

  1. Is denv installed? If not, offer to do it y/n
  2. Do we have a workspace initialized? If not, offer to do it y/n. You could offer a choice between ldmx/pro and ldmx/dev

It looks like this is what ldmx-denv-init tries to do but it doesn't work for me at least :(

Very minor: Maybe it is time to patch the typo in: You're default shell '/bin/zsh' isn't bash.

tomeichlersmith commented 2 months ago

Thank you for giving this a try Einar!! :1st_place_medal:

I have employed shellcheck to start helping debug these scripts and I've already found many patches to make. One of the issues shellcheck found seems to be the primary reason the ldmx-denv-init.sh file is failing for your zsh. I'm frankly surprised it didn't work on bash for you since it does work on my bash but I haven't setup a test VM for play around with different configurations yet.

tom@zuko:~/code/ldmx/ldmx-sw/scripts$ shellcheck ldmx-denv-init.sh 

In ldmx-denv-init.sh line 71:
_default_denv_workspace="$(dirname "${BASH_SOURCE[0]}" )/../../"
                                    ^---------------^ SC3028 (warning): In POSIX sh, BASH_SOURCE is undefined.
                                    ^---------------^ SC3054 (warning): In POSIX sh, array references are undefined.

For more information:
  https://www.shellcheck.net/wiki/SC3028 -- In POSIX sh, BASH_SOURCE is undef...
  https://www.shellcheck.net/wiki/SC3054 -- In POSIX sh, array references are...

Unfortunately, finding the path to the current source file in a POSIX-sh compliant way is hard (or impossible when sourcing a file rather than executing it?) so I'm investigating solutions. We might need to abandon a POSIX init script in favor of documentation if there isn't another way to deduce LDMX_BASE.

tvami commented 2 months ago

I did the following on UCSB's cluster which uses bash:

module load singularity
git clone https://github.com/LDMX-Software/ldmx-sw.git
git checkout 1232-use-denv-in-ldmx-env
denv --clean-env  --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"

leads to

ERROR: Unable to deduce a denv workspace!
       Are you in a denv workspace? Do you still need to 'denv init'?

Then I run

 .  scripts/ldmx-denv-init.sh

which leads to

ERROR: denv requires the --env flag for singularity which was introduced in v3.6
ldmx/dev:latest
INFO:    Starting build...
FATAL:   While performing build: conveyor failed to get: Error initializing source oci:/home/vamitamas/.singularity/cache/oci:a87552b8a96913893b47298ff754c5b52157d585edbc5c8e016033bc9c3e4d07: Error initializing image from source docker://ldmx/dev:latest: unsupported schema version 2

(now same thing happens with denv --clean-env --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}")

The singularity version is 3.5.2, so I guess this not working out of the box is expected.

tomeichlersmith commented 2 months ago

:raised_hands: thank you @tvami :raised_hands:

There are two separate "issues" that your comment brings up.

singularity version

UCSB also has apptainer in a separate module (which is the new name after it was adopted by the Linux Foundation), so you should be doing

module load apptainer

to prefer the newer version. The last error you see after . scripts/ldmx-denv-init.sh is related to the old version of singularity. I do not plan on supporting the older singularity versions and instead point people towards the newer apptainer (all clusters LDMX uses have access to apptainer: https://github.com/LDMX-Software/ldmx-sw/issues/1248#issuecomment-1892395583)

init typo

The first error you see from running

denv --clean-env  --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"

is because there is a typo in the README I wrote (now fixed). It should be

denv init --clean-env  --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"

In this call, we are using LDMX_BASE as a pre-defined environment variable that stores the path to LDMX_BASE. If printenv LDMX_BASE returns a path, then you are all good! If not, you can just use a relative path when running this directly. For example, from the ldmx-sw repository

denv init --clean-env --name ldmx ldmx/dev:latest ..
tvami commented 2 months ago

OK this worked now for the environment, however,

denv fire .github/validation_samples/ecal_pn/config.py 
sh: 1: fire: not found

Doing

denv init ldmx/pro:v3.3.3
denv fire .github/validation_samples/ecal_pn/config.py 

then leads to

---- LDMXSW: Loading configuration --------
Traceback (most recent call last):
  File "/home/vamitamas/Testing1269/ldmx-sw/.github/validation_samples/ecal_pn/config.py", line 1, in <module>
    from LDMX.Framework import ldmxcfg
ModuleNotFoundError: No module named 'LDMX'
Configuration Error [ConfigureError] : Problem loading python script
  at /code/Framework/src/Framework/ConfigurePython.cxx:315 in ConfigurePython
Stack trace: 
    0 /usr/local/lib/libFramework.so(+0x683d4) [0x7fe56f5df3d4]
    1 main + 280 addr2line: 'fire': No such file

    2 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fe56ebded90]
Configuration Error [ConfigureError] : Problem loading python script
  at /code/Framework/src/Framework/ConfigurePython.cxx:315 in ConfigurePython

One minor technical comment I'll have is: now this adds the .denv/ directory and that comes up in git status, I'd suggest to edit the .gitignore file and add that to the list to be ignored.

On the philosophical side, why not get rid off the ldmx env fully, supporting both seems to be some overhead in the maintenance. (I'm sorry if this was discussed and I just didnt follow this close enough). Although ldmx fire does sounds more ldmx-y than denv fire, maybe we could have an ldmxFire built-in alias that runs denv fire? This is truly a cosmetics and sentimental comment tho, feel free to ignore it.

jmmans commented 2 months ago

While I appreciate the idea of a cleanup, and it makes sense to be able to share useful functionality, I feel a little nervous about making the core LDMX software dependent on a package which is owned/maintained by a specific individual. Does this make sense in a long-term manner?

tomeichlersmith commented 2 months ago

@tvami I think both of your errors make sense in the context that the denv-run container does not run the container's entrypoint. Specifically, this means the PATH is not updated with ${LDMX_BASE}/ldmx-sw/install/bin (in the first case) and the PYTHONPATH is not updated with /usr/local/python (in the second case).

As far as the location of the .denv directory, I intend it to be located in ${LDMX_BASE} (i.e. in the directory above ldmx-sw); however, it was created within ldmx-sw when you ran denv init within ldmx-sw. (If you just want to change the image, you should use denv config image <tag>.) I am unsure on how to enforce this preference so thoughts along these lines are welcome.

I agree with you that supporting both ldmx and denv would be more maintenance. To be clear, I think the long term workflow would be to move ldmx to just be calling denv (like is being done in ldmx-denv-transition.sh) but even in that case, I fear it is too much. Again, I am unsure on how to meet my dueling goals of easy transition and long term maintainability.


@jmmans I think your concern is well founded. I've been trying to think about the long term sustainability of this project especially if LDMX adopts it. I actually presented this project to the Hep Software Foundation last September where I received good feedback but I was unsure on the next steps. Perhaps I reopen discussions with them about transitioning ownership of denv to HSF? Another obvious option is for me to transfer ownership to LDMX-Software or have LDMX-Software keep a fork.

tomeichlersmith commented 2 months ago

In order to get running functional, we (currently) need to edit the .profile file copied into ${LDMX_BASE} by denv. This file is copied from the /etc/skel directory of the image so we can avoid requiring this by updating the skeleton file in the image: https://github.com/LDMX-Software/docker/issues/94

After the first attempt to run denv after it has been init'ed, it copies the skeleton files into the base directory.

tom@nixos:~/code/ldmx$ ls -A
.bash_history  .bash_logout  .bashrc  .denv  ldmx-sw  .profile

We then are free to update them as we see fit since they are only copied into place if they don't exist. I wrote the following into the .profile file above copying the code from our current image's entrypoint script and changing LDMX_BASE to HOME. The LDMX_SW_INSTALL environment variable is defined by the production image when it is being built so we check if that is defined. If it isn't, then we define it to be ~/ldmx-sw/install since the denv workspace (what we call LDMX_BASE) is defined as the HOME directory inside the container. (This basically makes defining the LDMX_BASE environment variable redundant when using denv. It is more familiar terminology than "denv workspace" though so I am keeping it around)

if [ -z "${LDMX_SW_INSTALL}" ]; then 
  export LDMX_SW_INSTALL=$HOME/ldmx-sw/install 
fi 
export LD_LIBRARY_PATH=$LDMX_SW_INSTALL/lib:$LD_LIBRARY_PATH 
export PYTHONPATH=$LDMX_SW_INSTALL/python:$LDMX_SW_INSTALL/lib:$PYTHONPATH 
export PATH=$LDMX_SW_INSTALL/bin:$PATH 

#add what we need for GENIE  
export LD_LIBRARY_PATH=$GENIE/lib:/usr/local/pythia6:$LD_LIBRARY_PATH 
export PATH=$GENIE/bin:$PATH 

# add externals installed along side ldmx-sw 
# WARNING: No check to see if there is anything in this directory 
for _external_path in $LDMX_SW_INSTALL/external/*/lib 
do 
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$_external_path 
done 
unset _external_path 

This extra shell code updates the necessary environment variables in the container so that fire is found in PATH and the config python modules are found in PYTHONPATH.

tom@nixos:~/code/ldmx/ldmx-sw$ denv fire SimCore/test/basic.py                                                                                                                                                                                
---- LDMXSW: Loading configuration --------                                                                                                                                                                                                   
---- LDMXSW: Configuration load complete  --------                                                                                                                                                                                            
---- LDMXSW: Starting event processing --------  
<omitted running output for brevity>
[ Simulator ] : Started 10 events to produce 10 events.
---- LDMXSW: Event processing complete  --------

The .profile updated with the shell code above also works with production images.

tom@nixos:~/code/ldmx/ldmx-sw$ denv config image ldmx/pro:latest
<omit image downloading printouts for brevity>
tom@nixos:~/code/ldmx/ldmx-sw$ denv which fire
/usr/local/bin/fire
tom@nixos:~/code/ldmx/ldmx-sw$ denv fire SimCore/test/basic.py                                                                                                                                                                                
---- LDMXSW: Loading configuration --------                                                                                                                                                                                                   
---- LDMXSW: Configuration load complete  --------                                                                                                                                                                                            
---- LDMXSW: Starting event processing --------  
<omitted running output for brevity>
[ Simulator ] : Started 10 events to produce 10 events.
---- LDMXSW: Event processing complete  --------
tomeichlersmith commented 1 month ago

I'm going to delay this until after May 1st. It will be a good example for how to update a branch (and what the resulting commit history will look like) and I don't have time to polish this as much as I think is necessary for it to be ready to merge.

tomeichlersmith commented 1 month ago

This last force push was me following the online instructions for updating a branch from v3.4.0 to v4.0.0.