Open tomeichlersmith opened 2 months ago
Right, I've tried to look at this as if I was a completely new user First thing I tried:
. scripts/ldmx-denv-init.sh
Gives
ERROR: Unable to deduce a denv workspace!
Are you in a denv workspace? Do you still need to 'denv init'?
ERROR: Unable to deduce a denv workspace!
Are you in a denv workspace? Do you still need to 'denv init'?
ERROR [ldmx-denv-init.sh]: 'denv' unable to find a supported container runner.
Install one of the container runners 'denv' checked for above.
Same on bash, so that's not the issue
The reason for this is of course that i haven't run denv init yet. I would appreciate if the new scripts would check
It looks like this is what ldmx-denv-init
tries to do but it doesn't work for me at least :(
Very minor:
Maybe it is time to patch the typo in: You're default shell '/bin/zsh' isn't bash.
Thank you for giving this a try Einar!! :1st_place_medal:
I have employed shellcheck to start helping debug these scripts and I've already found many patches to make. One of the issues shellcheck found seems to be the primary reason the ldmx-denv-init.sh
file is failing for your zsh. I'm frankly surprised it didn't work on bash for you since it does work on my bash but I haven't setup a test VM for play around with different configurations yet.
tom@zuko:~/code/ldmx/ldmx-sw/scripts$ shellcheck ldmx-denv-init.sh
In ldmx-denv-init.sh line 71:
_default_denv_workspace="$(dirname "${BASH_SOURCE[0]}" )/../../"
^---------------^ SC3028 (warning): In POSIX sh, BASH_SOURCE is undefined.
^---------------^ SC3054 (warning): In POSIX sh, array references are undefined.
For more information:
https://www.shellcheck.net/wiki/SC3028 -- In POSIX sh, BASH_SOURCE is undef...
https://www.shellcheck.net/wiki/SC3054 -- In POSIX sh, array references are...
Unfortunately, finding the path to the current source file in a POSIX-sh compliant way is hard (or impossible when sourcing a file rather than executing it?) so I'm investigating solutions. We might need to abandon a POSIX init script in favor of documentation if there isn't another way to deduce LDMX_BASE
.
I did the following on UCSB's cluster which uses bash
:
module load singularity
git clone https://github.com/LDMX-Software/ldmx-sw.git
git checkout 1232-use-denv-in-ldmx-env
denv --clean-env --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"
leads to
ERROR: Unable to deduce a denv workspace!
Are you in a denv workspace? Do you still need to 'denv init'?
Then I run
. scripts/ldmx-denv-init.sh
which leads to
ERROR: denv requires the --env flag for singularity which was introduced in v3.6
ldmx/dev:latest
INFO: Starting build...
FATAL: While performing build: conveyor failed to get: Error initializing source oci:/home/vamitamas/.singularity/cache/oci:a87552b8a96913893b47298ff754c5b52157d585edbc5c8e016033bc9c3e4d07: Error initializing image from source docker://ldmx/dev:latest: unsupported schema version 2
(now same thing happens with denv --clean-env --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"
)
The singularity version is 3.5.2, so I guess this not working out of the box is expected.
:raised_hands: thank you @tvami :raised_hands:
There are two separate "issues" that your comment brings up.
UCSB also has apptainer
in a separate module (which is the new name after it was adopted by the Linux Foundation), so you should be doing
module load apptainer
to prefer the newer version. The last error you see after . scripts/ldmx-denv-init.sh
is related to the old version of singularity. I do not plan on supporting the older singularity versions and instead point people towards the newer apptainer (all clusters LDMX uses have access to apptainer: https://github.com/LDMX-Software/ldmx-sw/issues/1248#issuecomment-1892395583)
The first error you see from running
denv --clean-env --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"
is because there is a typo in the README I wrote (now fixed). It should be
denv init --clean-env --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"
In this call, we are using LDMX_BASE
as a pre-defined environment variable that stores the path to LDMX_BASE
.
If printenv LDMX_BASE
returns a path, then you are all good! If not, you can just use a relative path when running this directly.
For example, from the ldmx-sw repository
denv init --clean-env --name ldmx ldmx/dev:latest ..
OK this worked now for the environment, however,
denv fire .github/validation_samples/ecal_pn/config.py
sh: 1: fire: not found
Doing
denv init ldmx/pro:v3.3.3
denv fire .github/validation_samples/ecal_pn/config.py
then leads to
---- LDMXSW: Loading configuration --------
Traceback (most recent call last):
File "/home/vamitamas/Testing1269/ldmx-sw/.github/validation_samples/ecal_pn/config.py", line 1, in <module>
from LDMX.Framework import ldmxcfg
ModuleNotFoundError: No module named 'LDMX'
Configuration Error [ConfigureError] : Problem loading python script
at /code/Framework/src/Framework/ConfigurePython.cxx:315 in ConfigurePython
Stack trace:
0 /usr/local/lib/libFramework.so(+0x683d4) [0x7fe56f5df3d4]
1 main + 280 addr2line: 'fire': No such file
2 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fe56ebded90]
Configuration Error [ConfigureError] : Problem loading python script
at /code/Framework/src/Framework/ConfigurePython.cxx:315 in ConfigurePython
One minor technical comment I'll have is: now this adds the .denv/
directory and that comes up in git status
, I'd suggest to edit the .gitignore
file and add that to the list to be ignored.
On the philosophical side, why not get rid off the ldmx
env fully, supporting both seems to be some overhead in the maintenance. (I'm sorry if this was discussed and I just didnt follow this close enough). Although ldmx fire
does sounds more ldmx-y than denv fire
, maybe we could have an ldmxFire
built-in alias that runs denv fire
? This is truly a cosmetics and sentimental comment tho, feel free to ignore it.
While I appreciate the idea of a cleanup, and it makes sense to be able to share useful functionality, I feel a little nervous about making the core LDMX software dependent on a package which is owned/maintained by a specific individual. Does this make sense in a long-term manner?
@tvami I think both of your errors make sense in the context that the denv
-run container does not run the container's entrypoint. Specifically, this means the PATH
is not updated with ${LDMX_BASE}/ldmx-sw/install/bin
(in the first case) and the PYTHONPATH
is not updated with /usr/local/python
(in the second case).
As far as the location of the .denv
directory, I intend it to be located in ${LDMX_BASE}
(i.e. in the directory above ldmx-sw); however, it was created within ldmx-sw when you ran denv init
within ldmx-sw. (If you just want to change the image, you should use denv config image <tag>
.) I am unsure on how to enforce this preference so thoughts along these lines are welcome.
I agree with you that supporting both ldmx
and denv
would be more maintenance. To be clear, I think the long term workflow would be to move ldmx
to just be calling denv
(like is being done in ldmx-denv-transition.sh
) but even in that case, I fear it is too much. Again, I am unsure on how to meet my dueling goals of easy transition and long term maintainability.
@jmmans I think your concern is well founded. I've been trying to think about the long term sustainability of this project especially if LDMX adopts it. I actually presented this project to the Hep Software Foundation last September where I received good feedback but I was unsure on the next steps. Perhaps I reopen discussions with them about transitioning ownership of denv to HSF? Another obvious option is for me to transfer ownership to LDMX-Software or have LDMX-Software keep a fork.
In order to get running functional, we (currently) need to edit the .profile
file copied into ${LDMX_BASE}
by denv. This file is copied from the /etc/skel
directory of the image so we can avoid requiring this by updating the skeleton file in the image: https://github.com/LDMX-Software/docker/issues/94
After the first attempt to run denv
after it has been init
'ed, it copies the skeleton files into the base directory.
tom@nixos:~/code/ldmx$ ls -A
.bash_history .bash_logout .bashrc .denv ldmx-sw .profile
We then are free to update them as we see fit since they are only copied into place if they don't exist. I wrote the following into the .profile
file above copying the code from our current image's entrypoint script and changing LDMX_BASE
to HOME
. The LDMX_SW_INSTALL
environment variable is defined by the production image when it is being built so we check if that is defined. If it isn't, then we define it to be ~/ldmx-sw/install
since the denv workspace (what we call LDMX_BASE
) is defined as the HOME
directory inside the container. (This basically makes defining the LDMX_BASE
environment variable redundant when using denv. It is more familiar terminology than "denv workspace" though so I am keeping it around)
if [ -z "${LDMX_SW_INSTALL}" ]; then
export LDMX_SW_INSTALL=$HOME/ldmx-sw/install
fi
export LD_LIBRARY_PATH=$LDMX_SW_INSTALL/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$LDMX_SW_INSTALL/python:$LDMX_SW_INSTALL/lib:$PYTHONPATH
export PATH=$LDMX_SW_INSTALL/bin:$PATH
#add what we need for GENIE
export LD_LIBRARY_PATH=$GENIE/lib:/usr/local/pythia6:$LD_LIBRARY_PATH
export PATH=$GENIE/bin:$PATH
# add externals installed along side ldmx-sw
# WARNING: No check to see if there is anything in this directory
for _external_path in $LDMX_SW_INSTALL/external/*/lib
do
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$_external_path
done
unset _external_path
This extra shell code updates the necessary environment variables in the container so that fire
is found in PATH
and the config python modules are found in PYTHONPATH
.
tom@nixos:~/code/ldmx/ldmx-sw$ denv fire SimCore/test/basic.py
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete --------
---- LDMXSW: Starting event processing --------
<omitted running output for brevity>
[ Simulator ] : Started 10 events to produce 10 events.
---- LDMXSW: Event processing complete --------
The .profile
updated with the shell code above also works with production images.
tom@nixos:~/code/ldmx/ldmx-sw$ denv config image ldmx/pro:latest
<omit image downloading printouts for brevity>
tom@nixos:~/code/ldmx/ldmx-sw$ denv which fire
/usr/local/bin/fire
tom@nixos:~/code/ldmx/ldmx-sw$ denv fire SimCore/test/basic.py
---- LDMXSW: Loading configuration --------
---- LDMXSW: Configuration load complete --------
---- LDMXSW: Starting event processing --------
<omitted running output for brevity>
[ Simulator ] : Started 10 events to produce 10 events.
---- LDMXSW: Event processing complete --------
I'm going to delay this until after May 1st. It will be a good example for how to update a branch (and what the resulting commit history will look like) and I don't have time to polish this as much as I think is necessary for it to be ready to merge.
This last force push was me following the online instructions for updating a branch from v3.4.0 to v4.0.0.
I am updating ldmx-sw, here are the details.
What are the issues that this addresses?
This resolves #1232 directly, resolves #1248 directly, and resolves #1252 indirectly. I do this by moving the complicated container interaction scripting into its own project such that it can be more directly tested and version controlled. The environment scripts here are then just using this other shell program to help replicate the current environment.
The long-term benefit of isolating the container interaction into its own program amounts to allowing users to do the same process we do with ldmx-sw with any of their software. Below I have two examples that I do regularly and one idea that I could see becoming commonplace if we adopt
denv
within LDMX.Pin ldmx-sw Version for Production
I can use
denv
to choose an ldmx-sw version and use it pre-compiled. It is pinned and is not cluttered by my developments happening elsewhere on my system.Pin Python and Packages
I often want to develop a python analysis to share with someone else and it is very helpful to be using the same python version and the same version of all the packages.
pip
andvenv
offer a good way to pin the python package versions, but often times it is difficult to align the python version across systems. Luckily for us, python builds container images that we can use directly with denv.Sharing Environments
In some cases, the workspace that I use with
denv
(the directory where I randenv init
) is also agit
repository. In these cases, I can commit the config written by denv so that others who want to replicate the work I did with that code can simplyRequested Feedback
Note that I am not removing the current
ldmx-env.sh
script. It has many peculiarities that we have gotten used to and I have not replicated all of them withdenv
. This was intentional since many of these peculiarities are born out of just trying to get it to work and I have now figured out a better solution. This is a first PR to make usingdenv
an available path for people looking to develop ldmx-sw. I wish to have reviews of this PR focus on the followingldmx
here as shorthand) to more simply wrapping another program (strategy calleddenv
here as shorthand). Even if we stay with a bash environment script with bash functions wrappingdenv
indefinitely, there are still some aspects of the workflow that will change.To Do
ldmx-denv-transition.sh
)config
,use
,pull
,mount
,setenv
ldmx-sw/install
to$LDMX_BASE/.local
. This aligns withdenv
's design choice to make the workspace (for us,LDMX_BASE
) the home directory in the container which means the paths under~/.local
are often already included by the variousPATH
variables.~ Taken off list because we need to updatePYTHONPATH
no matter what and I like how easy it is for folks to delete the build and the install when trying to debug.PATH
variables in the${LDMX_BASE}/.profile
file. A template file is written bydenv
if it doesn't already exist. https://github.com/LDMX-Software/docker/issues/94 We need to do this even if we changed ldmx-sw's default installation to${LDMX_BASE}/.local
because we do not put our python config modules into a directory that is automatically searched by python (unless it is included inPYTHONPATH
).Context
To provide further context. The
ldmx-denv-init.sh
file just initilizes denv trying to be POSIX-sh compliant so that it could be used in other POSIX shells. Theldmx-denv-transition.sh
file is a bash-specific environment file that wraps denv with various bash functions to try to mimic the behavior of the currentldmx
command. Below, I have these translations written out in a more human readable form.ldmx
Commanddenv
Equivalenthelp
help
config
config print
use
config image <image>
pull
config image pull
mount
config mounts <dir>
setenv
config env copy
run
checkout
source
list
clean
Explanations for Removals
All of the
ldmx
commands that do not have adenv
equivalent are not included withindenv
because they do not directly pertain to interfacing with the container runner. I also am proposing removing them entirely from our ecosystem largely because they are either (a) broken, (b) not used widely, or (c) both.run
: In the original bash functions, this avoids the directory movement outside of the container and directly passes the arguments to the entrypoint script inside the container. I am removing this because it is overly complicated to replicate withdenv
in a robust way especially since the easiest way to replicate it is to tell users to justcd <dir>; denv <cmd>
.checkout
: this tries to code a more efficient combination ofgit checkout
andgit submodule update
so that users can more easily switch ldmx-sw branches; however, this is not functional. Moreover,git
provides a better description of how to switch between branches with different submodule commits and I think we should simply guide developers to this resource.source
: this is here to allow users to run multipleldmx
commands at once intending to have a single file defining the configuration of the environment conducted byldmx
. This command is not necessary indenv
sincedenv
stores the environment configuration within a text file at<workspace>/.denv/config
(for us,ldmx-sw/../.denv/config
since the workspace is the parent of ldmx-sw).list
: I tried to write some plain-shell code to list the available tags of the image. Since time of writing, DockerHub has changed its API breaking this code. Nobody else has noticed (or even if they have noticed, it wasn't a big enough deal to make an issue or post on slack), so I think removing it and just pointing people to the GitHub releases and DockerHub is good.clean
: This is the messiest command and has three sub-tasks.src
: Clean out all untracked files from the source file tree. Just a wrapper aroundgit clean
basically and with the moving of the generated files out fromFramework
and into thebuild
directory, something more complicated thatrm -r build install
is not necessary anymore.container
: Remove the downloaded container images. This is a dangerous action and I feel more comfortable instructing people on how to list the images already downloaded and remove them manually (either withdocker image rm
orrm
depending on the runner).env
: reset the environment variables.denv
does not store any important information in environment variables preferring instead to write and read the config file.