LDMX-Software / ldmx-sw

The Light Dark Matter eXperiment simulation and reconstruction framework.
https://ldmx-software.github.io
GNU General Public License v3.0

support `/cvmfs/unpacked.cern.ch` in environment script #1252

Open tomeichlersmith opened 9 months ago

tomeichlersmith commented 9 months ago

**Is your feature request related to a problem? Please describe.** I don't like having to download copies of our images all over the place. This seems wasteful to me, and there is already a solution.

**Describe the solution you'd like** If the environment script could run images from `/cvmfs/unpacked.cern.ch` (apptainer/singularity already support this), then we could distribute our images there (as well as on Docker Hub), which would reduce image duplication and save disk space on our clusters.

**Describe alternatives you've considered** An alternative (I guess?) would be to develop our own image caching system that, similar to CVMFS, is local to each cluster. I'm not interested in the complexities of this since CVMFS can already do it.

**Additional context** I've gotten `ldmx/dev:latest` into `/cvmfs/unpacked.cern.ch`, so it is already available. One could (with the current environment script) do

```
cd ${LDMX_BASE}
# symlink the unpacked image on CVMFS to the .sif name the environment script looks for
ln -s /cvmfs/unpacked.cern.ch/registry.hub.docker.com/ldmx/dev:latest ldmx_dev_latest.sif
. ldmx-sw/scripts/ldmx-env.sh
```

which should work (I think, untested).
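As a quick sanity check (not part of the environment script; `apptainer` here could equally be `singularity` depending on the site), one can run the unpacked directory directly:

```
# run a trivial command inside the unpacked image straight from CVMFS;
# apptainer treats the directory as a sandbox-format container
apptainer exec /cvmfs/unpacked.cern.ch/registry.hub.docker.com/ldmx/dev:latest /bin/true \
  && echo "unpacked image is runnable"
```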

This is somewhat related to #1232 since denv already supports running unpacked images, so one could do the following with denv.

```
denv init /cvmfs/unpacked.cern.ch/registry.hub.docker.com/ldmx/dev:latest
```

and then use `denv` for compiling and running ldmx software (instead of `ldmx`).
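For concreteness, after the `denv init` above, any command prefixed with `denv` runs inside the image (the `cmake` invocation here is just illustrative):

```
# commands prefixed with denv run inside the unpacked image,
# e.g. configuring and building ldmx-sw
denv cmake -B ldmx-sw/build -S ldmx-sw
denv cmake --build ldmx-sw/build
```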

bryngemark commented 9 months ago

Let's discuss this in a sw dev meeting so I understand how your suggestion interacts with requiring that sites set up CVMFS, that paths are loaded, etc.

tomeichlersmith commented 9 months ago

To answer this since we didn't get to it at the meeting...

My thought process is that, with or without denv, using the CVMFS-hosted images would be optional, not required. I think the best way to explain this is with the shell sketch below.

```
# on a computer with apptainer/singularity,
# the user says they want to use TAG of repository REPO
if [ -f "${LDMX_BASE}/ldmx_${REPO}_${TAG}.sif" ]; then
  # use the specific file already downloaded
  LDMX_SIF="${LDMX_BASE}/ldmx_${REPO}_${TAG}.sif"
elif [ -d "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/ldmx/${REPO}:${TAG}" ]; then
  # use the CVMFS unpacked image (a directory apptainer can run directly)
  LDMX_SIF="/cvmfs/unpacked.cern.ch/registry.hub.docker.com/ldmx/${REPO}:${TAG}"
elif apptainer pull "${LDMX_BASE}/ldmx_${REPO}_${TAG}.sif" "docker://ldmx/${REPO}:${TAG}"; then
  # the image exists on Docker Hub: download it and use the local file
  # (the pull doubles as the existence check via its exit code)
  LDMX_SIF="${LDMX_BASE}/ldmx_${REPO}_${TAG}.sif"
else
  echo "ERROR: ldmx/${REPO}:${TAG} does not exist" >&2
fi
```

This would cleanly handle sites without CVMFS (or with CVMFS down for whatever reason) the same way it is handled now: a local copy is downloaded. Sites with a CVMFS connection, however, could save a significant amount of disk space (and potentially start up quicker, since no download is needed if the ldmx unpacked image is already in CVMFS's cache from other users of the cluster).

bryngemark commented 9 months ago

Great, looks good to me.

Before we went container, when compiling outside SLAC was a mess, there were some thoughts in (CERN-accustomed) Lund about using CVMFS for distributing our software. Fun to see it resurface, and in a way that seems straightforward.

I wonder if this could help with some problems we're seeing at Caltech, where we suspect files are copied, not linked, from the LDCS-specific cache (it seems to be an HTCondor thing) and the image itself uses almost all the memory on a node. If we could routinely push production images, this might be a different approach from how we set up building and linking to local copies of the image before we implemented the cache.