Support non-10.2.3 custom geant4 builds out of the box

EinarElen commented 9 months ago

Building a non-10.2.3 version of Geant4 with the LDMX_CUSTOM_GEANT4 setup works but requires a lot of faff by the user. It boils down to the following:

1. CMAKE_PREFIX_PATH prioritizes the Geant4 build in the container.
2. The various environment variables pointing to the Geant4 datasets point to the datasets in the container. Newer versions of Geant4 may require more up-to-date datasets.

Geant4 assumes that if the corresponding environment variable (e.g. G4NEUTRONHPDATA) is set then that version should be prioritized. If the environment variable is not set, it will search GEANT4_DATA_DIR for the versions that it is expecting.
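
To see which of these two mechanisms applies to a given dataset, something like the following could be run inside the container. This is only a rough sketch: `g4_data_source` is a hypothetical helper name, and treating an empty variable the same as an unset one is an assumption of the sketch, not something checked against Geant4 itself.

```shell
# Hypothetical diagnostic helper (not part of the entry script).
# Reports whether a dataset variable is set explicitly or whether Geant4
# would have to fall back to searching GEANT4_DATA_DIR.
g4_data_source() {
  name="$1"
  # POSIX-compatible indirect lookup of the variable called $name.
  # Assumption: an empty value is treated like an unset one.
  eval "value=\${$name:-}"
  if [ -n "$value" ]; then
    echo "$name: explicit path $value"
  elif [ -n "${GEANT4_DATA_DIR:-}" ]; then
    echo "$name: searched under $GEANT4_DATA_DIR"
  else
    echo "$name: neither set"
  fi
}
```

For example, `g4_data_source G4NEUTRONHPDATA` in the default container should report an explicit path, since the Dockerfile itemizes the dataset variables.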

There are also some other concerns when working with different Geant4 versions that I will be documenting in the docs page (although maybe part of this documentation belongs on the main documentation page? Or maybe the main documentation page could include the docs from here?).

The former is simple to deal with: just set CMAKE_PREFIX_PATH as part of the LDMX_CUSTOM_GEANT4 part of the entry script.
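
A minimal sketch of what that could look like, assuming LDMX_CUSTOM_GEANT4 holds the install prefix of the custom build (`prepend_cmake_prefix` is a made-up helper name, not something already in the entry script):

```shell
# Hypothetical helper: put a custom Geant4 install ahead of the container's
# own entries so that CMake searches it first.
prepend_cmake_prefix() {
  if [ -n "${CMAKE_PREFIX_PATH:-}" ]; then
    CMAKE_PREFIX_PATH="$1:${CMAKE_PREFIX_PATH}"
  else
    CMAKE_PREFIX_PATH="$1"
  fi
  export CMAKE_PREFIX_PATH
}

# Only touch the path when the user asked for a custom build.
# Assumption: LDMX_CUSTOM_GEANT4 is the install prefix of that build.
if [ -n "${LDMX_CUSTOM_GEANT4:-}" ]; then
  prepend_cmake_prefix "${LDMX_CUSTOM_GEANT4}"
fi
```

Prepending rather than overwriting keeps the other dependencies in the container discoverable.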

The latter can be dealt with in a couple of ways:

1. We could unset the various environment variables that we defined in the container environment and rely on GEANT4_DATA_DIR to pick up the right thing.
2. We could remove the environment variables outright, define GEANT4_DATA_DIR in the container, and use the same method for both. Is there a reason why we are explicitly spelling out the G4 dataset versions and not GEANT4_DATA_DIR @tomeichlersmith @bryngemark ?
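
As a rough illustration of the first option, the entry script could clear the itemized variables and point GEANT4_DATA_DIR at the custom datasets. The variable list below is an assumption based on what geant4.sh typically exports and should be matched against the ENV lines in the Dockerfile; `use_data_dir_only` and the example data path are hypothetical.

```shell
# Dataset variables commonly exported by geant4.sh. The exact set depends on
# the Geant4 version, so treat this list as an assumption to be checked.
G4_DATA_VARS="G4NEUTRONHPDATA G4LEDATA G4LEVELGAMMADATA G4RADIOACTIVEDATA \
G4PIIDATA G4REALSURFACEDATA G4SAIDXSDATA G4ABLADATA G4ENSDFSTATEDATA \
G4PARTICLEXSDATA G4INCLDATA"

# Hypothetical helper: drop the per-dataset variables and rely on
# GEANT4_DATA_DIR alone, so Geant4 looks up the versions it expects.
use_data_dir_only() {
  for var in $G4_DATA_VARS; do
    unset "$var"
  done
  export GEANT4_DATA_DIR="$1"
}

# Example call (the path is a placeholder, not the real layout):
# use_data_dir_only "/path/to/custom/geant4/data"
```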

EinarElen commented 9 months ago

I've tested the second approach now and it allows an 11.2.0 version to run without issue (output from the ldmx-sw run header, not sure why it is dated in the future though?)

    Geant4 revision =  Geant4 version Name: geant4-11-02-ref-00    (8-December-2023)

With custom G4 disabled, we don't get a Geant4 version? Is this a bug? This seems to happen for the default container as well...

    Geant4 revision =

tomeichlersmith commented 9 months ago

> If the environment variable is not set, it will search GEANT4_DATA_DIR for the versions that it is expecting.

Is this check at run time? I itemized the data environment variables because I was just putting what the geant4.sh environment script did into the Dockerfile. I would like to avoid sourcing geant4.sh within the entry-point because then the environment provided by the container changes drastically if one were to enter the image without using the entry-point we provide (e.g. using denv or distrobox).

As for setting the Geant4 revision, we just copy in what Geant4's kernel says its version is.

https://github.com/LDMX-Software/SimCore/blob/20d9bcb6d2bad2b99255cf32c1b3f099b26752b0/src/SimCore/Simulator.cxx#L107-L110

I don't know why this wouldn't work, especially since I can see a version written into G4Version.hh of our tag of Geant4.

https://github.com/LDMX-Software/geant4/blob/LDMX.10.2.3_v0.5/source/global/management/include/G4Version.hh

EinarElen commented 9 months ago

Right, I see. Then I think the most straightforward way to deal with it is to keep the current behavior and only do the fancy stuff when using the entrypoint.

tomeichlersmith commented 9 months ago

To add more detail about my earlier comment...

I am torn about sourcing geant4.sh (and other environment scripts in general) in the entrypoint script. This is because there are two schools of thought about entrypoints floating around in the container-verse that I've read about.

1. Your container images should be highly specialized. They should only be used for an explicit purpose and that includes (and can be enforced by) the entrypoint you give it. In fact, the docker run command hasn't always had the ability to override the entrypoint shipped with the container image. Earlier versions of the image actually didn't use ENV in the Dockerfile at all and just sourced all the environment scripts in the entrypoint before running the user command.
2. Your container image is static and represents an environment. If I can't see what that environment looks like without running the image, then it is a poorly designed image. In other words, I should be able to understand exactly what environment is provided by reading the Dockerfile or using docker inspect. Reading about this made me want to start using ENV in our Dockerfile here so that I could use these other shiny tools (like distrobox) with our image.

These two schools are not always opposed, but in this situation they are, and so I'm uncertain how to proceed. School (1) seems to favor sourcing env scripts in the entrypoint and telling all users that using the image without our entrypoint is "at your own risk", while School (2) seems to say our entrypoint should be lightened at all costs so users don't have to worry about it.

All this to say, right now, most (all?) users do use the entrypoint as written in this repository so sourcing geant4.sh in it is a reasonable thing to do.

bryngemark commented 9 months ago

My fuzzy view on the philosophy comment is that I think school 2 is what we should go for in production mode, for reproducibility. Now with the whole docker --> apptainer setup used for production, there is no real messing with that as far as I can see. If there are fancy/experimental uses needed for development where it would actually make sense to allow for environment changes, and that means abandoning the stricter setup, I think this is also fine. (Also, can most of this already be covered by runtime environment variables or do we need it at build time?) But as you say @tomeichlersmith, then that is at the developer's own risk :) I doubt that any non-advanced G4 user will ever run into this. And it is important that we are able to do any needed development in a reasonable way.

How about other dependencies, anything we need to worry about there if we were to extend this logic? (My thinking here being that while we happen to have at least one advanced G4 user in the sw dev core team, I wonder if other user problems might get hairier to debug.) That is, if we do want to think about the consequences of any philosophy decision made here. We could also punt on that.

tomeichlersmith commented 9 months ago

Coming back to this (since I'd like to include this in the next container release), I think maintaining the current behavior by default, and then unsetting all the variables and sourcing the custom install's geant4.sh when a custom Geant4 is requested, is the way to go. It allows us to still have a fixed environment to work off of for production while still enabling super-users to work with custom Geant4 if they wish.

(I still think this isn't a "clean" solution but I think that is the result of having a multi-purpose container. Perhaps put the link to this issue in a comment near the messy bash code for context.)
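
Something along these lines is what I have in mind for the entrypoint. This is a sketch only: `setup_custom_geant4` is a made-up name, it assumes LDMX_CUSTOM_GEANT4 is the install prefix with geant4.sh under its bin/ directory, and it assumes the entry script runs under bash so that geant4.sh can locate itself when sourced.

```shell
# Messy on purpose: the image keeps a fixed default environment, and only
# users opting in to a custom Geant4 get it modified at entry.
# See LDMX-Software/docker issue #83 for context.
setup_custom_geant4() {
  g4_env_script="$1/bin/geant4.sh"
  if [ ! -f "$g4_env_script" ]; then
    echo "no geant4.sh found under $1" >&2
    return 1
  fi
  # Drop every G4*DATA variable inherited from the image so the custom
  # install decides which dataset versions to use.
  for var in $(env | sed -n 's/^\(G4[A-Z0-9]*DATA\)=.*/\1/p'); do
    unset "$var"
  done
  # Pull in the environment of the custom install.
  . "$g4_env_script"
}

# Default behavior is untouched unless the user opts in.
if [ -n "${LDMX_CUSTOM_GEANT4:-}" ]; then
  setup_custom_geant4 "${LDMX_CUSTOM_GEANT4}"
fi
```

Anyone entering the image without the entrypoint (e.g. with denv or distrobox) would still get the environment declared in the Dockerfile.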

LDMX-Software / docker

Support non-10.2.3 custom geant4 builds out of the box #83