easybuilders / easybuild-easyblocks

Collection of easyblocks that implement support for building and installing software with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

Wrong data folder for Geant4/10.04 #1352

Open lekshmideepu opened 6 years ago

lekshmideepu commented 6 years ago

The data folder path is not set properly for Geant4/10.04, as you can see below:

-bash-4.2$ ls ../..2017b/software/Geant4/10.04-GCC-5.4.0/share/Geant4-10.4.0/data/
G4ABLA3.1  G4EMLOW7.3  G4ENSDFSTATE2.2  G4NDL4.5  G4NEUTRONXS1.4  G4PII1.3  G4SAIDDATA1.1  PhotonEvaporation5.2  RadioactiveDecay5.2  RealSurface2.1
-bash-4.2$ ls ../..2017b/software/Geant4/10.04-GCC-5.4.0/share/Geant4-10.4/data/
ls: cannot access ../..2017b/software/Geant4/10.04-GCC-5.4.0/share/Geant4-10.4/data/: No such file or directory
-bash-4.2$ 

I can see that the data folders are under Geant4-10.4.0 and not under Geant4-10.4. I reported a similar issue some time back, and it has been fixed.

I guess this needs to be modified.

reedts commented 6 years ago

Same here. Geant4 installs just fine, but the share/data folder does not exist at all for me. Instead, geant4.sh looks for those files in my home directory (where they are not):

/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 50: cd: /home/j/j_bigg01/../lib64: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 65: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/G4NDL4.5: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 66: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/G4EMLOW7.3: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 67: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/PhotonEvaporation5.2: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 68: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/RadioactiveDecay5.2: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 69: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/G4NEUTRONXS1.4: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 70: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/G4PII1.3: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 71: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/RealSurface2.1: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 72: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/G4SAIDDATA1.1: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 73: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/G4ABLA3.1: No such file or directory
/Applic.HPC/software/MPI/intel/2017.4.196-GCC-6.4.0-2.28/impi/2017.3.196/Geant4/10.04/bin/geant4.sh: line 74: cd: /home/j/j_bigg01/../share/Geant4-10.4.0/data/G4ENSDFSTATE2.2: No such file or directory

kelseymh commented 4 years ago

I just got pointed to this issue, reading the discussion in EasyConfig PR #9656. As a Geant4 developer and user, I think I can contribute to both explaining what's going on, and hopefully suggesting an approach to use in EasyBuild to get a space-efficient and usable installation.

First, I want to make clear that these datasets are not "optional" in any way. They are an integral part of the Geant4 particle simulation toolkit, and are meant to be installed along with the software. They contain tabulated atomic energy levels (X-ray lines and intensities) for every isotope, radioactive decay tables, nuclear gamma lines, and evaluated neutron interaction cross-sections. All of the quantitative physics needed to model how particles interact with matter is in these datasets, and without them, the simulation simply can't function.

@reedts Having those dataset envvars all point to your home directory is a "bug" (or at least a stupid feature decision) in all of the Geant4 environment scripts (geant4.sh and geant4make.sh). Those scripts are generated as part of the CMake/make/make install procedure within Geant4, and they embed the path to the datasets that was specified in the CMake configuration. The scripts then check at runtime whether those paths exist, and when one doesn't, they fall back to using $HOME. I think this is dumb; the scripts should instead put in a human-readable "DIRECTORY_NOT_FOUND" sort of path. But I've never been responsible for that part of the code.
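The fallback described above can be sketched in a few lines of Python. This only models the behaviour as described in this comment; it is not the actual shell code in geant4.sh, and the function name `resolve_dataset_dir` is hypothetical.

```python
import os


def resolve_dataset_dir(embedded_path, fallback):
    """Return the embedded dataset path if it exists, otherwise the fallback.

    Hypothetical helper: models the runtime check described in the comment,
    not the real implementation in geant4.sh.
    """
    if os.path.isdir(embedded_path):
        return os.path.abspath(embedded_path)
    return fallback


# Behaviour as described: a missing directory silently becomes $HOME.
described = resolve_dataset_dir("/no/such/dir/G4NDL4.5", os.path.expanduser("~"))

# Suggested alternative: a placeholder that is obviously wrong when printed.
suggested = resolve_dataset_dir("/no/such/dir/G4NDL4.5", "DIRECTORY_NOT_FOUND")
```

With the placeholder, a misconfigured installation shows up in the environment as an obviously invalid value rather than as a plausible-looking path under the user's home directory.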

@lekshmideepu The directory path starting with "share/" is defined within the Geant4 CMake system. The path share/Geant4-10.4.0/... is exactly correct. Geant4 uses a three-position release number, where the last value is incremented only when they put out a "patch release"; currently we're at Geant4-10.4.3, for example.
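A minimal sketch of how an easyblock could derive the share/ directory name from the version string. It assumes the version strings look like the ones appearing in this thread ("10.04", "10.04.p03", "11.1.2"); the function name `geant4_share_dirname` is hypothetical and this is not the code in geant4.py.

```python
import re


def geant4_share_dirname(version):
    """Map a version string such as '10.04' or '10.04.p03' to 'Geant4-10.4.0' or 'Geant4-10.4.3'.

    Hypothetical helper; assumes 'major.minor' with an optional '.pNN' or '.N' patch suffix.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.p?(\d+))?", version)
    if match is None:
        raise ValueError("Unrecognised Geant4 version string: %r" % version)
    major, minor, patch = match.groups()
    # A missing patch component means patch level 0; int() drops leading zeros.
    return "Geant4-%d.%d.%d" % (int(major), int(minor), int(patch or 0))
```

The point is that the directory name always has three components, so a two-component name such as Geant4-10.4 will never match what the Geant4 CMake system installs.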

A bigger issue for doing an EasyBuild installation of Geant4 is the question of "where should these datasets go?" As @boegel has pointed out, when you do a fresh, initial installation, they're relatively huge, about 3.7 GB for G4 10.6. However, not all of the datasets change from one release to the next, so the incremental increase is small (I have everything from G4 9.6 to now on my MacBook, spanning eight years, and it takes up 9.5 GB).

If you do a pure default Geant4 build, it puts all the datasets within the install directory under share/.... If you're on your laptop, and you're doing just a single build, that's probably fine. But for anywhere that has to keep multiple builds going (even multiple architectures of the same release), this is a terrible approach. It means your build jobs are going to download all of those datasets every time, and you're going to have N copies of them on disk. About the only worse approach would be to have each individual user do their own personal download of all the datasets!

Geant4 provides, as part of its CMake options, a way to specify whatever directory you want to use for the datasets. If you specify -DGEANT4_INSTALL_DATADIR=<some-path>, it will download and install the datasets directly in that path. If they are already there, it skips the download process automatically.

With the existing EasyBuild system, installers can specify the desired path for their site by editing the .eb file and adding configopts = "-DGEANT4_INSTALL_DATADIR=<some-local-path>". I don't know that .eb files are meant to be customized after checkout, so I'm not a big fan of this approach.

If "we" (i.e., the EasyBuild developer community) decided on some natural place within the module paths, then we could provide that preset in a portable way, either within the .eb files, or even with the geant4.py EasyBlock. But it would also be nice if the latter could be overridden if needed by different sites.

One option, which I use on my personal machine, is to have a Geant4 "data" directory as a sibling to all my individual release builds, specifically something like

/Applications/
    GEANT4/
        geant4.10.06.p01/
        geant4.10.05.p01/
        geant4.10.04.p03/
        data/

When I do my build, I set -DGEANT4_INSTALL_DATADIR=/Applications/GEANT4/data. My preference would be to set it as a relative path, $CMAKE_INSTALL_PREFIX/../data, but I don't think I can do that from the command line. However, that could be done in geant4.py. We could make this customizable for end-user .eb files by having them supply an extra configuration parameter, along the lines of

geant4data = 'my-path-to-the-datasets'

and having geant4.py look for a non-empty self.cfg['geant4data'] (I think that's right).
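A rough sketch of that logic, under stated assumptions: `geant4data` is the hypothetical parameter proposed above (it is not an existing EasyBuild parameter), a plain dict stands in for `self.cfg`, and the function name `geant4_data_configopts` is made up for illustration.

```python
import os


def geant4_data_configopts(cfg, install_prefix):
    """Build the CMake option that selects the dataset directory.

    cfg: dict standing in for the easyconfig; 'geant4data' is a hypothetical key.
    install_prefix: the installation directory of this Geant4 build.
    """
    datadir = cfg.get('geant4data')
    if not datadir:
        # Default to a 'data' directory that is a sibling of the install prefix.
        datadir = os.path.join(install_prefix, os.pardir, 'data')
    return "-DGEANT4_INSTALL_DATADIR=%s" % os.path.normpath(datadir)
```

A site that wants a global data tree would set the parameter in its .eb file; everyone else would get the sibling-directory default without editing anything.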

akesandgren commented 4 years ago

One obvious question about the data set. Does the newest data set work for older Geant4 versions? Judging from what you write above it does, but I'd like to make sure.

If it doesn't, could Geant4 be convinced to take a PATH-like env var for where to search, so you could do: ...DATADIR=v10.03.p03:v10.02.xx:v09:xxx etc.? Then one could potentially install only the upgraded data files to the newer version dirs, if the install procedure could be told how to do this...

If the newer data files do work for older G4 versions then I'd just make a fixed install in some site global data tree, not necessarily in the EB install tree.

kelseymh commented 4 years ago

> One obvious question about the data set. Does the newest data set work for older Geant4 versions?

Sometimes they do, sometimes not. Each of the dozen or so datasets is independent, and is used by different physics processes. Often, the change from one dataset version to another is simply adding new isotopes, or updating existing quantities, in which case the new version would work with older G4s. Other times (as happened recently with RadioactiveDecay) the dataset gets some entirely new content added, and older G4s will report an error because they don't recognize the new data (e.g., a code for a newly implemented decay channel).

Having said that, each G4 version is already shipped with the specific list of dataset versions it needs. That list of versions is what gets coded into the setup scripts. So you can have a single common directory for all of the files, which evolves over time. Each G4 version will find the specific dataset versions it needs, and ignore all the others.
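To illustrate, here is a small sketch of how a per-release list of dataset versions maps onto one shared directory. The function name `dataset_environment` is hypothetical, and the variable-to-dataset pairing in the usage example is given for illustration only; the authoritative list is whatever the generated setup scripts of a given release contain.

```python
import os


def dataset_environment(data_root, required):
    """Map each dataset environment variable to its versioned directory under data_root.

    required: dict of {environment variable name: versioned dataset directory name}.
    Hypothetical helper; other dataset versions present in data_root are simply ignored.
    """
    return {var: os.path.join(data_root, dirname) for var, dirname in required.items()}


# Illustrative subset for one release; another release would pass different versions
# but the same data_root.
env = dataset_environment("/opt/geant4/data", {
    "G4LEDATA": "G4EMLOW7.3",
    "G4NEUTRONHPDATA": "G4NDL4.5",
})
```

Because each release only ever looks up its own versioned directory names, several releases can share the same data root without interfering with each other.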

akesandgren commented 4 years ago

Ok, so the installation creates per-version dirs in the datadir already? (Haven't got an installation to look at right now) That works for me. Need to look at what the installation really looks like someday...

kelseymh commented 4 years ago

> Ok, so the installation creates per-version dirs in the datadir already?

No. The data directory is monolithic. Each dataset carries its own independent version, and the geant4.sh and related setup scripts set the envvars pointing to the twelve specific dataset+version directories needed for that release. Here's what my /Applications/GEANT4/data/ directory looks like on my MacBook:

{michaels-mbp:746} ls /Applications/GEANT4/data/
G4ABLA3.0/      G4SAIDDATA1.1/
G4ABLA3.1/      G4SAIDDATA2.0/
G4EMLOW6.32/    G4TENDL1.3.2/
G4EMLOW6.35/    LENDDATA/
G4EMLOW6.41/    PhotonEvaporation2.3/
G4EMLOW6.48/    PhotonEvaporation3.0/
G4EMLOW6.50/    PhotonEvaporation3.1/
G4EMLOW7.3/     PhotonEvaporation3.2/
G4EMLOW7.4/     PhotonEvaporation4.3.2/

and so on. Here's an example I created yesterday for just Geant4 10.5 and 10.6:

[kelsey@terra2 software]$ ls ~/CDMS/Geant4_DB/
G4ABLA3.1        G4PARTICLEXS1.1       PhotonEvaporation5.3-CDMS
G4data.sh        G4PARTICLEXS2.1       PhotonEvaporation5.5
G4EMLOW7.7       G4PII1.3              PhotonEvaporation5.5-CDMS
G4EMLOW7.9.1     G4SAIDDATA2.0         RadioactiveDecay5.3
G4ENSDFSTATE2.2  G4TENDL1.3.2          RadioactiveDecay5.3-CDMS
G4INCL1.0        get_datasets.sh       RadioactiveDecay5.4
G4NDL4.5         LENDDATA              RadioactiveDecay5.4-CDMS
G4NDL4.6         PhotonEvaporation5.3  RealSurface2.1.1

(the two shell scripts you see are mine, not Geant4's, and the names ending "-CDMS" are for my experiment, not part of the G4 distribution).

The idea is that you use the same directory, for every G4 build you do, regardless of toolchain or G4 version, and it populates incrementally when new releases come out.
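The incremental population can be pictured with a short sketch: given the dataset directories a new release needs, only the ones not yet present in the shared directory have to be fetched. The function name `missing_datasets` is hypothetical; the Geant4 CMake build performs its own check, so this is only a model of the idea.

```python
import os


def missing_datasets(data_root, required_dirs):
    """Return the required dataset directories that are not yet present in data_root.

    Hypothetical helper modelling the idea above; order of required_dirs is preserved.
    """
    present = set(os.listdir(data_root)) if os.path.isdir(data_root) else set()
    return [name for name in required_dirs if name not in present]
```

For example, if the shared directory already holds G4NDL4.5 and G4PII1.3, and a build asks for G4NDL4.6 and G4PII1.3, only G4NDL4.6 would be reported as missing.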

akesandgren commented 4 years ago

Ok, so it will duplicate stuff that is already in G4EMLOW6.48/ when creating G4EMLOW6.50/? Just trying to make sure I understand what it is doing.

kelseymh commented 4 years ago

Yes, it will. It has to. Each of those directories starts its life as a simple tarball, which is unpacked verbatim. The tarball can't know what you already have installed. You can see how they suggest manual downloading at the Geant4 main site: http://geant4.web.cern.ch/support/download . The CMake build takes care of those downloads for you, but the files involved are identical.

kelseymh commented 4 years ago

Most of the individual dataset directories involved are tens of megabytes. The recent G4EMLOW* are around 350-ish MB, and RealSurface2.1.1 is 800 MB (!). The latter hasn't changed in a long time (1.0 came out some time before 2014, and 2.1.1 in 2018). G4TENDL1.3.2 is over 500 MB, but it's unchanged since 2017.

akesandgren commented 1 year ago

The latest Geant4/11.1.2 uses a separate Geant4-data package, which can be installed somewhere suitable for large data sets. That version should hopefully take care of any problems related to the data sets.