Open s6anloes opened 3 weeks ago
Hi @s6anloes ,
Can you reduce your example to something that still shows the scaling behaviour, but can be run in a much shorter time and with no more than 1 GB of memory, for example?
Thanks, Andre
Hi Andre,
yes, I can do this by reducing the number of towers placed. I have uploaded this to a branch called dd4hep_github_issue. It should take just around 1.2 GB now and be done in two minutes. I have updated the Reproducer to clone this branch only.
While doing this I have found something that might be of interest:
When thinking about how to make a smaller-scale example for you to test, there are essentially two ways to place just a small number of towers: Scenario 1, placing one full stave (towers at all theta positions for a single phi), or Scenario 2, placing a ring of towers (the same tower at all phi positions).
There is one significant difference between the two scenarios: the towers within one stave (i.e. towers at different theta) are slightly different from one another geometry-wise, except for a forward-backward symmetry around eta=0 (theta=90 deg). The enveloping trapezoid shape is certainly different for each tower. However, the tubes and fibres within the various towers are a bit of a different story. I try to reuse the volumes for the tubes and fibres throughout the simulation by creating them once and storing them in a map. If a tube of a given length needs to be placed, I first check whether this volume already exists in the map and, if so, place it in the tower. For instance, at the centre of each tower we expect the tubes to all have the same length across towers, because the tubes reach all the way from the back to the front face. Only on the sides (the wings of the tower), where we need to stagger the tubes to get the overall shape, do we expect the lengths to differ from one tower to the next.
How is this important?
Let's first look at Scenario 2, placing a ring of towers. When using 1deg by 1deg towers, we are placing one single stave volume 360 times in different phi rotations. The volumes are all identical. This is the geometry I have now pushed to the new branch for you to test. And from the plots below, you can see that there is still a significant difference between running the simulation with sensitive volumes or without.
Running with sensitive volumes:
Running without sensitive volumes:
You can see from the blue line that, after the geometry has been converted to Geant4, the memory rises much higher in the case with sensitive volumes (on a smaller scale now than with the full geometry).
In Scenario 1 (one full stave) things look different though. If you compare running with and without sensitive volumes, the memory consumption is the same.
Running with sensitive volumes:
Running without:
So what is different now? Well, probably that the towers are all different volumes. So the issue might be related to whether or not the volume already exists in memory. But in that case it would still be surprising to see no difference at all, since the volumes for the tubes and fibres are reused and therefore should already exist in memory. Also, it is not quite right to say that all towers are different, because of the symmetry at theta=90 deg. This should contribute a factor of two, since each tower is created once and placed twice within the stave; the only difference is the position and rotation.
I don't really know what this means however, or if this is even relevant to the underlying issue. I just thought I should share this in case I'm onto something.
I think the issue is that there is an entry for each sensitive element with its unique path. https://github.com/AIDASoft/DD4hep/blob/3ccf9072b84f2e12dcec491d9078f13899bdd4f9/DDG4/src/Geant4VolumeManager.cpp#L180
Yes, I confirm this. There is an entry for each path to allow lookups using the touchable history. But: how else would you perform the lookup? The only alternative is walking down the tree using strings, which has huge run-time costs.
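To illustrate why a per-path table grows with the number of placements: the key is the full placement path (as recovered from the Geant4 touchable history), so identical logical volumes placed many times still contribute one entry each. A toy model, with made-up names and counts, not the actual Geant4VolumeManager internals:

```python
# Toy model of a path-keyed lookup table: one entry per *placement path*,
# as needed for lookups via the touchable history. The geometry sizes
# below are illustrative, not the real detector.

n_staves, n_towers, n_fibres = 36, 10, 100

path_to_volume_id = {}
vid = 0
for s in range(n_staves):
    for t in range(n_towers):
        for f in range(n_fibres):
            # the key mimics a touchable-history path: one node per level
            path = ("world", f"stave_{s}", f"tower_{t}", f"fibre_{f}")
            path_to_volume_id[path] = vid
            vid += 1

# Entries scale with placements, not with distinct logical volumes:
print(len(path_to_volume_id))  # 36000 for this toy geometry
```

Even when only a handful of logical volumes exist, the table size (and hence memory) is driven by the product of the placement multiplicities at each level.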
I guess one shouldn't add a DetElement for each fiber, but depend on a segmentation to give a number to the fiber in the tower?
I'm not sure how this could work. Currently we need to mark the fibres as sensitive for signal generation. We have had some discussion with Sanghyun, whose geometry propagates optical photons, but it takes several minutes to simulate one event, which is something we would like to avoid.
This path reflects the path of volumes. Having a DetElement at each level makes things worse, but already the "unfolded" tree with all these little sensitive volumes makes a huge tree with vectors of volume IDs as lookup keys. It is well possible that one would have to somehow develop an alternative lookup mechanism for certain types of readouts.
In DD4hep such situations are meant to be handled by a relatively large sensitive volumes and then the little sensitive elements handled by a segmentation. Example: a wafer is a sensitive volume, the pixels on the wafer are not sensitive volumes, but handled by the segmentation.
In this case the envelope of the fibres would be the sensitive volume and the individual fibres would then be handled by a segmentation. Whether such an ad-hoc approach is reasonable I cannot tell. Alternatively, one tries to find a model which describes such a setup efficiently.
I think I understand how this approach might work, although I have one question. You say the envelope of the fibre would be the sensitive volume; I guess this means the mother volume. For our geometry, the sensible choice of sensitive volume would be three levels of hierarchy higher (the grand-grandmother volume), since each fibre core is placed within a cladding volume within a tube volume. And then the tower would be the large sensitive volume, which can be segmented. Would this approach still work? It is not clear to me how sensitive volumes treat daughter and grand-daughter volumes and further down. Are these volumes no longer 'sensitive', in the sense that the sensitive detector action would not be called for steps in such a daughter volume?
This is sort of the idea behind the segmentation concept. You would get the energy deposit in the grand-grandmother volume and compute the fiber from the location of the energy deposit within this volume.
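The lookup described here can be sketched as pure arithmetic: from the position of the energy deposit in the local frame of the enclosing volume, compute which fibre it falls in. A square grid with a fixed pitch is assumed here purely for illustration; the real staggered tower layout would need a richer mapping:

```python
import math

# Sketch of a segmentation-style lookup: instead of one sensitive volume
# per fibre, the tower is sensitive and the fibre index is computed from
# the local hit position. PITCH and the grid layout are assumptions for
# this example, not values from the actual geometry.

PITCH = 2.0  # fibre spacing in mm (illustrative value)

def fibre_index(local_x, local_y, pitch=PITCH):
    """Map a hit position in the tower's local frame to (column, row)."""
    col = math.floor(local_x / pitch)
    row = math.floor(local_y / pitch)
    return col, row

# Two hits inside the same 2 mm cell resolve to the same fibre:
print(fibre_index(3.1, 0.4))   # (1, 0)
print(fibre_index(3.9, 1.9))   # (1, 0)
print(fibre_index(-0.5, 4.2))  # (-1, 2)
```

This is exactly the wafer/pixel pattern: the bookkeeping cost is one sensitive volume plus a formula, instead of one table entry per fibre placement.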
If this works for fibers (which I guess are thin cylinders) I cannot tell, because there is some space between them filled probably with some glue. This then would not be handled correctly by Geant4, because the glue has different material characteristics than the fibers.
@s6anloes I tried to somewhat understand the code here: https://github.com/s6anloes/DDDRCaloTubes/blob/689347a36627b16471c012551fc3a7caa250bbe5/DRdetector/DRcalo/src/DRconstructor.cpp Depending on the granularity these are really a lot of volumes, since apparently in theta things cannot be re-used but must be recreated.
Nevertheless: do you know where the memory really goes? Geant4VolumeManager? How are the dd4hep2root results to be understood? Why does it suddenly need so much less memory?
So where does the memory really go?
I ran heaptrack to monitor allocations
PYTHONMALLOC=malloc heaptrack python `which ddsim` --compactFile ../DRdetector/DRcalo/compact/DDDRCaloTubes.xml -N1 -G --part.userParticleHandler=''
With and without setting the couple of volumes as sensitive. And the line I link to above is the main difference between the two runs, as far as I can tell. This is a bit complicated because I have never used heaptrack before, and the recursion makes the call stacks a little broader.
@MarkusFrankATcernch
Hmm, these are really good questions I wish I knew the answer to. I'm not really an expert on these things, so if you know any way I can figure this out, it would be greatly appreciated. The only thing I can tell you is that the steady and linear increase in memory occurs the moment dd4hep prints the output "successfully converted geometry to Geant4...". Since not nearly as much memory is used by the dd4hep2root command, my understanding was that it was probably the Geant4 geometry, and not the ROOT geometry.
@s6anloes
Well.... when it says "successfully converted geometry to Geant4...", I think Geant4 is far from having finished its setup. All the voxelization business, and I do not know what other internal details, are then still going on, which may require loads and loads of memory. There are certainly internal caches to speed up tracking etc. What cannot be avoided is the fact that there are 2 geometries in memory: the TGeo geometry and the Geant4 geometry. All this will probably happen when the geometry gets closed just before the event simulation starts, and is probably entirely independent of dd4hep.
Now for the facts:

dd4hep2root does not mean a lot. In your detector constructor you do not really use DetElements to build the structural hierarchy. Hence the overhead of dd4hep itself should be quite small: actually it is only the Volume extensions which are supplied to ROOT. This should be small. Though it would be interesting to see the difference between the dd4hep2root geometry and the true dd4hep geometry without Geant4.

Geant4VolumeManager: if you only look at the memory usage with and without sensitive volumes in your plots above, i.e. having a populated or a non-populated Geant4VolumeManager, the memory usage is about the same. This suggests to me that the effect of the Geant4VolumeManager is smallish.

So where does the memory go? One probably can only go through the main steps of setting up Geant4 with the debugger and see where in the setup the memory jumps....
@s6anloes, @andresailer I do not have /cvmfs/sw.hsf.org/, but it should also run on any LCG view, no?
Apparently the LCG views miss the Geant4 data tables:
#13 0x00007efdaad65299 in G4Exception (originOfException=originOfException@entry=0x7efdab1f8d2c "G4NuclideTable", exceptionCode=exceptionCode@entry=0x7efdab1f8d83 "PART70001", severity=severity@entry=FatalException, description=description@entry=0x7efdab1f8d66 "ENSDFSTATE.dat is not found.") at /build/jenkins/workspace/lcg_release_pipeline/build/projects/Geant4-11.2.1/src/Geant4/11.2.1/source/global/management/src/G4Exception.cc:115
#14 0x00007efdab189210 in G4NuclideTable::GenerateNuclide (this=this
Do I miss some environment or is LCG_106 incomplete?
source /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/setup.sh
ddsim --compactFile $DD4hepINSTALL/DDDetectors/compact/SiD.xml -N 2 -G
Works for me on lxplus.
Do you maybe also not have /cvmfs/geant4.cern.ch?
G4ENSDFSTATEDATA=/cvmfs/geant4.cern.ch/share/data/G4ENSDFSTATE2.3
Yes, this is the problem: /cvmfs/geant4.cern.ch is missing.
I thought the idea of the LCG views was to have everything together in a compact form?
It seems geant4 cvmfs is the hidden dependency. But you probably have those datafiles then on some LHCb CVMFS repo?
There are more problems. I tried to build on lxplus, but there I got a clash with python between system python and hsf python:
CMake Error at /cvmfs/sw.hsf.org/key4hep/releases/2024-03-10/x86_64-almalinux9-gcc11.3.1-opt/cmake/3.27.9-4qfmfr/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find Python: Found unsuitable version "3.10", but required is
exact version "3.10.13" (found /usr/include/python3.11, )
Call Stack (most recent call first):
/cvmfs/sw.hsf.org/key4hep/releases/2024-03-10/x86_64-almalinux9-gcc11.3.1-opt/cmake/3.27.9-4qfmfr/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:598 (_FPHSA_FAILURE_MESSAGE)
/cvmfs/sw.hsf.org/key4hep/releases/2024-03-10/x86_64-almalinux9-gcc11.3.1-opt/cmake/3.27.9-4qfmfr/share/cmake-3.27/Modules/FindPython/Support.cmake:3824 (find_package_handle_standard_args)
/cvmfs/sw.hsf.org/key4hep/releases/2024-03-10/x86_64-almalinux9-gcc11.3.1-opt/cmake/3.27.9-4qfmfr/share/cmake-3.27/Modules/FindPython.cmake:574 (include)
/cvmfs/sw.hsf.org/key4hep/releases/2024-03-10/x86_64-almalinux9-gcc11.3.1-opt/dd4hep/1.28-q6ea5f/cmake/DD4hepBuild.cmake:693 (FIND_PACKAGE)
/cvmfs/sw.hsf.org/key4hep/releases/2024-03-10/x86_64-almalinux9-gcc11.3.1-opt/dd4hep/1.28-q6ea5f/cmake/DD4hepConfig.cmake:62 (DD4HEP_SETUP_ROOT_TARGETS)
CMakeLists.txt:35 (find_package)
That is working for me:
echo "Sourcing environment dirs for lxplus9 [zsh|bash]"
echo "Sourcing environment dirs for AlmaLinux 9.4"
export LIBGL_ALWAYS_INDIRECT=1
source /cvmfs/fcc.cern.ch/sw/latest/setup.sh
source /cvmfs/sft.cern.ch/lcg/views/LCG_105b/x86_64-el9-gcc13-opt/setup.sh
export PATH=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/CMake/3.26.2/x86_64-el9-gcc13-opt/bin:$PATH
export PATH=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/ninja/1.10.0/x86_64-el9-gcc13-opt/bin:$PATH
export CMAKE_PREFIX_PATH=/cvmfs/sft.cern.ch/lcg/releases/cfitsio/3.48-e4bb8/x86_64-el9-gcc13-dbg/:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/hepmc3/3.2.7/x86_64-el9-gcc13-opt/:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/xrootd/5.6.3/x86_64-el9-gcc13-opt/:$CMAKE_PREFIX_PATH
export Python_ROOT_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/Python/3.9.12/x86_64-el9-gcc13-opt/
export Boost_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/Boost/1.82.0/x86_64-el9-gcc13-opt/
export LCIO_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/LCIO/02.20/x86_64-el9-gcc13-opt/
export Qt5_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/qt5/5.15.9/x86_64-el9-gcc13-opt/
export TBB_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/tbb/2021.10.0/x86_64-el9-gcc13-opt/
export VDT_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/vdt/0.4.4/x86_64-el9-gcc13-opt/
export Vc_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/Vc/1.4.4/x86_64-el9-gcc13-opt/
export HEPMC3=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/hepmc3/3.2.7/x86_64-el9-gcc13-opt/
export PYTHIA8=/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/pythia8/310-2f242/x86_64-el9-gcc13-opt
export PYTHIA8DATA=/cvmfs/sft.cern.ch/lcg/releases/MCGenerators/pythia8/310-2f242/x86_64-el9-gcc13-opt/share/Pythia8/xmldoc
export XercesC_LIBRARY=/cvmfs/sft.cern.ch/lcg/releases/XercesC/3.2.4-9e637/x86_64-el9-gcc13-opt/lib/libxerces-c.so
export XercesC_INCLUDE_DIR=/cvmfs/sft.cern.ch/lcg/releases/XercesC/3.2.4-9e637/x86_64-el9-gcc13-opt//include/
export CLHEP_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/clhep/2.4.7.1/x86_64-el9-gcc13-opt/
export LCIO_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/LCIO/02.20/x86_64-el9-gcc13-opt/
export CMAKE_PREFIX_PATH=$Qt5_DIR:$VDT_DIR:$CMAKE_PREFIX_PATH
source /cvmfs/sft.cern.ch/lcg/releases/LCG_105b/ROOT/6.30.06/x86_64-el9-gcc13-opt/bin/thisroot.sh
export Geant4_DIR=/cvmfs/sft.cern.ch/lcg/releases/LCG_105b/Geant4/11.2.0/x86_64-el9-gcc13-opt/
export G4INSTALL=$Geant4_DIR
source /cvmfs/sft.cern.ch/lcg/releases/LCG_105b/Geant4/11.2.0/x86_64-el9-gcc13-opt/share/Geant4/geant4make/geant4make.sh
cd -
Here are some results from simply using top:

Invocation of TGeo alone:
TGeo: geoPluginRun -input /scratch/online/frankm/SW/DDDRCaloTubes/install/share/compact/DDDRCaloTubes.xml -interactive -ui
PID PPID PR NI VIRT RES SHR S %CPU %MEM TIME+ USER P COMMAND
1472811 1469076 20 0 744700 568888 432088 T 0.0 0.1 0:20.13 frankm 58 geoPluginRun -input /scratch/online/frankm/SW/DDDRCaloTubes/install/share/compact/DDDRCaloTubes.+
Virt: 700 MB Resident: 569 MB
Tests involving Geant4:
PID PPID PR NI VIRT RES SHR S %CPU %MEM TIME+ USER P COMMAND
Start of DetectorImp::init
1485953 1485861 20 0 1037628 760188 503960 t 0.0 0.1 0:10.87 frankm 47 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
End of DetectorImp::init
1485953 1485861 20 0 1037628 760188 503944 t 0.0 0.1 0:10.88 frankm 47 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
Start of dd4hep::DetectorImp::endDocument
1485953 1485861 20 0 1039112 761852 504348 t 0.0 0.1 0:10.92 frankm 47 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
End of dd4hep::DetectorImp::endDocument
1485953 1485861 20 0 1040084 763004 504348 t 85.0 0.1 0:26.60 frankm 47 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
Before Geant4Converter:
1485219 1485112 20 0 1097056 807788 530032 t 0.0 0.2 0:28.08 frankm 53 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
After Geant4Converter:
1485219 1485112 20 0 1099176 809900 530144 t 0.0 0.2 0:44.64 frankm 53 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
Before Geant4VolumeManager:
1485219 1485112 20 0 1099176 809900 530140 t 0.0 0.2 0:44.64 frankm 53 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
After Geant4VolumeManager:
1485219 1485112 20 0 1505076 1.2g 530136 t 0.0 0.2 1:45.48 frankm 53 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
After dd4hep::sim::Geant4Exec::initialize
1485219 1485112 20 0 1511056 1.2g 530240 t 0.0 0.2 1:46.10 frankm 53 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
Start of dd4hep::sim::Geant4Exec::run
1485219 1485112 20 0 1511928 1.2g 530428 t 0.0 0.2 1:46.11 frankm 53 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
After first event:
1485219 1485112 20 0 1520236 1.2g 532180 t 0.0 0.2 1:48.15 frankm 53 /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el9-gcc13-dbg/bin/python /cvmfs/sft.cern.ch/lcg/v+
Hence:
- ROOT uses about 700 MB virtual / 570 MB resident memory
- Geant4 uses on top of that about 400 MB virtual / 240 MB resident
- The Geant4 volume manager uses on top another 400 MB virtual / 400 MB resident (total of 1.2 GB resident)
- The rest can more or less be neglected.

Hence a possible strategy would be:
- Make the Geant4VolumeManager configurable and do not recurse into certain subdetectors.
- Provide a sensitive detector construct which does not use CellID lookups via the Geant4VolumeManager. The whole concept will not work for so many sensitive elements. Somehow a new CellID lookup has to be thought of.

This is all not impossible, but requires significant work and is not done in an afternoon. We can develop this as a common effort provided several persons work together.....
How do you know Geant4 uses 240 MB of resident memory? The jump between calling TGeo alone and after the Geant4Converter may be 240 MB, but it is already close to that at the "Before Geant4Converter" stage, no?
How did you get this output? I would be interested to see how this scales with the full (or at least a more complete) geometry.
But I guess it does track with what we have seen, namely that the main culprit is the Geant4VolumeManager, given the difference when running with and without sensitive detectors.
@MarkusFrankATcernch In this comment your last point confuses me. It is kind of the opposite of what I was trying to communicate, except for this one geometry, which is not the one you have been testing.
@s6anloes So what? This is the cost of loading G4. Loading all these libraries is far from free even if nothing is done with them (yet). The volume conversion in this case is apparently not very expensive.
Regarding @s6anloes description here: https://github.com/AIDASoft/DD4hep/issues/1285#issuecomment-2197471652 . Shall we first try to understand why Scenario 1 leads to no difference with/without sensitive volumes while Scenario 2 leads to significant differences with/without sensitive volumes? Can someone explain that to me?
Goal
I'm trying to understand the memory usage of sensitive volumes in dd4hep. I have a detector with a large number of sensitive volumes which seem to have a large impact on the memory consumption. More details are given below.
Operating System and Version
CentOS 7
compiler
GCC 12.2.0
ROOT Version
6.28/10
DD4hep Version
1.28
Reproducer
To install the dual-readout calorimeter geometry (note: the export command will need to be executed in each new shell):
To run the simulation with all fibres marked as sensitive detectors and monitor the memory usage via htop: Note: with the full geometry, this will take ~10 GB of memory and about 10 minutes to build the geometry (at around 3 1/2 minutes you should be able to see the memory usage increase gradually).
Then to run the simulation without the fibres being sensitive: in the DDDRCaloTubes.xml file, in lines 333 and 335, change the "sensitive" value to false and run with the same command. This should take less than 1 GB of memory. Note: the simulation will take slightly longer, because optical photons are propagated instead of being killed as in the custom sensitive detector action.

Additional context
I have been trying to improve the memory consumption of the calorimeter for some time now and had meetings with and feedback from some of the experts. In a recent FCC Full Sim Working Group meeting I presented some studies I did on the memory consumption. Mainly I show the CPU and memory usage as function of time using the psrecord software. There you can also find one slide on the geometry and volume hierarchy of the detector.
The slides are mainly about trying different options to improve the memory consumption, but one important point was also the discrepancy in memory consumption between the ddsim and dd4hep2root commands. While running ddsim takes 10 GB of memory, dd4hep2root takes less than 1 GB.

In this meeting, a colleague working on the 'monolithic' version of the geometry suggested running the simulation without having any volume marked as sensitive. And indeed, this seems to be the cause of the large discrepancy. A colleague said that in Geant4 the sensitive volume is linked to the logical volume, so even if the volume is placed many times (as is the case for the fibres in my geometry), there is still just one sensitive volume. It looks like this is not the case in dd4hep, where the sensitive volume seems to be tied to the placed volume.
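The two bookkeeping models can be contrasted with a toy count (illustrative Python, not actual Geant4 or DD4hep internals; the path strings and the placement count are made up): attaching sensitivity to the logical volume gives one record however often it is placed, while attaching it to each placed volume gives one record per placement.

```python
# Toy comparison of the two models described above: one logical fibre
# volume, placed many times.

n_placements = 100_000  # fibre placements in a hypothetical detector

# Logical-volume style: the sensitive detector hangs off the one
# (reused) logical volume, so the table has a single entry.
sd_per_logical = {"fibre_core_logical": "DRCaloSD"}

# Per-placement style: one entry for every placed copy, keyed by a
# placeholder placement path.
sd_per_placement = {f"/world/stave/tower/fibre_{i}": "DRCaloSD"
                    for i in range(n_placements)}

print(len(sd_per_logical))    # 1
print(len(sd_per_placement))  # 100000
```

For a geometry with hundreds of thousands of fibre placements, this difference in scaling is exactly the 1 GB vs 10 GB gap observed between the two runs, up to the per-entry overhead.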
I wonder if this is something that can be changed, as it causes a problem with geometries with many small sensitive volumes.
Sidenote: this issue is somewhat related to issue #1173, where Sarah Eno was also looking at the memory consumption of the dual-readout calorimeter. While I think there is still some optimisation possible for my geometry (I'm not using all possible symmetries at the moment), this doesn't seem to be the main problem here