OpenGATE / Gate

Official public repository of Gate
http://www.opengatecollaboration.org
GNU Lesser General Public License v3.0
236 stars 263 forks source link

Memory leak in KillActor #257

Closed MrTzschr closed 5 years ago

MrTzschr commented 5 years ago

Hi,

the KillActor seems to have a memory leak. I wanted to use the actor to stop the tracking of the back-to-back photons in the detector (cylindricalPET) but at some point in time the simulations on a cluster stop due the high memory demand. In a very simple test using vGATE this seems to increase by 1-2 Mb per second.

Best, Moritz

djboersma commented 5 years ago

Hi @MrTzschr / Moritz, Thanks for the report. Could you add the macro of your very simple test here, so that we can try to reproduce your findings? Thanks!

MrTzschr commented 5 years ago

Thank your for response @djboersma. Here is the macro. I disabled the output but it does not matter if root is enabled or not.

===================================

/gate/geometry/setMaterialDatabase GateMaterials.db

/gate/world/geometry/setXLength 150. cm /gate/world/geometry/setYLength 150. cm /gate/world/geometry/setZLength 150. cm /gate/world/setMaterial Vacuum

/gate/world/daughters/name cylindricalPET /gate/world/daughters/insert cylinder /gate/cylindricalPET/setMaterial Vacuum /gate/cylindricalPET/geometry/setRmax 500. mm /gate/cylindricalPET/geometry/setRmin 360. mm /gate/cylindricalPET/geometry/setHeight 230. mm

/gate/cylindricalPET/daughters/name rsector /gate/cylindricalPET/daughters/insert box /gate/rsector/setMaterial Vacuum /gate/rsector/geometry/setXLength 36. mm /gate/rsector/geometry/setYLength 94.2704 mm /gate/rsector/geometry/setZLength 179.375 mm /gate/rsector/placement/setTranslation 469.7 0. 0. mm

/gate/rsector/daughters/name crystal /gate/rsector/daughters/insert box /gate/crystal/setMaterial LYSO /gate/crystal/geometry/setXLength 22. mm /gate/crystal/geometry/setYLength 4. mm /gate/crystal/geometry/setZLength 4. mm /gate/crystal/placement/setTranslation -0.7 0. 0. mm

/gate/crystal/repeaters/insert cubicArray /gate/crystal/cubicArray/setRepeatNumberX 1 /gate/crystal/cubicArray/setRepeatNumberY 23 /gate/crystal/cubicArray/setRepeatNumberZ 44 /gate/crystal/cubicArray/setRepeatVector 0. 4.0946 4.075 mm

/gate/rsector/repeaters/insert ring /gate/rsector/ring/setRepeatNumber 28

/gate/systems/cylindricalPET/rsector/attach rsector /gate/systems/cylindricalPET/crystal/attach crystal /gate/crystal/attachCrystalSD

/gate/physics/addPhysicsList emstandard_opt3

/gate/digitizer/Singles/insert adder /gate/digitizer/Singles/insert readout /gate/digitizer/Singles/readout/setDepth 1 /gate/digitizer/Singles/insert thresholder /gate/digitizer/Singles/thresholder/setThreshold 350. keV /gate/digitizer/Singles/insert upholder /gate/digitizer/Singles/upholder/setUphold 650. keV /gate/digitizer/Coincidences/setWindow 4. ns

KillActor to end tracks at entering the crystals.

/gate/actor/addActor KillActor KillBill /gate/actor/KillBill/save KilledTracks.txt /gate/actor/KillBill/attachTo crystal

/gate/run/initialize

/gate/source/addSource source gps /gate/source/source/gps/centre 20.7107 20.7107 0.0 mm /gate/source/source/gps/type Point /gate/source/source/setActivity 1 Bq /gate/source/source/setType backtoback /gate/source/source/gps/particle gamma /gate/source/source/gps/ene/mono 511. keV /gate/source/source/gps/angtype iso

/gate/output/allowNoOutput

/gate/output/root/enable

/gate/output/root/setFileName result

/gate/output/root/setRootHitFlag 0

/gate/output/root/setRootSinglesFlag 0

/gate/output/root/setRootCoincidencesFlag 1

/gate/random/setEngineName MersenneTwister /gate/random/setEngineSeed auto

/gate/application/setTimeSlice 100000000 s /gate/application/setTimeStart 0 s /gate/application/setTimeStop 100000000 s

/gate/application/startDAQ

djboersma commented 5 years ago

Pooh hey. On my laptop the memory leak performed as 15-20 Mb/sec, or about 1GiB/minute. :)

MrTzschr commented 5 years ago

Bug confirmed, great! The code of the KillActor is quite compact, so i did not find any obvious issue and suspect it's an internal issue on how GATE/Geant4 processes hits and calls the actors method!? As far as i understand the method GateKillActor::ProcessHits(...) is called whenever a step into the crystals (due to /gate/actor/KillBill/attachTo crystal) happens. Is the actor or some involved object re-created when this happens? Links for easier access: https://github.com/OpenGATE/Gate/blob/develop/source/digits_hits/include/GateKillActor.hh https://github.com/OpenGATE/Gate/blob/develop/source/digits_hits/src/GateKillActor.cc

MrTzschr commented 5 years ago

Hello again,

I debugged the problem using valgrind, valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all Gate main.mac and found some issues but don't know how to proceed. For a simple simulation using a point source, no attenuation phantom, no root output and 100 primaries I got 67 errors related to the KillActor. The first 66 look similar, the last error is different. Here are the two last errors:

==16213== 376 bytes in 1 blocks are indirectly lost in loss record 8,740 of 10,027
==16213==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16213==    by 0x7D273E: GateKillActor::make_sensor(G4String, int) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0x678F3B: GateActorManager::AddActor(G4String, G4String, int) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0x67B7CC: GateActorManagerMessenger::SetNewValue(G4UIcommand*, G4String) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0xF21418A: G4UIcommand::DoIt(G4String) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF23427E: G4UImanager::ApplyCommand(char const*) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF2006A6: G4UIbatch::ExecCommand(G4String const&) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF20253D: G4UIbatch::SessionStart() (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF234E91: G4UImanager::ExecuteMacroFile(char const*) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF22112E: G4UIcontrolMessenger::SetNewValue(G4UIcommand*, G4String) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF21418A: G4UIcommand::DoIt(G4String) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF23427E: G4UImanager::ApplyCommand(char const*) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)

==16213== 7,532 (352 direct, 7,180 indirect) bytes in 1 blocks are definitely lost in loss record 9,611 of 10,027
==16213==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16213==    by 0x67D88C: GateActorMessenger::BuildCommands(G4String) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0x67F272: GateActorMessenger::GateActorMessenger(GateVActor*) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0x7D27E7: GateKillActor::make_sensor(G4String, int) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0x678F3B: GateActorManager::AddActor(G4String, G4String, int) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0x67B7CC: GateActorManagerMessenger::SetNewValue(G4UIcommand*, G4String) (in /usr/share/gate/Gate-install/bin/Gate)
==16213==    by 0xF21418A: G4UIcommand::DoIt(G4String) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF23427E: G4UImanager::ApplyCommand(char const*) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF2006A6: G4UIbatch::ExecCommand(G4String const&) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF20253D: G4UIbatch::SessionStart() (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF234E91: G4UImanager::ExecuteMacroFile(char const*) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)
==16213==    by 0xF22112E: G4UIcontrolMessenger::SetNewValue(G4UIcommand*, G4String) (in /usr/share/geant4/geant4-install/lib/libG4intercoms.so)

I can upload the full log file but it's rather large. Any help is appreciated.

EDIT: In total 23 tracks got killed by the actor. Using my simple setup, it looks like that GatePhaseSpaceActor and GateProductionAndStoppingActor suffer the same problems. Whereas GateProductionActor does not. The most prominent difference to me is that all actors with the issue access the G4Step by calling the GetTrack() routine. The GateProductionActor is not using this information. Does this help?

dsarrut commented 5 years ago

Hello, thanks a lot for this information. On my side, I cannot investigate this right now. However, we will try to investigate in the following weeks. In the meantime, any progress is appreciated! Thanks, David

cpommranz commented 5 years ago

PR #266 fixes the issue for me. Could you please cross check @MrTzschr ? It would be nicer to remove the SD from the G4SDManager, but I couldn't find a suitable way right now.

Valgrind indeed shows a lot of memory leaks. Considering C++11 smart pointers would be a good idea IMHO.

dsarrut commented 5 years ago

@MrTzschr could you please check if the PR #266 fix the leak issue?

@cpommranz for sure smart pointer would be a very good idea, however, it is a huge amount of work. What can be done at least, is using smart pointers (and standard data structure like std::vector) in all new developments.

djboersma commented 5 years ago

@cpommranz great, thanks! @dsarrut I agree that it's a lot of work that we cannot implement on the short term. If we would have more manpower, then we would probably try to rewrite Gate from scratch, or at least thoroughly refactor it. :) But I agree with @cpommranz that we should try to fix this in the medium/long term. We do not need to eliminate all bold pointers all at once, we can introduce the smart pointers gradually.

MrTzschr commented 5 years ago

Sorry for the late reply. I tested a few things and currently it looks promising. Will report back if something changes. @cpommranz Thank you so much for the fix!