OpenGATE / Gate

Official public repository of Gate
http://www.opengatecollaboration.org
GNU Lesser General Public License v3.0
236 stars 263 forks source link

Attach actor to "world" volume causing summary to malfuntion. #639

Closed tontyoutoure closed 1 year ago

tontyoutoure commented 1 year ago

Describe the bug Attach actor to "world" causing summary to malfuntion.

  1. hits and singles are written twice in the summary file
  2. singles count is doubled

Desktop (please complete the following information):

Minimal example Pretty easy to replicate, just attach an actor to world.

Expected behavior that summary works fine

Additional context I will try to fix it and submit a PR.

tontyoutoure commented 1 year ago

after a little digging it seems to me that the problem is around GateActorManager.cc:304. The comment there said:

// Remove attached SD by replacing with a deactivated clone SD

However, it clearly did not remove the attached sd. Instead, every hit is processed twice, once by the original CrystalSD, once by the mutifuctional SD.

also, I checked the volume's sd before and after "replacing", volume->GetLogicalVolume()->GetSensitiveDetector() remains the same address.

edit: Hits are not processed twice. Just singles count are doubled in the summary.

tontyoutoure commented 1 year ago

At the time of PR #354 , this seems to be a smart solution. Now it has become the issue itself. Hopefully @cpommranz and @dsarrut can look into this. I'm happy to provide a barebone replication of this issue by then. Without your people's help, I will have to dig deeper to be comfortable enough to provide a fix.

tontyoutoure commented 1 year ago

Yes, there is also memory leak. The hero seems to turn into dragon now.

dsarrut commented 1 year ago

ouch ;) Thanks for your investigation! We should look at this. However, to be fair, we are committed to opengate version 10 right now (python version), and have very little bandwidth to dig into this., so it may take a bit of time.

tontyoutoure commented 1 year ago

@dsarrut Yes I suppose, looking forward to the python ver! I will see if I can fix it myself. However some (desynchronized) Q/A will be much appreciated.

dsarrut commented 1 year ago

Sur, I can try to help no problem!

tontyoutoure commented 1 year ago

Well, after a afternoon of digging, I confirmed that, when one tries to tie an actor to a volume that contains a CrystalSD, the following will happen:

  1. cloned SD removed original SD from the G4SDManager (not sure what it does), but the original SD is still there and tied to the logical volume. Later the original SD is put into GateMultiSensitiveDetector and count hits there.
  2. cloned SD creates a new GateSinglesDigitizer, which is added to GateDigitizerMgr. The summary output will iterate list of singles digitizer to get output list, thus two hit line&two single line for each crystal sd.
  3. two digitizer with same name will cause summary to malfunction and output 2 folds singles count.

Deleting the cloned SD will resolve the summary issue. However the memory leak continues, as long as an actor is tied to a CrystalSD. I think address this issue in the documentation and remove the cloned SD (it did not fix the mem leak anyway) could be a temporary fix.

tontyoutoure commented 1 year ago

@dsarrut ok my question is, how did Gate recollect the hit collection (G4HCofThisEvent) normally? It seems (through valgrind) that they are the primary cause of memory leak. Normally Geant4 will ask each SD to handle their own recollection through SD::EndOfEvent(HCE), but GateCrystalSD did not implement this method, I suppose you guys handle the memory recollect somewhere else.

tontyoutoure commented 1 year ago

well, the cloned SD do fix the memory leak of HCE, however it introduces some other memory leak, and the previous metiond summary bug:

  ==1659032== 23,986,136 (23,980,896 direct, 5,240 indirect) bytes in 999,204 blocks are definitely lost in loss record 22,380 of 22,380
==1659032==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1659032==    by 0x382E49: G4TDigiCollection<GateDigi>::G4TDigiCollection(G4String, G4String) (G4TDigiCollection.hh:151)
==1659032==    by 0x42F998: GateDigitizerInitializationModule::Digitize() (GateDigitizerInitializationModule.cc:54)
==1659032==    by 0x435C28: GateDigitizerMgr::RunDigitizers() (GateDigitizerMgr.cc:433)
==1659032==    by 0x388474: GateAnalysis::RecordEndOfEvent(G4Event const*) (GateAnalysis.cc:481)
==1659032==    by 0x5BC072: GateOutputMgr::RecordEndOfEvent(G4Event const*) (GateOutputMgr.cc:202)
==1659032==    by 0x3723B9: GateEventAction::EndOfEventAction(G4Event const*) (GateActions.cc:204)
==1659032==    by 0x59A21CC: G4EventManager::DoProcessing(G4Event*) (in /home/likun/Software/Geant4/install/lib/libG4event.so)
==1659032==    by 0x58FD756: G4RunManager::DoEventLoop(int, char const*, int) (in /home/likun/Software/Geant4/install/lib/libG4run.so)
==1659032==    by 0x58FB2E1: G4RunManager::BeamOn(int, char const*, int) (in /home/likun/Software/Geant4/install/lib/libG4run.so)
==1659032==    by 0x7714C5: GateApplicationMgr::StartDAQ() (GateApplicationMgr.cc:411)
==1659032==    by 0x776D9E: GateApplicationMgrMessenger::SetNewValue(G4UIcommand*, G4String) (GateApplicationMgrMessenger.cc:187)

BTW, valgrind log for memory leak without cloned SD is:

==1609725== 24,003,792 bytes in 1,000,158 blocks are definitely lost in loss record 22,276 of 22,277
==1609725==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1609725==    by 0x35CCD7: G4THitsCollection<GateHit>::G4THitsCollection(G4String, G4String) (G4THitsCollection.hh:152)
==1609725==    by 0x40839A: GateCrystalSD::Initialize(G4HCofThisEvent*) (GateCrystalSD.cc:108)
==1609725==    by 0x764AD6C: G4SDStructure::Initialize(G4HCofThisEvent*) (in /home/likun/Software/Geant4/install/lib/libG4digits_hits.so)
==1609725==    by 0x76490DE: G4SDManager::PrepareNewEvent() (in /home/likun/Software/Geant4/install/lib/libG4digits_hits.so)
==1609725==    by 0x59A1FAA: G4EventManager::DoProcessing(G4Event*) (in /home/likun/Software/Geant4/install/lib/libG4event.so)
==1609725==    by 0x58FD756: G4RunManager::DoEventLoop(int, char const*, int) (in /home/likun/Software/Geant4/install/lib/libG4run.so)
==1609725==    by 0x58FB2E1: G4RunManager::BeamOn(int, char const*, int) (in /home/likun/Software/Geant4/install/lib/libG4run.so)
==1609725==    by 0x771437: GateApplicationMgr::StartDAQ() (GateApplicationMgr.cc:411)
==1609725==    by 0x776D10: GateApplicationMgrMessenger::SetNewValue(G4UIcommand*, G4String) (GateApplicationMgrMessenger.cc:187)
==1609725==    by 0x7CC7159: G4UIcommand::DoIt(G4String) (in /home/likun/Software/Geant4/install/lib/libG4intercoms.so)
==1609725==    by 0x7CE5E4A: G4UImanager::ApplyCommand(char const*) (in /home/likun/Software/Geant4/install/lib/libG4intercoms.so)
==1609725== 
==1609725== 31,004,650 bytes in 1,000,150 blocks are definitely lost in loss record 22,277 of 22,277
==1609725==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1609725==    by 0x1A82929E: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1609725==    by 0x7653B1D: G4VHitsCollection::G4VHitsCollection(G4String, G4String) (in /home/likun/Software/Geant4/install/lib/libG4digits_hits.so)
==1609725==    by 0x765374D: G4HitsCollection::G4HitsCollection(G4String, G4String) (in /home/likun/Software/Geant4/install/lib/libG4digits_hits.so)
==1609725==    by 0x35CCA3: G4THitsCollection<GateHit>::G4THitsCollection(G4String, G4String) (G4THitsCollection.hh:150)
==1609725==    by 0x40839A: GateCrystalSD::Initialize(G4HCofThisEvent*) (GateCrystalSD.cc:108)
==1609725==    by 0x764AD6C: G4SDStructure::Initialize(G4HCofThisEvent*) (in /home/likun/Software/Geant4/install/lib/libG4digits_hits.so)
==1609725==    by 0x76490DE: G4SDManager::PrepareNewEvent() (in /home/likun/Software/Geant4/install/lib/libG4digits_hits.so)
==1609725==    by 0x59A1FAA: G4EventManager::DoProcessing(G4Event*) (in /home/likun/Software/Geant4/install/lib/libG4event.so)
==1609725==    by 0x58FD756: G4RunManager::DoEventLoop(int, char const*, int) (in /home/likun/Software/Geant4/install/lib/libG4run.so)
==1609725==    by 0x58FB2E1: G4RunManager::BeamOn(int, char const*, int) (in /home/likun/Software/Geant4/install/lib/libG4run.so)
==1609725==    by 0x771437: GateApplicationMgr::StartDAQ() (GateApplicationMgr.cc:411)
cpommranz commented 1 year ago

Thanks a lot for the report and the investigation @tontyoutoure! If I remember correctly, the cloned SD was only used to remove the original SD from the G4SDManager, before the original SD is added to the GateMultiSensitiveDetector, as you described in 1). Unfortunately, there is currently no better way to remove an SD from the SD manager, so this workaround is necessary, and it is the recommended way according to the Geant4 devs (see #354). The cloned SD is deactivated right away and should not be used further. In 2) you describe, that the cloned SD is used later, which might be part of the problem. I will have a closer look.

I had a quick look in the recent git history of the G4SDManager and didn't find anything that could explain the observed behavior, so I assume that the problem is indeed caused somewhere in the Gate code.

tontyoutoure commented 1 year ago

@cpommranz Gate recently add something called Digitizer Manager and I think that should be what introduces the problem. I wonder if it's possible to create a dummy SD class without digitizer bounding to it, and use it to replace the original SD.

Meanwhile of course, the new memory leak should also be addressed, since the cloned SD just messes up the summary.

tontyoutoure commented 1 year ago

The empty SD solution works like a charm, fix all the problems (summary, mem leak). Hope @dsarrut can allocate some time and merge it soon.

dsarrut commented 1 year ago

Great, thank you very much both of you! The only test that does not pass is test7 but this is expected, this test will be removed soon. Thanks for the effort!