JeffersonLab / HDGeant4

Geant4 simulation for the GlueX experiment

Crash when running on AlmaLinux9 #222

Closed (aaust closed this issue 1 week ago)

aaust commented 1 month ago

The b1pi test currently fails on AlmaLinux9. There seems to be a problem getting the polarization value from CCDB. Here is the full stack trace:

===========================================================
#5  hddm_s::Polarization::getPx (this=<optimized out>) at /group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/halld_recon/halld_recon-4.49.0/Linux_Alma9-x86_64-gcc11.4.1/include/HDDM/hddm_s.hpp:9109
#6  GlueXPrimaryGenerator::GeneratePrimaryVertex (this=<optimized out>, event=0x7f5d9d178450) at src/GlueXPrimaryGenerator.cc:143
#7  0x00007f5dc18d0047 in GlueXPrimaryGeneratorAction::GeneratePrimariesHDDM (this=0x7f5da80ca4e0, anEvent=0x7f5d9d178450) at src/GlueXPrimaryGeneratorAction.cc:936
#8  0x00007f5dbfbc8220 in G4WorkerRunManager::GenerateEvent (this=0x7f5da8077a40, i_event=<optimized out>) at /u/group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/geant4/geant4.10.04.p02/source/run/src/G4WorkerRunManager.cc:379
#9  0x00007f5dbfbc58bd in G4WorkerRunManager::ProcessOneEvent (this=0x7f5da8077a40, i_event=<optimized out>) at /u/group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/geant4/geant4.10.04.p02/source/run/src/G4WorkerRunManager.cc:251
#10 0x00007f5dbfbc5b77 in G4WorkerRunManager::DoEventLoop (this=0x7f5da8077a40, n_event=100000, macroFile=0x0, n_select=-1) at /u/group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/geant4/geant4.10.04.p02/source/run/src/G4WorkerRunManager.cc:232
#11 0x00007f5dbfbbb64e in G4RunManager::BeamOn (this=0x7f5da8077a40, n_event=100000, macroFile=0x0, n_select=-1) at /u/group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/geant4/geant4.10.04.p02/source/run/src/G4RunManager.cc:273
#12 0x00007f5dbfbc7ab8 in G4WorkerRunManager::DoWork (this=<optimized out>) at /u/group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/geant4/geant4.10.04.p02/source/run/src/G4WorkerRunManager.cc:622
#13 0x00007f5dbfbcfc4c in G4MTRunManagerKernel::StartThread (context=<optimized out>) at /u/group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/geant4/geant4.10.04.p02/source/run/src/G4MTRunManagerKernel.cc:191
#14 0x00007f5db849f802 in start_thread () from /lib64/libc.so.6
#15 0x00007f5db843f450 in clone3 () from /lib64/libc.so.6

===========================================================

The same code runs on CentOS7, where the next line printed is: G4WT0 > TAGGER: all parameters loaded from ccdb

It can be reproduced by copying the files from /volatile/halld/gluex/b1pi/2024-08-12/Linux_Alma9-x86_64-gcc11.4.1/11366/ and running hdgeant4 run.mac with the default environment in this directory:

source /group/halld/Software/build_scripts/gluex_env_boot_jlab.sh
gxenv
hdgeant4 run.mac

rjones30 commented 2 weeks ago

This was a long-standing bug in the input hddm event reader within the GlueX primary generator class. It is odd that the code did not crash in the CentOS7 build. I have checked in a fix to the master branch of HDGeant4. Please confirm that this fixes the problem.
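
The thread does not show the fix itself. As a minimal sketch only, assuming the crash in `hddm_s::Polarization::getPx` came from reading a polarization element that the input record does not contain, a guarded read could look like the following. `DemoPolarization`, `DemoProduct`, `findPolarization` and `polarizationOrDefault` are hypothetical stand-ins for illustration, not the real hddm_s classes or the code that was checked in.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are NOT the real hddm_s types.
struct DemoPolarization {
    double px, py, pz;
};

struct DemoProduct {
    // Optional child element: the list holds zero or one entries.
    std::vector<DemoPolarization> polarizations;
};

// Return a pointer to the polarization if the record has one, else nullptr.
// Calling polarizations.front() on an empty list would be undefined
// behavior, which may appear to work in one build and crash in another.
const DemoPolarization* findPolarization(const DemoProduct& product) {
    if (product.polarizations.empty()) {
        return nullptr;
    }
    return &product.polarizations.front();
}

// Fall back to zero polarization when the element is missing (an assumed
// default chosen for this sketch).
DemoPolarization polarizationOrDefault(const DemoProduct& product) {
    const DemoPolarization* found = findPolarization(product);
    if (found == nullptr) {
        return DemoPolarization{0.0, 0.0, 0.0};
    }
    return *found;
}
```

With this shape the caller never dereferences an element that is absent, so the behavior no longer depends on the platform or compiler.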

aaust commented 2 weeks ago

Thanks for looking into that. Unfortunately, another problem was introduced by one of your commits yesterday which throws an error during compilation:

./src/G4TRandom.hh:30:23: error: conflicting return type specified for ‘virtual ULong64_t G4TRandom::Poisson(Double_t)’
*** [/u/group/halld/Software/builds/Linux_Alma9-x86_64-gcc11.4.1/geant4/geant4.10.04.p02/share/Geant4-10.4.2/geant4make/config/common.gmk:77:
/volatile/halld/gluex/nightly/2024-09-25/Linux_Alma9-x86_64-gcc11.4.1/hdgeant4/tmp/Linux-g++/hdgeant4/GlueXBremsstrahlungGenerator.o] Error 1

rjones30 commented 2 weeks ago

Alex, yes that fix is needed to advance to Geant4.10.7. You can roll it back if you want to delay, but I propose that we combine the move to Alma9 with the advance to G4.10.7.

-Richard Jones

aaust commented 2 weeks ago

Doesn't the Geant4 upgrade require a detailed comparison of the results? I think we should discuss such a big step in a larger group.

rjones30 commented 2 weeks ago

Alex, I suppose, ok. I just checked in a fix that will select the correct version of G4TRandom based on the version of the G4 library being built against. -Richard
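
A version-dependent selection of this kind can be sketched with conditional compilation. This is an illustration under stated assumptions, not the committed code: Geant4 does provide the `G4VERSION_NUMBER` macro in `G4Version.hh` (1042 for 10.4.2), but here it is replaced by the hypothetical `DEMO_G4VERSION_NUMBER` so the example compiles without Geant4. The 1070 threshold, the mapping from version to return type, and the classes `DemoBaseRandom` and `DemoRandom` are likewise assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the real G4VERSION_NUMBER from G4Version.hh.
#ifndef DEMO_G4VERSION_NUMBER
#define DEMO_G4VERSION_NUMBER 1042  // corresponds to 10.4.2
#endif

// Select the return type expected by the toolkit being built against.
// The threshold and the types chosen here are assumptions.
#if DEMO_G4VERSION_NUMBER >= 1070
typedef std::uint64_t DemoPoissonCount;
#else
typedef int DemoPoissonCount;
#endif

// Stand-in for a library base class whose virtual signature differs
// between releases; it uses the same typedef only to simulate that.
class DemoBaseRandom {
public:
    virtual ~DemoBaseRandom() {}
    virtual DemoPoissonCount Poisson(double mean) {
        return static_cast<DemoPoissonCount>(mean);
    }
};

// The override compiles only if its return type matches the base class;
// a mismatch gives a "conflicting return type specified" error.
class DemoRandom : public DemoBaseRandom {
public:
    DemoPoissonCount Poisson(double mean) override {
        // Placeholder body: rounds to nearest, not a real Poisson draw.
        return static_cast<DemoPoissonCount>(mean + 0.5);
    }
};
```

Keeping the choice in one typedef means the override follows the base-class signature in both builds without duplicating the class body.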

aaust commented 2 weeks ago

Thank you. It looks like this fixed the issue. I will wait for the successful nightly build and make a new release tomorrow.