markito3 closed this issue 2 years ago.
I see the same error in a CentOS 8 Singularity container and on jlabl5, a RHEL 8 node. Both run GCC 8.4.1. I see it with both version sets, 4.44.0 and 4.45.1.
I do not see the error with 4.45.1 on the ifarm (CentOS 7). I suspect the cause is the GCC version we use, i.e., that 8.4.1 breaks something in the code or in the build procedure.
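One way to test this suspicion is to have each component report which compiler built it. The sketch below is a minimal, hypothetical helper (the function names are made up for illustration) that relies only on GCC's predefined `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__` macros. Compiled into a test program, it reports the GCC version that built that translation unit. Note that Clang also defines these macros, reporting a GCC-compatible version rather than its own.

```cpp
#include <string>

// Hypothetical helper: returns "major.minor.patch" of the GCC that compiled
// this translation unit, or "unknown" for a non-GNU-compatible compiler.
std::string compiler_version() {
#if defined(__GNUC__)
  return std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__) +
         "." + std::to_string(__GNUC_PATCHLEVEL__);
#else
  return "unknown";
#endif
}

// Hypothetical helper: true if this translation unit was built with
// GCC >= major.minor (e.g. built_with_at_least(8, 4) on the RHEL 8 nodes).
bool built_with_at_least(int major, int minor) {
#if defined(__GNUC__)
  return __GNUC__ > major || (__GNUC__ == major && __GNUC_MINOR__ >= minor);
#else
  (void)major;
  (void)minor;
  return false;
#endif
}
```

Comparing this output with `g++ --version` on each node would show whether the code that crashes was really built by the 8.4.1 compiler.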
I have managed to reproduce the problem on my CentOS 7 server at UConn by doing the following:
Change anything in the above recipe and it no longer crashes:
Is this an artifact of using G4MULTITHREADED with an old version of G4.10.02p02? Maybe.
Without changing anything in the code, simply switching G4.10.02 from the MT (multi-threaded) build of the G4 libraries to the non-MT build (without the MT code features) makes the problem disappear. Likewise, the problem with the MT build goes away when you move forward from G4.10.02 to G4.10.04p02 or G4.10.06p01.
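To illustrate why an MT and a non-MT build can behave differently even with identical source, here is a hypothetical C++ sketch of a compile-time switch in the spirit of G4MULTITHREADED. The macro `DEMO_MULTITHREADED` and all function names are invented for this example. In this sketch the switch decides whether a piece of cached state is `thread_local` (one copy per thread) or an ordinary static (one shared copy), so the two builds generate different code for the same functions.

```cpp
// Hypothetical build switch, analogous in spirit to G4MULTITHREADED.
#if defined(DEMO_MULTITHREADED)
#define DEMO_THREAD_LOCAL thread_local  // MT build: per-thread storage
#else
#define DEMO_THREAD_LOCAL               // sequential build: plain static
#endif

// Cached state: one copy per thread in the MT build, one shared copy otherwise.
static DEMO_THREAD_LOCAL int cached_mass_valid = 0;

void invalidate_cache() { cached_mass_valid = 0; }
void mark_cache_valid() { cached_mass_valid = 1; }
bool cache_valid() { return cached_mass_valid != 0; }

// Reports which variant this translation unit was compiled as.
constexpr bool built_multithreaded() {
#if defined(DEMO_MULTITHREADED)
  return true;
#else
  return false;
#endif
}
```

Compiling once with `-DDEMO_MULTITHREADED` and once without yields two different object files from the same source, which is one way a defect can show up only in the MT build.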
I propose that this is a defect in the MT functionality in the G4.10.02p02 release of the G4 library. Remember that MT functionality was new in G4.10, and early releases of G4.10 still had bugs in the MT code. Can we just leave G4.10.02 behind?
fine with me :-)
Actually, the disappearance of the crash in my build may just be due to the G4 developers having moved G4LogicalVolume::AddDaughter from the inline source file G4LogicalVolume.icc to the implementation file G4LogicalVolume.cc. If the issue is related to the g++ compiler version, this move would hide it: in G4.10.04 and later, AddDaughter is compiled into the G4 library rather than into HDGeant4, and in my specific test environment the G4 library was built with a pre-8.4 g++. To know for sure, I need to run my test against a build of G4 compiled with g++ 8.4 or later, with HDGeant4 built by the same compiler.
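The inline-versus-out-of-line distinction can be sketched in C++. This is a hypothetical illustration, not the actual Geant4 source: `LogicalVolumeSketch` and its methods are invented names. A member function defined in a header (as the `.icc` files do) is inline, so its code is generated in each application translation unit that calls it, by the application's compiler. A function only declared in the header and defined in a `.cc` file is compiled once, when the library itself is built.

```cpp
#include <cstddef>
#include <vector>

struct Volume;  // opaque placeholder for a physical volume

// --- "header" part (like a .h/.icc pair) ---
struct LogicalVolumeSketch {
  std::vector<Volume*> daughters;

  // Defined in the class body, hence implicitly inline: compiled by whatever
  // compiler builds the calling code (the application, e.g. HDGeant4).
  void AddDaughterInline(Volume* d) { daughters.push_back(d); }

  // Only declared here: the one definition lives in the library's .cc file
  // and is compiled by whatever compiler built the library.
  void AddDaughterOutOfLine(Volume* d);

  std::size_t NoDaughters() const { return daughters.size(); }
};

// --- "library implementation" part (like a .cc file) ---
void LogicalVolumeSketch::AddDaughterOutOfLine(Volume* d) {
  daughters.push_back(d);
}
```

In a real build the two halves would live in separate files; they are shown together here only so the snippet compiles on its own.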
Initial tests with G4.10.04p02 look good. Thanks to Mark Ito for the build_scripts help. Waiting for Hao to confirm.
Tried the b1pi test with hdgeant4 with G4 10.04 on Fedora 34. No problems. This is the first time I have seen it work on a Fedora version newer than 30. Looks like leaving G4 10.02 in the rear-view mirror might be the answer here. For reference, here is the error I saw on Fedora 33 back in March.
@lihaoahil for when you get a moment.
Sorry guys, I forgot to close this issue. All good now.
I first heard of this from @lihaoahil via @nsjarvis on September 3:
Hi Mark,
Hao found that hdgeant4 does not run on CentOS 8. He tried the two available version sets, 4.42.1 and 4.45.1.
I tried out your test files in u/scratch/marki/hdg4t using version set 4.45.1 on jlabl5 and saw an error similar to Hao's. His message is forwarded below. He mentioned albert and red queue first; those are running RHEL7.
Naomi.
---------- Forwarded message ---------
From: Hao Li hl2@andrew.cmu.edu
Date: Fri, Sep 3, 2021 at 5:30 AM
Subject: hdgeant4 crashed on ernest
To: Naomi Jarvis nsj@cmu.edu
Hi Naomi, So firstly 4.45.1 works perfectly fine on albert and red queue.
Then I tested hdgeant4 with both version sets, 4.42.1 and 4.45.1, on ernest's interactive node and found that it crashed. I have put a folder at ~haoli/ernest with the env file and the input and config files for hdgeant4 (control.in, input.hddm and run.mac) in case you'd like to reproduce the crash.
Cheers, Hao
BTW, in case it means anything to you, the crash message looks something like this:
Geant4 version Name: geant4-10-02-patch-02 [MT] (17-June-2016)
<< in Multi-threaded mode >>
Copyright : Geant4 Collaboration
Reference : NIM A 506 (2003), 250-303
WWW : http://cern.ch/geant4
JANA >>Created JCalibration object of type: JCalibrationCCDB
JANA >>Generated via: JCalibration using CCDB for MySQL and SQLite databases
JANA >>Run:30274
JANA >>URL: mysql://ccdb_user@hallddb.jlab.org/ccdb
JANA >>context: variation=mc
JANA >>comment: Variation for simulations with data conditions
There was a crash. This is the entire stack trace of all threads:
0 0x00007fbccdbf6aab in waitpid () from /lib64/libc.so.6
1 0x00007fbccdb724af in do_system () from /lib64/libc.so.6
2 0x00007fbcd6dd8af7 in TUnixSystem::Exec (shellcmd=, this=0x238eef0) at /home/gluex2/gluex_top8/root/root-6.08.06/core/unix/src/TUnixSystem.cxx:2118
3 TUnixSystem::StackTrace (this=0x238eef0) at /home/gluex2/gluex_top8/root/root-6.08.06/core/unix/src/TUnixSystem.cxx:2405
4 0x00007fbcd6ddab24 in TUnixSystem::DispatchSignals (this=0x238eef0, sig=kSigSegmentationViolation) at /home/gluex2/gluex_top8/root/root-6.08.06/core/unix/src/TUnixSystem.cxx:3625
5
6 0x0000000000000000 in ?? ()
7 0x00007fbce0c6ff82 in G4LogicalVolume::AddDaughter (this=0x28c25f0, pNewDaughter=) at /home/gluex2/gluex_top8/geant4/geant4.10.02.p02/include/Geant4/G4LogicalVolume.icc:165
8 0x00007fbcd9ea953f in G4PVPlacement::G4PVPlacement (this=0x29065e0, pRot=, tlate=..., pCurrentLogical=0x281fd20, pName=..., pMotherLogical=0x28c25f0, pMany=false, pCopyNo=1, pSurfChk=false) at /home/gluex2/gluex_top8/geant4/geant4.10.02.p02/source/geometry/volumes/src/G4PVPlacement.cc:117
9 0x00007fbce0ddf7e3 in HddsG4Builder::createVolume (this=0x26d1340, el=, ref=...) at /home/gluex2/gluex_top8/geant4/geant4.10.02.p02/include/Geant4/G4LogicalVolume.icc:61
10 0x00007fbcd29867ce in CodeWriter::createVolume (this=this@entry=0x26d1340, el=el@entry=0x32032e8, ref=...) at /home/gluex2/gluex_top8/hdds/hdds-4.14.0/hddsCommon.cpp:1337
11 0x00007fbce0ddf45f in HddsG4Builder::createVolume (this=0x26d1340, el=0x32032e8, ref=...) at src/HddsG4Builder.cc:562
12 0x00007fbcd2986dca in CodeWriter::createVolume (this=this@entry=0x26d1340, el=el@entry=0x3201788, ref=...) at /home/gluex2/gluex_top8/hdds/hdds-4.14.0/hddsCommon.cpp:1431
13 0x00007fbce0ddf45f in HddsG4Builder::createVolume (this=0x26d1340, el=0x3201788, ref=...) at src/HddsG4Builder.cc:562
14 0x00007fbcd2986dca in CodeWriter::createVolume (this=this@entry=0x26d1340, el=el@entry=0x35fc758, ref=...) at /home/gluex2/gluex_top8/hdds/hdds-4.14.0/hddsCommon.cpp:1431
15 0x00007fbce0ddf45f in HddsG4Builder::createVolume (this=0x26d1340, el=0x35fc758, ref=...) at src/HddsG4Builder.cc:562
16 0x00007fbcd29846c4 in CodeWriter::translate (this=this@entry=0x26d1340, topel=topel@entry=0x35fc758) at /home/gluex2/gluex_top8/hdds/hdds-4.14.0/hddsCommon.cpp:2228
17 0x00007fbce0dd62fa in HddsG4Builder::translate (this=this@entry=0x26d1340, topel=topel@entry=0x35fc758) at src/HddsG4Builder.cc:1401
18 0x00007fbce0c6ab84 in GlueXDetectorConstruction::GlueXDetectorConstruction (this=0x26d1300, hddsFile=...) at src/GlueXDetectorConstruction.cc:179
19 0x00007fbce015cabf in main (argc=1, argv=0x7ffcff608718) at /usr/include/c++/8/bits/allocator.h:139
20 0x00007fbccdb51493 in __libc_start_main () from /lib64/libc.so.6
21 0x000000000073bf1e in _start ()
27.338u 2.113s 0:40.09 73.4% 0+0k 0+40io 0pf+0w