Closed andiwand closed 1 year ago
Hm, could be the two interfere?
Is this executed in a single thread?
No good numThreads=-1 - this would need Geant4MT.
For me this crashes with one thread as well
Is that with the same error / segfault?
Hmm, it looks a bit different to be honest, but I couldn't check in detail for now. I attached my gdb backtrace, maybe this gives a hint.
@andiwand could you maybe also run it in gdb to see if it is the same fault (it's not entirely clear to me from the error message)
Hm, could be the two interfere?
I think so yes. Pythia8 only works and Geant4 only works.
@andiwand could you maybe also run it in gdb to see if it is the same fault (it's not entirely clear to me from the error message)
sure will do
I can confirm, I just encountered the same issue using the geant4.py example:
### CAUGHT SIGNAL: 11 ### address: 0x7f9417827000, signal = SIGSEGV, value = 11, description = segmentation violation. Address not mapped to object.
I tried updating to the latest G4 version (11.0.3) but it didn't change anything.
As we discussed during today's meeting, I tried to replace the ODD by the GDML implementation of Alice_v3 and it ran through with just a few warnings.
So the issue is either with the ODD itself or with the DDG4DetectorConstruction.
...
This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.
I have had another look at this and I just noticed something. If I start removing the support from the ODD xml, the segfault happens much later, so maybe there is something bad with the support surface definition?
I checked back on this issue out of curiosity and it is still there. Maybe we should try to investigate this again at some point?
For sure this is something that we need to fix. Do we have a script to reproduce this?
Okay, I have investigated this a bit and have some new information:
First of all, I enabled some logging facilities in Geant4, which gave me the result that this is caused by photons quite far away from the center in z direction (z is around 1e4):
This is reproducible in pythia, also with different seeds. Then I could also reproduce the crash with the ParticleGun:
    addParticleGun(
        s,
        MomentumConfig(0.1 * u.GeV, 2.0 * u.GeV, transverse=True),
        EtaConfig(-4.0, 4.0, uniform=True),
        ParticleConfig(2, acts.PdgParticle.eGamma),
        vtxGen=acts.examples.GaussianVertexGenerator(
            stddev=acts.Vector4(10 * u.mm, 10 * u.mm, 10 * u.mm, 0.0 * u.ns),
            mean=acts.Vector4(18, 3.78, 1.09e4, 0),
        ),
        multiplicity=100,
        rnd=rnd,
    )
I'm not totally sure what to do with this information, but maybe someone has an idea :)
So it's G4 breaking in a specific region of the detector?
Wait, the energy goes to 0 in the second step. Could it be that G4 doesn't handle photons stopping in some volumes?
No, I think everything is fine with the electron in the pixel endcap; the photon below is the problem. There it only logs the 0th step and then segfaults.
I could imagine that a problem is that it already starts outside of the detector (in the world_volume_1)?
Could it be that the world volume is too small or something like that?
Oh yeah, I was looking at the wrong line... But you are right, the world volume size is 10m along z, so this photon is outside the DD4hep detector.
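For anyone who wants to check this kind of thing quickly, here is a rough sketch of the containment test implied above. It is plain Python and does not use ACTS, DD4hep or Geant4. The helper `is_inside_world` and the box half-lengths are hypothetical: the sketch assumes the world is an axis-aligned box centred at the origin, and the real extent should be read from the ODD xml.

```python
from typing import Sequence


def is_inside_world(position_mm: Sequence[float],
                    half_lengths_mm: Sequence[float]) -> bool:
    """Return True if (x, y, z) lies inside an origin-centred, axis-aligned box."""
    return all(abs(p) <= h for p, h in zip(position_mm, half_lengths_mm))


# Assumption: 10 m half-length on every axis. Replace with the value
# actually defined for the world volume in the ODD xml files.
WORLD_HALF_LENGTHS_MM = (10_000.0, 10_000.0, 10_000.0)

# Mean vertex position used in the particle gun configuration, in mm.
gun_vertex_mm = (18.0, 3.78, 1.09e4)

print(is_inside_world(gun_vertex_mm, WORLD_HALF_LENGTHS_MM))
```

With these assumed numbers the gun vertex sits at |z| = 10.9 m, beyond the assumed 10 m extent, which matches the observation that the photon starts outside the detector.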
Unfortunately, I don't think this is the only issue :( I tried to edit the particle selector to remove all particles with x, y or z larger than 5m (in abs) and it still crashes with ttbar. How did you get those extra logs, Benjamin?
Already merged: https://github.com/acts-project/acts/pull/1790
With a new build from the main branch you should be able to enable it by setting the logLevel to VERBOSE in the addGeant4 function.
Oh perfect, I will have a look in more detail next week then!
Unfortunately, I don't think this is the only issue :( I tried to edit the particle selector to remove all particles with x, y or z larger than 5m (in abs) and it still crashes with ttbar. How did you get those extra logs, Benjamin?
Actually, I was able to run one event in the pythia8+geant4+ODD combination without a segfault by manually increasing the world volume from 10m to 100m in the ODD xml files...
I'm not sure if something like that would be a reasonable fix? Does this have any other implications @asalzburger ?
I will try to run more events now, however, they take quite a long time (around 30 minutes per event)
Okay, actually it does not resolve the issue, I still get the segfault in a later event. Maybe it has just changed the random numbers a bit so that 1 event went through.
A bit unrelated, but there is a bug in 'addGeant4' in 'simulation.py'. On line 597 it uses particles_input for the G4 input (instead of particles_selected), ignoring the particle selector. I can open a quick MR to fix this.
If someone wants to have a look: https://github.com/acts-project/acts/pull/1792
With this you can cut the particles outside the detector by adding preselectParticles=ParticleSelectorConfig(eta=(-3.0, 3.0), absZ=(0, 1e4), pt=(150 * u.MeV, None), removeNeutral=True), to the addGeant4 call. It doesn't solve the segfault in the ttbar case (but it solves the photon issue).
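To make the effect of these cuts concrete, here is a small stand-alone sketch of the same kind of preselection. It is not the ACTS implementation: the `Particle` record, the `preselect` function and the half-open range convention are all assumptions made for illustration. Only the cut values (eta, absZ, pt, neutral removal) are taken from the configuration quoted above.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[Optional[float], Optional[float]]


@dataclass
class Particle:
    # Hypothetical minimal record, not the ACTS particle type.
    px: float      # momentum components in GeV
    py: float
    pz: float
    z: float       # production vertex z in mm
    charge: float  # in units of e


def transverse_momentum(p: Particle) -> float:
    return math.hypot(p.px, p.py)


def pseudorapidity(p: Particle) -> float:
    pt = transverse_momentum(p)
    if pt == 0.0:
        # Particle exactly along the beam axis.
        return math.copysign(math.inf, p.pz)
    return math.asinh(p.pz / pt)


def in_range(value: float, bounds: Range) -> bool:
    # Assumed convention: low <= value < high, None means unbounded.
    low, high = bounds
    return (low is None or value >= low) and (high is None or value < high)


def preselect(particles: List[Particle], eta: Range, abs_z: Range,
              pt: Range, remove_neutral: bool) -> List[Particle]:
    selected = []
    for p in particles:
        if remove_neutral and p.charge == 0:
            continue
        if not in_range(pseudorapidity(p), eta):
            continue
        if not in_range(abs(p.z), abs_z):
            continue
        if not in_range(transverse_momentum(p), pt):
            continue
        selected.append(p)
    return selected


particles = [
    Particle(1.0, 0.0, 0.5, 1.09e4, 0.0),   # far-away photon: dropped
    Particle(1.0, 0.0, 0.5, 10.0, 1.0),     # central charged particle: kept
    Particle(0.05, 0.0, 0.0, 0.0, -1.0),    # pt below 150 MeV: dropped
    Particle(1.0, 0.0, 0.5, 1.09e4, 1.0),   # charged but |z| >= 1e4 mm: dropped
]

selected = preselect(particles, eta=(-3.0, 3.0), abs_z=(0.0, 1e4),
                     pt=(0.150, None), remove_neutral=True)
print(len(selected))
```

Note that with removeNeutral=True the problematic photon is removed twice over, once for being neutral and once for its z position, so the absZ cut mainly matters for charged particles produced far from the centre.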
Actually, the code seems to be running on my side and doesn't segfault anymore... Can someone else confirm?
@Corentin-Allaire are you using the chain from above? otherwise if you could share the script I can try to verify
@andiwand here is the chain I use : full_chain_odd.txt
this is still segfaulting for me on 7a3761d2b3f35c802bc03622d3b55fc9d463e426
I might have changed something else by accident, let me check (maybe you can also run it with the verbose log of G4?)
Actually, for me it now worked for at least two events without a segfault (I only applied the z-selection, not the pt or eta ones). That's quite good news.
However, I got the following interesting warning:
Maybe I should add I had a timestamp-based seed, not the usual 42.
did you run geant4 in verbose mode @benjaminhuth ? somehow it now runs for a couple of minutes without crashing
nope, at least not that last one. But I think we must consider the verbose mode to be extreeemly slow due to these huge printouts...
hm for me it is still crashing even with 2 events
this is my geant version:
**************************************************************
Geant4 version Name: geant4-11-00-patch-01 [MT] (8-March-2022)
Copyright : Geant4 Collaboration
References : NIM A 506 (2003), 250-303
: IEEE-TNS 53 (2006), 270-278
: NIM A 835 (2016), 186-225
WWW : http://geant4.org/
**************************************************************
but feel free to close the ticket since it works for both of you now
@andiwand did you keep the number of threads at 1?
I realised I had modified the preselection to cut on x, y and z, so I removed that change, but it still works on 5 events for now. I do have a slightly more recent G4 version:
Geant4 version Name: geant4-11-00-patch-03 [MT] (16-September-2022)
Copyright : Geant4 Collaboration
References : NIM A 506 (2003), 250-303
: IEEE-TNS 53 (2006), 270-278
: NIM A 835 (2016), 186-225
WWW : http://geant4.org/
**************************************************************
yeah exactly, I didn't modify the script. I just executed the one you sent here https://github.com/acts-project/acts/issues/1578#issuecomment-1401645240
So I have geant4-11-00-patch-03 as well, and for me it also works now with 5 events.
Maybe it's indeed the geant version?
let me update and check again
I remember reading there was some issue with patch 01, which is why I updated the first time I ran into the ODD segfault
Let me also open an MR so that people can use G4 with the ODD
Actually I think this is a good workaround, though not 100% satisfying...
I just wonder if the change in the full_chain_odd.py is enough to kind of document this for others?
In my opinion this is a pythia issue and not an acts one
I have opened a PR : https://github.com/acts-project/acts/pull/1794. I am mentioning this issue in the comment of the code if people want to understand the full story
I recompiled G4 with 11.1.0 and it seems not to crash anymore
I am in the process of running on 100 events just to check that we don't get particles with X or Y > 10m
I had the simulation run for 100 ttbar events and no issue occurred. I think we can close this one then! As a summary in case someone comes back here:
Geant4 versions older than geant4-11-00-patch-03 might also result in a segfault when running the simulation (this has been shown for geant4-11-00-patch-01 at least). In case of crashes in G4, try to upgrade to the latest version.
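If it helps, here is a rough way to check a version name as printed in the Geant4 banner against the versions reported as working in this thread. This is a plain-Python sketch: the helper names and the regular expression are assumptions, and the minimum of (11, 0, 3) is only what this thread observed, not an official compatibility statement.

```python
import re
from typing import Tuple


def parse_geant4_version(name: str) -> Tuple[int, int, int]:
    """Parse names like 'geant4-11-00-patch-01' into (major, minor, patch)."""
    match = re.fullmatch(r"geant4-(\d+)-(\d+)(?:-patch-(\d+))?", name.strip())
    if match is None:
        raise ValueError(f"unrecognised Geant4 version name: {name!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


# Assumption taken from this thread: patch-03 and later ran without segfault.
MIN_VERSION_FROM_THREAD = (11, 0, 3)


def meets_minimum(name: str,
                  minimum: Tuple[int, int, int] = MIN_VERSION_FROM_THREAD) -> bool:
    return parse_geant4_version(name) >= minimum


for name in ("geant4-11-00-patch-01", "geant4-11-00-patch-03"):
    print(name, meets_minimum(name))
```

The version name is the string after "Geant4 version Name:" in the startup banner, without the trailing "[MT]" and date.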
@benjaminhuth pointed out that ODD+Pythia8+Geant4 will segfault in full chain
I just verified this. See attached files for more information.
segfault.txt full_chain.txt