Segfault for ODD+Pythia8+Geant4

andiwand commented 2 years ago

@benjaminhuth pointed out that ODD+Pythia8+Geant4 segfaults in the full chain.

I just verified this. See attached files for more information.

segfault.txt full_chain.txt

paulgessinger commented 2 years ago

Hm, could it be that the two interfere?

asalzburger commented 2 years ago

Is this executed in a single thread?

asalzburger commented 2 years ago

That's no good: numThreads=-1 would need Geant4MT.
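As a guard against this, a script could refuse a multithreaded run whenever Geant4 is part of the chain. The sketch below rests on two assumptions. First, `-1` is read as "use all hardware threads", which is how the thread treats it. Second, both helper names (`resolve_num_threads`, `check_geant4_threads`) are hypothetical and are not part of the ACTS Python API.

```python
import os


def resolve_num_threads(num_threads: int) -> int:
    """Hypothetical helper: interpret -1 as 'all hardware threads'."""
    if num_threads == -1:
        return os.cpu_count() or 1
    if num_threads < 1:
        raise ValueError(f"invalid thread count: {num_threads}")
    return num_threads


def check_geant4_threads(num_threads: int, uses_geant4: bool) -> int:
    """Hypothetical guard: reject a multithreaded run when (non-MT) Geant4 is used."""
    n = resolve_num_threads(num_threads)
    if uses_geant4 and n != 1:
        raise RuntimeError(
            f"Geant4 simulation requested with {n} threads; "
            "run the sequencer single-threaded (numThreads=1)"
        )
    return n


n = check_geant4_threads(1, uses_geant4=True)  # returns 1
```

The check lives in the driver script instead of the simulation, so the misconfiguration is caught before any events are generated.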

benjaminhuth commented 2 years ago

For me this crashes with one thread as well.

asalzburger commented 2 years ago

Is that the same error/segfault?

benjaminhuth commented 2 years ago

Hmm, it looks a bit different to be honest, but I couldn't check in detail for now. I've attached my gdb backtrace; maybe it gives a hint.

backtrace.txt

benjaminhuth commented 2 years ago

@andiwand could you maybe also run it in gdb to see if it is the same fault? (It's not entirely clear to me from the error message.)

andiwand commented 2 years ago

> Hm, could it be that the two interfere?

I think so, yes. Pythia8 on its own works, and Geant4 on its own works.

> @andiwand could you maybe also run it in gdb to see if it is the same fault? (It's not entirely clear to me from the error message.)

Sure, will do.

Corentin-Allaire commented 2 years ago

I can confirm I just encountered the same issue using the geant4.py example: `### CAUGHT SIGNAL: 11 ### address: 0x7f9417827000, signal = SIGSEGV, value = 11, description = segmentation violation. Address not mapped to object.` I tried updating to the latest G4 version (11.0.3), but it didn't change anything.

Corentin-Allaire commented 2 years ago

As we discussed during today's meeting, I tried replacing the ODD with the GDML implementation of Alice_v3, and it ran through with just a few warnings. So the issue is either with the ODD itself or with the DDG4DetectorConstruction...

stale[bot] commented 2 years ago

This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.

Corentin-Allaire commented 2 years ago

I have had another look at this and just noticed something: if I start removing the support structures from the ODD XML, the segfault happens much later. So maybe there is something wrong with the support surface definition?

stale[bot] commented 1 year ago

This issue/PR has been automatically marked as stale because it has not had recent activity. The stale label will be removed if any interaction occurs.

Corentin-Allaire commented 1 year ago

I checked back on this issue out of curiosity, and it is still there. Maybe we should investigate it again at some point?

paulgessinger commented 1 year ago

For sure this is something that we need to fix. ~~Do we have a script to reproduce this?~~

benjaminhuth commented 1 year ago

Okay, I have investigated this a bit, and here is some new information:

First, I enabled some logging in Geant4, which showed that the crash is caused by photons quite far from the center in the z direction (z is around 1e4):

This is reproducible with Pythia8 using different seeds. I could then also reproduce the crash with the ParticleGun:

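```python
import acts
import acts.examples
from acts.examples.simulation import addParticleGun, MomentumConfig, EtaConfig, ParticleConfig

# Assumed minimal context (s, rnd); not part of the original report
u = acts.UnitConstants
s = acts.examples.Sequencer(events=1, numThreads=1)  # single-threaded for Geant4
rnd = acts.examples.RandomNumbers(seed=42)

# Photons generated around a vertex far out in z (about 1.09e4 mm)
addParticleGun(
    s,
    MomentumConfig(0.1 * u.GeV, 2.0 * u.GeV, transverse=True),
    EtaConfig(-4.0, 4.0, uniform=True),
    ParticleConfig(2, acts.PdgParticle.eGamma),
    vtxGen=acts.examples.GaussianVertexGenerator(
        stddev=acts.Vector4(10 * u.mm, 10 * u.mm, 10 * u.mm, 0.0 * u.ns),
        mean=acts.Vector4(18, 3.78, 1.09e4, 0),
    ),
    multiplicity=100,
    rnd=rnd,
)
```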

I'm not totally sure what to do with this information, but maybe someone has an idea :)

paulgessinger commented 1 year ago

So it's G4 breaking in a specific region of the detector?

Corentin-Allaire commented 1 year ago

Wait, the energy goes to 0 in the second step. Could it be that G4 doesn't handle photons stopping in some volumes?

benjaminhuth commented 1 year ago

> Wait, the energy goes to 0 in the second step. Could it be that G4 doesn't handle photons stopping in some volumes?

No, I think everything is fine with the electron in the pixel endcap; the photon below is the problem. For it, only the 0th step is logged, and then it segfaults.

I suspect the problem might be that it already starts outside the detector (in world_volume_1)?

Could it be that the world volume is too small or something like that?

Corentin-Allaire commented 1 year ago

Oh yeah, I was looking at the wrong line... But you are right: the world volume size is 10 m along z, so this photon is outside the DD4hep detector.
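Below is a sketch of the containment argument, assuming an axis-aligned box world centred at the origin with lengths in mm. The thread does not say whether "10 m along z" is the full length or the half-length. The sketch therefore takes the more generous reading, a half-length of 10 m. Even under that reading the reproducer's mean vertex (z ≈ 1.09e4 mm) lies outside. The x and y half-lengths are also placeholders, and the `WorldBox` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorldBox:
    """Hypothetical axis-aligned world volume centred at the origin (half-lengths in mm)."""
    half_x: float
    half_y: float
    half_z: float

    def contains(self, x: float, y: float, z: float) -> bool:
        return abs(x) <= self.half_x and abs(y) <= self.half_y and abs(z) <= self.half_z


# Placeholder sizes: half-length of 10 m (1e4 mm) in every direction
world = WorldBox(half_x=1e4, half_y=1e4, half_z=1e4)

vertex = (18.0, 3.78, 1.09e4)  # mean vertex of the particle-gun reproducer, mm
print(world.contains(*vertex))  # False: |z| = 10.9 m exceeds 10 m
```

A point-in-box check like this is cheap enough to run on every generated particle before it is handed to Geant4.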

Corentin-Allaire commented 1 year ago

Unfortunately, I don't think this is the only issue :( I tried editing the particle selector to remove all particles with |x|, |y| or |z| larger than 5 m, and it still crashes with ttbar. How did you get those extra logs, Benjamin?

benjaminhuth commented 1 year ago

Already merged: https://github.com/acts-project/acts/pull/1790. With a new build from the main branch, you should be able to enable it by setting the logLevel to VERBOSE in the addGeant4 function.

Corentin-Allaire commented 1 year ago

Oh, perfect, I will have a more detailed look next week then!

benjaminhuth commented 1 year ago

> Unfortunately, I don't think this is the only issue :( I tried editing the particle selector to remove all particles with |x|, |y| or |z| larger than 5 m, and it still crashes with ttbar. How did you get those extra logs, Benjamin?

Actually, I was able to run one event in the Pythia8+Geant4+ODD combination without a segfault by manually increasing the world volume from 10 m to 100 m in the ODD XML files...

I'm not sure whether something like that would be a reasonable fix. Does it have any other implications, @asalzburger?

I will try to run more events now, however, they take quite a long time (around 30 minutes per event)

benjaminhuth commented 1 year ago

> I will try to run more events now, however, they take quite a long time (around 30 minutes per event)

Okay, actually it does not resolve the issue; I still get the segfault in a later event. Maybe it just changed the random numbers slightly, so that one event happened to go through.

Corentin-Allaire commented 1 year ago

A bit unrelated, but there is a bug in 'addGeant4' in 'simulation.py': on line 597 it uses particles_input as the G4 input (instead of particles_selected), thereby ignoring the particle selector. I can open a quick MR to fix this.

Corentin-Allaire commented 1 year ago

> A bit unrelated, but there is a bug in 'addGeant4' in 'simulation.py': on line 597 it uses particles_input as the G4 input (instead of particles_selected), thereby ignoring the particle selector. I can open a quick MR to fix this.

If someone wants to have a look: https://github.com/acts-project/acts/pull/1792

Corentin-Allaire commented 1 year ago

With this, you can cut particles outside the detector by adding `preselectParticles=ParticleSelectorConfig(eta=(-3.0, 3.0), absZ=(0, 1e4), pt=(150 * u.MeV, None), removeNeutral=True)` to the addGeant4 call. It doesn't solve the segfault in the ttbar case, but it does solve the photon issue.
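To make explicit what those four cuts do to a single particle, here is a pure-Python sketch of the same selection logic. It is not ACTS's ParticleSelector implementation. The `Particle` record is hypothetical, the units (mm, MeV) are chosen for readability, and inclusive range boundaries are an assumption. Pseudorapidity is computed as η = asinh(p_z / p_T).

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Particle:
    """Hypothetical minimal particle record (not the ACTS type)."""
    x: float  # vertex position, mm
    y: float
    z: float
    px: float  # momentum components, MeV
    py: float
    pz: float
    charge: float


def pseudorapidity(p: Particle) -> float:
    pt = math.hypot(p.px, p.py)
    if pt == 0.0:
        return math.copysign(math.inf, p.pz)  # along the beam axis
    return math.asinh(p.pz / pt)


def preselect(
    p: Particle,
    eta: Tuple[float, float] = (-3.0, 3.0),
    abs_z: Tuple[float, float] = (0.0, 1e4),
    pt: Tuple[float, Optional[float]] = (150.0, None),
    remove_neutral: bool = True,
) -> bool:
    """Return True if the particle would be passed on to the simulation."""
    if remove_neutral and p.charge == 0:
        return False
    p_t = math.hypot(p.px, p.py)
    if p_t < pt[0] or (pt[1] is not None and p_t > pt[1]):
        return False
    if not eta[0] <= pseudorapidity(p) <= eta[1]:
        return False
    return abs_z[0] <= abs(p.z) <= abs_z[1]  # the cut that drops far-away vertices


far_photon = Particle(18.0, 3.78, 1.09e4, 500.0, 0.0, 200.0, charge=0)
central_electron = Particle(0.0, 0.0, 0.0, 1000.0, 0.0, 0.0, charge=-1)
print(preselect(far_photon), preselect(central_electron))  # False True
```

Only the `abs_z` cut targets the out-of-world vertices. The other three cuts are physics choices carried over from the configuration above.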

Corentin-Allaire commented 1 year ago

Actually, the code seems to be running on my side and doesn't segfault anymore... Can someone else confirm?

andiwand commented 1 year ago

@Corentin-Allaire are you using the chain from above? Otherwise, if you could share the script, I can try to verify.

Corentin-Allaire commented 1 year ago

@andiwand here is the chain I use: full_chain_odd.txt

andiwand commented 1 year ago

This is still segfaulting for me on 7a3761d2b3f35c802bc03622d3b55fc9d463e426.

segfault.txt

Corentin-Allaire commented 1 year ago

I might have changed something else by accident; let me check. (Maybe you can also run it with verbose G4 logging?)

benjaminhuth commented 1 year ago

Actually, for me it now works for at least two events without a segfault (I only applied the z selection, not the pt or eta ones). That's quite good news.

However, I got the following interesting warning:

I should add that I used a timestamp-based seed, not the usual 42.

andiwand commented 1 year ago

Did you run Geant4 in verbose mode, @benjaminhuth? Somehow it now runs for a couple of minutes without crashing.

benjaminhuth commented 1 year ago

Nope, at least not for that last one. But keep in mind that verbose mode is extremely slow because of the huge printouts...

andiwand commented 1 year ago

Hm, for me it still crashes, even with 2 events.

This is my Geant4 version:

```
**************************************************************
 Geant4 version Name: geant4-11-00-patch-01 [MT]   (8-March-2022)
                       Copyright : Geant4 Collaboration
                      References : NIM A 506 (2003), 250-303
                                 : IEEE-TNS 53 (2006), 270-278
                                 : NIM A 835 (2016), 186-225
                             WWW : http://geant4.org/
**************************************************************
```

But feel free to close the ticket, since it works for both of you now.

Corentin-Allaire commented 1 year ago

@andiwand did you keep the number of threads at 1?

Corentin-Allaire commented 1 year ago

I realised I had modified the preselection to cut on x, y and z, so I reverted that change, but it still works for 5 events so far. I do have a slightly more recent G4 version:


```
**************************************************************
 Geant4 version Name: geant4-11-00-patch-03 [MT]   (16-September-2022)
                       Copyright : Geant4 Collaboration
                      References : NIM A 506 (2003), 250-303
                                 : IEEE-TNS 53 (2006), 270-278
                                 : NIM A 835 (2016), 186-225
                             WWW : http://geant4.org/
**************************************************************
```

andiwand commented 1 year ago

Yeah, exactly. I didn't modify the script; I just executed the one you sent here: https://github.com/acts-project/acts/issues/1578#issuecomment-1401645240

benjaminhuth commented 1 year ago

So I have geant4-11-00-patch-03 as well, and for me it now also works with 5 events. Maybe it is indeed the Geant4 version?

andiwand commented 1 year ago

Let me update and check again.

Corentin-Allaire commented 1 year ago

I remember reading that there was some issue with patch 01, which is why I updated the first time I ran into the ODD segfault.

Corentin-Allaire commented 1 year ago

Let me also open an MR so that people can use G4 with the ODD.

benjaminhuth commented 1 year ago

Actually, I think this is a good workaround, though not 100% satisfying...

I just wonder whether the change in full_chain_odd.py is enough to document this for others?

Corentin-Allaire commented 1 year ago

In my opinion, this is a Pythia issue and not an ACTS one.

Corentin-Allaire commented 1 year ago

I have opened a PR: https://github.com/acts-project/acts/pull/1794. I mention this issue in a code comment, so people can follow the full story.

andiwand commented 1 year ago

I recompiled G4 at version 11.1.0, and it no longer seems to crash.

Corentin-Allaire commented 1 year ago

I am currently running 100 events, just to check that we don't get particles with |x| or |y| > 10 m.

Corentin-Allaire commented 1 year ago

I ran the simulation for 100 ttbar events and no issue occurred. I think we can close this one then! As a summary, in case someone comes back here:

- Pythia sometimes generates particles outside the detector world volume (in particular in z). Using a ParticleSelector can resolve the issue. If the issue comes back, we might need to extend the ParticleSelector to also cut in x and y.
- G4 simulation cannot be run in multithreaded mode in ACTS; doing so will result in crashes.
- G4 versions older than geant4-11-00-patch-03 might also segfault when running the simulation (this has been shown at least for geant4-11-00-patch-01). If G4 crashes, try upgrading to the latest version.
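Following the version advice above, a driver script could flag a Geant4 build older than the version reported to work in this thread. The sketch below parses the version tag out of the startup banner format quoted earlier. It treats geant4-11-00-patch-03 as the threshold only because that is what this thread observed; this is not an official Geant4 statement. The function names are hypothetical.

```python
import re
from typing import Tuple

# Matches tags such as "geant4-11-00-patch-01" or "geant4-11-01"
_TAG = re.compile(r"geant4-(\d+)-(\d+)(?:-patch-(\d+))?")


def geant4_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version tag or a banner line."""
    m = _TAG.search(text)
    if m is None:
        raise ValueError(f"no Geant4 version tag found in {text!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


# Threshold taken from this thread's observations, not from Geant4 release notes
KNOWN_GOOD = geant4_version("geant4-11-00-patch-03")


def older_than_known_good(text: str) -> bool:
    return geant4_version(text) < KNOWN_GOOD  # tuple comparison is lexicographic


banner = " Geant4 version Name: geant4-11-00-patch-01 [MT]   (8-March-2022)"
print(geant4_version(banner), older_than_known_good(banner))  # (11, 0, 1) True
```

Comparing integer tuples instead of raw strings keeps "patch-10" from sorting before "patch-03".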
