cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

SegFault in OscarMTProducer when running unoptimized code #40090

Closed Dr15Jones closed 1 year ago

Dr15Jones commented 1 year ago

So I have been doing some tests using OscarMTProducer and hit an odd case: when I build the code using export USER_CXXFLAGS='-O0 -g' and then run MinBias jobs, I see a crash at line 220 of StackingAction.cc

https://github.com/cms-sw/cmssw/blob/c26f91904ea1758995053b69f199732c519c12c9/SimG4Core/Application/src/StackingAction.cc#L218-L220

because the return value of ptr->GetSelectedProcess() is a nullptr.

More info on how to reproduce is below.
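For readers less familiar with the pattern at that line, here is a minimal, self-contained C++ sketch of it: a wrapper process that delegates to a "selected" sub-process, and a caller that checks for nullptr before calling GetProcessSubType(). Process, GeneralProcess, select() and subTypeOf() are hypothetical stand-ins loosely modeled on G4VProcess and G4GammaGeneralProcess; they are not the Geant4 classes or the CMSSW code, and the sub-type numbers used with them are arbitrary.

```cpp
#include <cassert>

// Hypothetical stand-in for G4VProcess (NOT the Geant4 class).
class Process {
 public:
  explicit Process(int subType) : subType_(subType) {}
  virtual ~Process() = default;
  int GetProcessSubType() const { return subType_; }

 private:
  int subType_;
};

// Hypothetical stand-in for a "general" process that wraps several
// sub-processes and remembers which one was selected most recently.
class GeneralProcess : public Process {
 public:
  using Process::Process;
  void select(const Process* p) { selected_ = p; }
  const Process* GetSelectedProcess() const { return selected_; }

 private:
  const Process* selected_ = nullptr;  // stays null until select() is called
};

// Returns the sub-type of the process that created a track, or -1 when it
// cannot be determined, instead of calling a member function through nullptr.
int subTypeOf(const Process* creator) {
  if (creator == nullptr) {
    return -1;
  }
  if (const auto* general = dynamic_cast<const GeneralProcess*>(creator)) {
    const Process* selected = general->GetSelectedProcess();
    if (selected == nullptr) {
      return -1;  // unguarded code would call selected->GetProcessSubType() here
    }
    return selected->GetProcessSubType();
  }
  return creator->GetProcessSubType();
}
```

With nothing selected, subTypeOf() returns -1, whereas the unguarded equivalent calls a member function through a null pointer, which is what frame #5 of the stack trace below shows (this=0x0).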

Dr15Jones commented 1 year ago

assign simulation

cmsbuild commented 1 year ago

A new Issue was created by @Dr15Jones Chris Jones.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild commented 1 year ago

New categories assigned: simulation

@mdhildreth,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks

Dr15Jones commented 1 year ago

So what I did was

I then ran the job configuration below, and the job failed around the 400th event:

# Auto generated configuration file
# using: 
# Revision: 1.19 
# Source: /local/reps/CMSSW/CMSSW/Configuration/Applications/python/ConfigBuilder.py,v 
# with command line options: MinBias_8TeV_pythia8_TuneCUETP8M1_cff --relval 9000,300 -s GEN,SIM -n 700 --conditions auto:phase1_2018_realistic --beamspot Realistic25ns13TeVEarly2018Collision --era Run2_2018 --geometry DB:Extended --datatier GEN-SIM --eventcontent RAWSIM --fileout file:pileup.root --nThreads 12
import FWCore.ParameterSet.Config as cms

from Configuration.Eras.Era_Run2_2018_cff import Run2_2018

process = cms.Process('SIM',Run2_2018)

# import of standard configurations
process.load('Configuration.StandardSequences.Services_cff')
process.load('SimGeneral.HepPDTESSource.pythiapdt_cfi')
process.load('FWCore.MessageService.MessageLogger_cfi')
process.load('Configuration.EventContent.EventContent_cff')
process.load('SimGeneral.MixingModule.mixNoPU_cfi')
process.load('Configuration.StandardSequences.GeometryRecoDB_cff')
process.load('Configuration.StandardSequences.GeometrySimDB_cff')
process.load('Configuration.StandardSequences.MagneticField_cff')
process.load('Configuration.StandardSequences.Generator_cff')
process.load('IOMC.EventVertexGenerators.VtxSmearedRealistic25ns13TeVEarly2018Collision_cfi')
process.load('GeneratorInterface.Core.genFilterSummary_cff')
process.load('Configuration.StandardSequences.SimIdeal_cff')
process.load('Configuration.StandardSequences.EndOfProcess_cff')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(8000),
    output = cms.optional.untracked.allowed(cms.int32,cms.PSet)
)

# Input source
process.source = cms.Source("EmptySource")

process.options = cms.untracked.PSet(
    FailPath = cms.untracked.vstring(),
    IgnoreCompletely = cms.untracked.vstring(),
    Rethrow = cms.untracked.vstring(),
    SkipEvent = cms.untracked.vstring(),
    accelerators = cms.untracked.vstring('*'),
    allowUnscheduled = cms.obsolete.untracked.bool,
    canDeleteEarly = cms.untracked.vstring(),
    deleteNonConsumedUnscheduledModules = cms.untracked.bool(True),
    dumpOptions = cms.untracked.bool(False),
    emptyRunLumiMode = cms.obsolete.untracked.string,
    eventSetup = cms.untracked.PSet(
        forceNumberOfConcurrentIOVs = cms.untracked.PSet(
            allowAnyLabel_=cms.required.untracked.uint32
        ),
        numberOfConcurrentIOVs = cms.untracked.uint32(0)
    ),
    fileMode = cms.untracked.string('FULLMERGE'),
    forceEventSetupCacheClearOnNewRun = cms.untracked.bool(False),
    holdsReferencesToDeleteEarly = cms.untracked.VPSet(),
    makeTriggerResults = cms.obsolete.untracked.bool,
    modulesToIgnoreForDeleteEarly = cms.untracked.vstring(),
    numberOfConcurrentLuminosityBlocks = cms.untracked.uint32(0),
    numberOfConcurrentRuns = cms.untracked.uint32(1),
    numberOfStreams = cms.untracked.uint32(0),
    numberOfThreads = cms.untracked.uint32(1),
    printDependencies = cms.untracked.bool(False),
    sizeOfStackForThreadsInKB = cms.optional.untracked.uint32,
    throwIfIllegalParameter = cms.untracked.bool(True),
    wantSummary = cms.untracked.bool(True)
)

# Production Info
process.configurationMetadata = cms.untracked.PSet(
    annotation = cms.untracked.string('MinBias_8TeV_pythia8_TuneCUETP8M1_cff nevts:700'),
    name = cms.untracked.string('Applications'),
    version = cms.untracked.string('$Revision: 1.19 $')
)

# Output definition

process.RAWSIMoutput = cms.OutputModule("AsciiOutputModule",
    SelectEvents = cms.untracked.PSet(
        SelectEvents = cms.vstring('generation_step')
    ),
    outputCommands = process.RAWSIMEventContent.outputCommands,
)

# Additional output definition

# Other statements
if hasattr(process, "XMLFromDBSource"): process.XMLFromDBSource.label="Extended"
if hasattr(process, "DDDetectorESProducerFromDB"): process.DDDetectorESProducerFromDB.label="Extended"
process.genstepfilter.triggerConditions=cms.vstring("generation_step")
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, 'auto:phase1_2018_realistic', '')

process.generator = cms.EDFilter("Pythia8ConcurrentGeneratorFilter",
    PythiaParameters = cms.PSet(
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CUEP8M1Settings',
            'processParameters'
        ),
        processParameters = cms.vstring(
            'SoftQCD:nonDiffractive = on',
            'SoftQCD:singleDiffractive = on',
            'SoftQCD:doubleDiffractive = on'
        ),
        pythia8CUEP8M1Settings = cms.vstring(
            'Tune:pp 14',
            'Tune:ee 7',
            'MultipartonInteractions:pT0Ref=2.4024',
            'MultipartonInteractions:ecmPow=0.25208',
            'MultipartonInteractions:expPow=1.6'
        ),
        pythia8CommonSettings = cms.vstring(
            'Tune:preferLHAPDF = 2',
            'Main:timesAllowErrors = 10000',
            'Check:epTolErr = 0.01',
            'Beams:setProductionScalesFromLHEF = off',
            'SLHA:minMassSM = 1000.',
            'ParticleDecays:limitTau0 = on',
            'ParticleDecays:tau0Max = 10',
            'ParticleDecays:allowPhotonRadiation = on'
        )
    ),
    comEnergy = cms.double(8000.0),
    crossSection = cms.untracked.double(71390000000.0),
    filterEfficiency = cms.untracked.double(1.0),
    maxEventsToPrint = cms.untracked.int32(0),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    pythiaPylistVerbosity = cms.untracked.int32(1)
)

process.ProductionFilterSequence = cms.Sequence(process.generator)

# Path and EndPath definitions
process.generation_step = cms.Path(process.pgen)
process.simulation_step = cms.Path(process.psim)
process.genfiltersummary_step = cms.EndPath(process.genFilterSummary)
process.endjob_step = cms.EndPath(process.endOfProcess)
process.RAWSIMoutput_step = cms.EndPath(process.RAWSIMoutput)

# Schedule definition
process.schedule = cms.Schedule(process.generation_step,process.genfiltersummary_step,process.simulation_step,process.endjob_step,process.RAWSIMoutput_step)
from PhysicsTools.PatAlgos.tools.helpers import associatePatAlgosToolsTask
associatePatAlgosToolsTask(process)

#Setup FWK for multithreaded
process.options.numberOfThreads = 12
process.options.numberOfStreams = 0
# filter all path with the production filter sequence
for path in process.paths:
    getattr(process,path).insert(0, process.ProductionFilterSequence)

# Customisation from command line

# Add early deletion of temporary data products to reduce peak memory need
from Configuration.StandardSequences.earlyDeleteSettings_cff import customiseEarlyDelete
process = customiseEarlyDelete(process)
# End adding early deletion
Dr15Jones commented 1 year ago

The relevant part of the stack trace is

#5  0x00007f6fc15609a4 in G4VProcess::GetProcessSubType (this=0x0) at /cvmfs/cms-ib.cern.ch/nweek-02757/el8_amd64_gcc10/external/geant4/10.7.2-52c0e49b75ade9f3c4ac335106000cd9/include/Geant4/G4VProcess.hh:402
#6  0x00007f6fc1528daa in StackingAction::ClassifyNewTrack (this=0x7f6fa2a48100, aTrack=0x7f6f69374a10) at /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/src/SimG4Core/Application/src/StackingAction.cc:220
#7  0x00007f6fc1be3d2b in G4StackManager::PushOneTrack(G4Track*, G4VTrajectory*) () from /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/biglib/el8_amd64_gcc10/pluginSimulation.so
#8  0x00007f6fc18dd9cb in G4EventManager::StackTracks(std::vector<G4Track*, std::allocator<G4Track*> >*, bool) () from /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/biglib/el8_amd64_gcc10/pluginSimulation.so
#9  0x00007f6fc18de0cd in G4EventManager::DoProcessing(G4Event*) () from /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/biglib/el8_amd64_gcc10/pluginSimulation.so
#10 0x00007f6fc15258c0 in RunManagerMTWorker::produce (this=0x7f6fa2a2b000, inpevt=..., es=..., runManagerMaster=...) at /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/src/SimG4Core/Application/src/RunManagerMTWorker.cc:539
#11 0x00007f6fc156a0fd in operator() (__closure=0x7f6fc99f8578) at /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/src/SimG4Core/Application/plugins/OscarMTProducer.cc:261
#12 0x00007f6fc156c624 in omt::ThreadHandoff::Functor<OscarMTProducer::produce(edm::Event&, const edm::EventSetup&)::<lambda()> >::execute(void) (this=0x7f6fc99f8570) at /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/src/SimG4Core/Application/interface/ThreadHandoff.h:74
#13 0x00007f6fc152d6ca in omt::ThreadHandoff::threadLoop (iArgs=0x7f6faf401710) at /uscms_data/d2/cdj/build/temp/onfly_premix/original/CMSSW_12_6_X_2022-11-02-1100/src/SimG4Core/Application/src/ThreadHandoff.cc:111
#14 0x00007f70010661ca in start_thread () from /lib64/libpthread.so.0
#15 0x00007f7000cd1e73 in clone () from /lib64/libc.so.6
Dr15Jones commented 1 year ago

The track being processed by StackingAction::ClassifyNewTrack has the following info:

(gdb) print *aTrack
$2 = {fPosition = {data = {-2.3084535472497882, 46.850606039502615, 3124.2145189610064}, static tolerance = 2.22045e-14}, fGlobalTime = 10.735914004372846, fLocalTime = 0, fTrackLength = 0, fVelocity = 299.79245800000001, fpTouchable = {
    fObj = 0x7fff49150000}, fpNextTouchable = {fObj = 0x0}, fpOriginTouchable = {fObj = 0x7fff49150000}, fpDynamicParticle = 0x7fff49152ca0, fTrackStatus = fAlive, fStepLength = 0, fWeight = 1, fpStep = 0x0, fVtxPosition = {data = {0, 0, 0}, 
    static tolerance = 2.22045e-14}, fVtxMomentumDirection = {data = {0, 0, 0}, static tolerance = 2.22045e-14}, fVtxKineticEnergy = 0, fpLVAtVertex = 0x0, fpCreatorProcess = 0x7fff7b550700, fpUserInformation = 0x0, prev_mat = 0x0, 
  groupvel = 0x0, prev_velocity = 0, prev_momentum = 0, fpAuxiliaryTrackInformationMap = 0x0, fCurrentStepNumber = 0, fCreatorModelIndex = 35, fParentID = 40803, fTrackID = 41661, fBelowThreshold = false, fGoodForTracking = false, 
  is_OpticalPhoton = false, useGivenVelocity = false}
(gdb) print *aTrack->fpDynamicParticle
$3 = {theMomentumDirection = {data = {-0.046642578135399994, 0.0075677788186800255, 0.99888297544238625}, static tolerance = 2.22045e-14}, thePolarization = {data = {0, 0, 0}, static tolerance = 2.22045e-14}, 
  theParticleDefinition = 0x7fff876d3100, theElectronOccupancy = 0x0, thePreAssignedDecayProducts = 0x0, primaryParticle = 0x0, theKineticEnergy = 256.44442852635405, theLogKineticEnergy = 1.7976931348623157e+308, theBeta = -1, 
  theProperTime = 0, theDynamicalMass = 0.51099890999999997, theDynamicalCharge = -1, theDynamicalSpin = 0.5, theDynamicalMagneticMoment = -5.7950947692434883e-08, thePreAssignedDecayTime = -1, verboseLevel = 1, thePDGcode = 0}
civanch commented 1 year ago

@Dr15Jones , thanks, I will try to reproduce the issue.

Dr15Jones commented 1 year ago

As a further check, if I check out the code and compile without export USER_CXXFLAGS='-O0 -g', the job runs well beyond the point where it had previously failed. So it doesn't appear to be related to having libraries built locally.

Dr15Jones commented 1 year ago

Some further info: I first saw this problem while doing something quite different, and there I also saw it when using 1 thread, so it seems unlikely to be a threading-related problem.

The reproducer I gave in this issue came after several attempts at simplifying the setup, to make sure that code changes I had made were not causing the problem.

civanch commented 1 year ago

I would expect #40098 to clean up the situation, but I do not fully understand the issue. If it is simply a corrupted pointer, why does the result depend on the geometry? The problem is seen with one geometry and not with another, while Geant4 is the same.

Dr15Jones commented 1 year ago

So I see another problem stemming from the same case where aTrack->GetCreatorProcess() returns a nullptr. This time the segmentation fault happened in StackingAction::isItPrimaryDecayProductOrConversion

https://github.com/cms-sw/cmssw/blob/34601f933a467546074cbbbb92d14c46f36eadc2/SimG4Core/Application/src/StackingAction.cc#L441-L446

with the call stack

#9  0x00007efe60a4cdab in StackingAction::isItPrimaryDecayProductOrConversion(G4Track const*, G4Track const&) const () from /uscms_data/d2/cdj/build/temp/onfly_premix/CMSSW_12_6_0_pre5/biglib/el8_amd64_gcc10/pluginSimulation.so
#10 0x00007efe60a4d076 in StackingAction::ClassifyNewTrack(G4Track const*) () from /uscms_data/d2/cdj/build/temp/onfly_premix/CMSSW_12_6_0_pre5/biglib/el8_amd64_gcc10/pluginSimulation.so
#11 0x00007efe610cc9cb in G4StackManager::PushOneTrack(G4Track*, G4VTrajectory*) () from /uscms_data/d2/cdj/build/temp/onfly_premix/CMSSW_12_6_0_pre5/biglib/el8_amd64_gcc10/pluginSimulation.so
#12 0x00007efe60dc627b in G4EventManager::StackTracks(std::vector<G4Track*, std::allocator<G4Track*> >*, bool) () from /uscms_data/d2/cdj/build/temp/onfly_premix/CMSSW_12_6_0_pre5/biglib/el8_amd64_gcc10/pluginSimulation.so
#13 0x00007efe60dc697d in G4EventManager::DoProcessing(G4Event*) () from /uscms_data/d2/cdj/build/temp/onfly_premix/CMSSW_12_6_0_pre5/biglib/el8_amd64_gcc10/pluginSimulation.so
#14 0x00007efe60a47c0f in RunManagerMTWorker::produce(edm::Event const&, edm::EventSetup const&, RunManagerMT&) () from /uscms_data/d2/cdj/build/temp/onfly_premix/CMSSW_12_6_0_pre5/biglib/el8_amd64_gcc10/pluginSimulation.so

The weird thing is that I added a check, assert(aTrack->GetCreatorProcess()), to the function just above in the call stack, i.e. StackingAction::ClassifyNewTrack, and that assert did NOT fail. So it seems like the value of aTrack->GetCreatorProcess() changed while the routine was running. I did run my test with 8 threads; however, it consistently crashes around the 40th event, so I think it is deterministic (which I would not expect if it were a threading-related problem).

Dr15Jones commented 1 year ago

So something very strange is happening here. Earlier in StackingAction::ClassifyNewTrack we have

https://github.com/cms-sw/cmssw/blob/34601f933a467546074cbbbb92d14c46f36eadc2/SimG4Core/Application/src/StackingAction.cc#L180

so if aTrack->GetCreatorProcess() is nullptr, we should have processed that if block and never reached

https://github.com/cms-sw/cmssw/blob/34601f933a467546074cbbbb92d14c46f36eadc2/SimG4Core/Application/src/StackingAction.cc#L213-L217

so we should never have needed to put in the new protection! Somehow the value returned by aTrack->GetCreatorProcess() changes between the first if condition (where it is not nullptr) and the body of the later else block.

Dr15Jones commented 1 year ago

It seems to me like something is overwriting the values associated with aTrack. Is it possible that aTrack was actually deleted before this call happened?

civanch commented 1 year ago

@Dr15Jones , I agree that the extra check which I recently introduced should not be added in this form, because the process limiting the step is checked earlier.

G4Track objects are stored in memory managed by G4Allocator; new and delete are done by the G4Allocator. A track that has been deleted should not reach StackingAction.

So the memory must be corrupted in some other way. A theoretical question: is it correct to compile part of the code with -O0 and another part with -O3?
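On the question of whether aTrack could have been deleted: with a pooled allocator, a deleted object does not necessarily cause an immediate crash. The generic free-list pool below is a hypothetical illustration, not G4Allocator's implementation; it only demonstrates the property such pools share, namely that a released slot is handed out again by the next allocation, so a stale pointer to a deleted object would silently alias a newly created one and its fields would appear to change.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Generic fixed-capacity free-list pool, for illustration only (NOT G4Allocator).
// A released slot is pushed onto the free list and returned by the next
// allocate() call, so memory is recycled without going back to the system.
template <typename T>
class FreeListPool {
 public:
  explicit FreeListPool(std::size_t capacity) : storage_(capacity) {
    // Push slots in reverse so the first allocation gets slot 0.
    for (std::size_t i = capacity; i > 0; --i) {
      free_.push_back(&storage_[i - 1]);
    }
  }

  void* allocate() {
    if (free_.empty()) {
      throw std::bad_alloc();
    }
    Slot* slot = free_.back();
    free_.pop_back();
    return slot;
  }

  // The caller must already have destroyed the object living in p.
  void release(void* p) { free_.push_back(static_cast<Slot*>(p)); }

 private:
  struct Slot {
    alignas(T) unsigned char bytes[sizeof(T)];
  };
  std::vector<Slot> storage_;  // never resized, so slot addresses stay stable
  std::vector<Slot*> free_;
};
```

In this issue, the later UBSAN report pointed to StackingAction itself writing a null creator process, rather than to a recycled track.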

dan131riley commented 1 year ago

Yesterday's UBSAN IB:

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/el8_amd64_gcc11/CMSSW_13_0_UBSAN_X_2022-11-28-1100/pyRelValMatrixLogs/run/140.0_HydjetQ_B12_5020GeV_2011+HydjetQ_B12_5020GeV_2011+DIGIHI2011+RECOHI2011+HARVESTHI2011/step1_HydjetQ_B12_5020GeV_2011+HydjetQ_B12_5020GeV_2011+DIGIHI2011+RECOHI2011+HARVESTHI2011.log

/pool/condor/dir_25578/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1b2eaa275701962dee4e708dbb2a0228/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/src/SimG4Core/Application/src/StackingAction.cc:221:44: runtime error: member call on null pointer of type 'const struct G4VProcess'
    #0 0x2ad211806f07  (/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so+0x5b4cf07)
    #1 0x2ad213a82df9  (/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so+0x7dc8df9)
    #2 0x2ad21377ea5a  (/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so+0x7ac4a5a)
    #3 0x2ad21377f17c  (/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so+0x7ac517c)
    #4 0x2ad2117c63c2  (/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so+0x5b0c3c2)
    #5 0x2ad2116971f8  (/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so+0x59dd1f8)
    #6 0x2ad211864a7c  (/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_UBSAN_X_2022-11-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so+0x5baaa7c)
    #7 0x2ad168b8b1ce in start_thread (/lib64/libpthread.so.0+0x81ce)
    #8 0x2ad168ddce72 in clone (/lib64/libc.so.6+0x39e72)

which points to line 221, https://github.com/cms-sw/cmssw/blob/830c070592d91b7799eb39e1a3a15b232052ab61/SimG4Core/Application/src/StackingAction.cc#L217-L224, after proc gets redefined.

dan131riley commented 1 year ago

"Theoretical question: is it correct to compile part of code with -O0, another part - with -O3 ?"

Yes, it's perfectly fine. Optimization doesn't change the ABI, and that's all that matters. It would be a disaster if different optimization levels were incompatible.

Dr15Jones commented 1 year ago

So the UBSAN report could explain the later crash I see: the call on line 222, track->SetCreatorProcess(proc), would set the creator process to nullptr, and the later call in the same function to StackingAction::isItPrimaryDecayProductOrConversion would then crash.
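A minimal sketch of that sequence, assuming the unguarded reassignment works as the UBSAN report suggests. Track, Process and the two replaceCreator* helpers are hypothetical stand-ins, not the G4Track/G4VProcess API or the actual CMSSW fix. It shows how an assert on the creator process at the top of ClassifyNewTrack can pass while a later read in the same call sees nullptr: the function itself overwrites the pointer.

```cpp
#include <cassert>

// Hypothetical stand-ins; NOT the Geant4 G4Track / G4VProcess classes.
struct Process {
  virtual ~Process() = default;
};

struct Track {
  const Process* GetCreatorProcess() const { return creator_; }
  void SetCreatorProcess(const Process* p) { creator_ = p; }

 private:
  const Process* creator_ = nullptr;
};

// Pattern suggested by the UBSAN report: the creator is replaced by the
// "selected" sub-process unconditionally, even when that lookup gave nullptr.
void replaceCreatorUnguarded(Track& track, const Process* selected) {
  track.SetCreatorProcess(selected);
}

// Guarded variant: only replace the creator when a sub-process was found,
// so code later in the same call never sees a null creator.
void replaceCreatorGuarded(Track& track, const Process* selected) {
  if (selected != nullptr) {
    track.SetCreatorProcess(selected);
  }
}
```

With the unguarded helper, a check made before the call passes and a read made after it returns nullptr; with the guarded helper, the original creator survives.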

smuzaffar commented 1 year ago

While testing GCC11/LTO for simulation jobs, I also noticed that MinBias (with 16 threads) and TTBar (with 8 threads) always crash with the error:

#5  0x00007fc6d4402f8b in StackingAction::isItPrimaryDecayProductOrConversion(G4Track const*, G4Track const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_LTO_X_2022-11-24-1100/biglib/el8_amd64_gcc11/pluginSimulation.so
#6  0x00007fc6d44033be in StackingAction::ClassifyNewTrack(G4Track const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_LTO_X_2022-11-24-1100/biglib/el8_amd64_gcc11/pluginSimulation.so
#7  0x00007fc6d5130479 in G4StackManager::PushOneTrack(G4Track*, G4VTrajectory*) [clone .isra.0] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_LTO_X_2022-11-24-1100/biglib/el8_amd64_gcc11/pluginSimulation.so
#8  0x00007fc6d46ba63e in G4EventManager::DoProcessing(G4Event*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_LTO_X_2022-11-24-1100/biglib/el8_amd64_gcc11/pluginSimulation.so
#9  0x00007fc6d440b81c in RunManagerMTWorker::produce(edm::Event const&, edm::EventSetup const&, RunManagerMT&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_LTO_X_2022-11-24-1100/biglib/el8_amd64_gcc11/pluginSimulation.so
#10 0x00007fc6d4417242 in omt::ThreadHandoff::Functor<OscarMTProducer::produce(edm::Event&, edm::EventSetup const&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_LTO_X_2022-11-24-1100/biglib/el8_amd64_gcc11/pluginSimulation.so
#11 0x00007fc6d43fc80a in omt::ThreadHandoff::threadLoop(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_LTO_X_2022-11-24-1100/biglib/el8_amd64_gcc11/pluginSimulation.so
#12 0x00007fc710c111cf in start_thread () from /lib64/libpthread.so.0
#13 0x00007fc71087ce73 in clone () from /lib64/libc.so.6
civanch commented 1 year ago

I will try to revisit the code.

civanch commented 1 year ago

@Dr15Jones , @dan131riley, is it possible to check whether #40180 improves the situation? I cannot do this promptly.

smuzaffar commented 1 year ago

@civanch , I have tested https://github.com/cms-sw/cmssw/pull/40180 for LTO builds and the tests (TTbar with 8 threads and MinBias 16 threads) which were failing before work now.

civanch commented 1 year ago

The issue is fully understood. It is connected with the ComptonScattering process. When Compton scattering happens, the primary gamma continues to be tracked and an electron is produced. Because of that, the creator process for the electron is not correctly identified in all cases. The fix will be committed soon and may be backported to 12_6. The problem does not exist in other releases.

civanch commented 1 year ago

PR #40252 resolves the issues for 13_0 with Geant4 10.7.2 and may be backported to 12_6. There is no need to backport to earlier releases, because G4GammaGeneralProcess is not used in those releases.