alexander-held opened this issue 2 years ago (status: Open)
When running this CMS Open Data ttbar analysis at the UChicago coffea-casa instance over the full number of input files with a pure coffea setup, `RuntimeError` exceptions typically start appearing somewhere around halfway through the pre-processing stage:

```
RuntimeError: Work item WorkItem(dataset='ttbar__nominal', filename='https://xrootd-local.unl.edu:1094//store/user/AGC/datasets/RunIIFall15MiniAODv2/TT_TuneCUETP8M1_13TeV-powheg-pythia8/MINIAODSIM//PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext3-v1/00000/9C747AED-4BC2-E511-BF56-AC853D9DACD3.root', treename='events', entrystart=0, entrystop=49051, fileuuid=b' \xdd\xfeB\xb8\xec\x11\xec\x98#\x02B\xac\x13\x00\x0e', usermeta={'xsec': 729.84, 'process': 'ttbar', 'nevts': 4370893, 'variation': 'nominal'}) caused a KilledWorker exception (likely a segfault or out-of-memory issue)
```

The filename changes between repeated runs, so it does not seem to be related to a specific problematic input. Here is a slightly different error, seen with `N_FILES_MAX_PER_SAMPLE = 100`:

```
KilledWorker: ('TtbarAnalysis-522cecba9f095e893a09583293a1b218', <WorkerState 'tls://c029.af.uchicago.edu:35509', name: htcondor--194839.0--, status: closed, memory: 0, processing: 5>)
```

I am not sure how to best debug this further and would be happy to try out some suggestions.
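If these KilledWorker errors really are out-of-memory kills, one thing worth trying might be shrinking how many events a single work item covers, so each worker holds less data at once. A minimal sketch of the idea below; the `split_chunks` helper is hypothetical (written here for illustration, not part of coffea), using the entry range from the traceback above:

```python
def split_chunks(entrystart, entrystop, chunksize):
    """Split an entry range [entrystart, entrystop) into pieces of at most chunksize events."""
    return [
        (start, min(start + chunksize, entrystop))
        for start in range(entrystart, entrystop, chunksize)
    ]

# The failing work item covered entries 0-49051 in one piece;
# splitting into 10k-event chunks yields five smaller work items.
chunks = split_chunks(0, 49051, 10_000)
print(chunks)
```

Each resulting `(start, stop)` pair could then be processed as its own work item, trading more scheduling overhead for a lower peak memory footprint per worker.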