Open makortel opened 2 years ago
assign reconstruction, heterogeneous
New categories assigned: heterogeneous,reconstruction
@jpata,@slava77,@fwyzard,@clacaputo,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
FYI @VinInn @AdrianoDee
This is a CPU-only workflow, right ?
I think so. At least the SwitchProducer is using the @cpu case.
type tracking
I think the type here should be tracking and not trk (vertexing is under tracking)
Hello, just to keep track of this issue :) This assertion failure is still present in the current release cycle:
cmsRun: /data/cmsbld/jenkins_b/workspace/build-any-ib/w/tmp/BUILDROOT/f4101ca38f0ff520e5922918c7986929/opt/cmssw/el8_aarch64_gcc11/cms/cmssw/CMSSW_13_1_X_2023-02-19-2300/src/RecoPixelVertexing/PixelVertexFinding/plugins/gpuFitVertices.h:70: void gpuVertexFinder::fitVertices(gpuVertexFinder::VtxSoAView&, gpuVertexFinder::WsSoAView&, float): Assertion `wv[i] > 0.f' failed.
See most recent stacktrace. And it is also present in LTO IBs since we build for ARM now.
I assumed this stopped failing once the HLT menu for this workflow was moved to the "fake" menu ?
Quite possible. On a quick look I didn't see this particular error in the IBs of the past two weeks, but I also don't recall how frequent the failure was.
I guess we can make it reappear real quick by allowing 2024 here:
Do you think 12834.402 should also trigger the issue ? I can try running that by hand on lxplus-arm (ARM Neoverse-N1) to check.
12834.402 does not seem to reproduce the issue, or at least not easily: I've run its step2 over 20 times on 100 events without problems on lxplus-arm.
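For anyone who wants to repeat this kind of check, a repeated-run loop could be scripted roughly as below. This is only a minimal sketch, not the procedure actually used here: the helper name `count_failures` is hypothetical, and the command line passed to it (for example `cmsRun` with the step2 configuration of the workflow) is an assumption that has to be filled in by hand.

```python
import subprocess


def count_failures(cmd, repeats, marker="Assertion"):
    """Run `cmd` `repeats` times and return the indices of the failed runs.

    A run is counted as failed if it exits with a non-zero status or if
    `marker` appears anywhere in its combined stdout/stderr.
    `count_failures` is a hypothetical helper, not part of CMSSW.
    """
    failed = []
    for i in range(repeats):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so the marker search sees it
            text=True,
        )
        if proc.returncode != 0 or marker in proc.stdout:
            failed.append(i)
    return failed
```

Called as `count_failures(["cmsRun", "step2.py"], repeats=20)`, where `step2.py` is a placeholder for the real step2 configuration, it would return an empty list if none of the 20 runs hit the assertion.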
I don't know if it is relevant but the original workflow 11634.24 forces the magnetic field to be 0T.
Given all the changes (CUDA-to-Alpaka, related fixes in the Alpaka code, HLT menu updates) maybe we have reached the time to close this issue?
Workflow 11634.24 step 2 has been failing on el8_aarch64_gcc10 at least since CMSSW_12_4_X_2022-04-28-2300 with
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_aarch64_gcc10/CMSSW_12_4_X_2022-05-04-2300/pyRelValMatrixLogs/run/11634.24_TTbar_14TeV+2021_0T+TTbar_14TeV_TuneCP5_GenSimINPUT+Digi+RecoNano+HARVESTNano+ALCA/step2_TTbar_14TeV+2021_0T+TTbar_14TeV_TuneCP5_GenSimINPUT+Digi+RecoNano+HARVESTNano+ALCA.log