srimanob opened this issue 5 months ago
A new Issue was created by @srimanob.
@sextonkennedy, @rappoccio, @makortel, @antoniovilela, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.
I suspect the failure reported in https://github.com/cms-sw/cmssw/issues/44306 was caused by the array on the stack becoming too large, rather than by the array being variable-length per se.
As mentioned in https://github.com/cms-sw/cmssw/issues/44306#issuecomment-2100666666, VLAs are a non-standard extension in C++. That said, from past experience I recall that, e.g., the tracking code uses
https://github.com/cms-sw/cmssw/blob/2169b5684eb29db8aa2e60eded3eedfc3479df80/CommonTools/Utils/interface/DynArray.h#L4-L5
for performance reasons (although I can't say off the top of my head how large the impact of moving to, e.g., std::vector would be there).
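A minimal sketch of the portable replacement under discussion, for reference: instead of declaring a VLA such as `float buf[n];`, the function sizes a `std::vector` at run time, so the elements live on the heap and only a small, fixed-size object sits on the stack. The function `sumOfSquares` is hypothetical and not taken from the L1T or tracking code.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical example. The non-standard VLA version would read:
//   float buf[n];   // size known only at run time, storage on the stack
// The portable version below keeps the elements on the heap instead.
float sumOfSquares(std::size_t n) {
  std::vector<float> buf(n);  // n zero-initialized floats, heap-allocated
  for (std::size_t i = 0; i < n; ++i) {
    buf[i] = static_cast<float>(i) * static_cast<float>(i);
  }
  // Accumulate in float, matching the element type.
  return std::accumulate(buf.begin(), buf.end(), 0.0f);
}
```

The cost raised in this thread is visible here: the vector allocates (and zero-initializes) its storage on every call, whereas a stack array does not.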
Is the scope of this issue to remove VLAs from the L1T code only, or from everywhere in CMSSW?
Hi @makortel, thanks for the comment. My initial concern is how much risk these VLAs pose, given that this issue just popped up in an L1T module. It seems we will never know if we are at boundary of failure until it fails. Do we have cons for migration?
> Do we have cons for migration?

Yes: performance.
Just avoid those very large arrays in the first place.
To be safe, one can use DynArray and, at run time, initialize it with a pointer either to a VLA or to heap memory, depending on the size of the allocation required.
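A minimal sketch of that stack-or-heap idea, under stated assumptions: instead of a real VLA (which is non-standard in C++), it uses a fixed-capacity inline buffer for small sizes and falls back to the heap above that capacity. The class name `SmallOrHeapArray` and its `InlineCapacity` parameter are hypothetical and do not reflect the actual DynArray interface.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: storage for n elements of T, taken from an inline
// buffer (on the stack when the object is a local variable) if
// n <= InlineCapacity, otherwise from the heap. Elements are value-initialized.
template <typename T, std::size_t InlineCapacity>
class SmallOrHeapArray {
  static_assert(InlineCapacity > 0, "inline capacity must be nonzero");

public:
  explicit SmallOrHeapArray(std::size_t n) : size_(n) {
    if (n > InlineCapacity) {
      heap_ = std::make_unique<T[]>(n);  // large request: one heap allocation
      data_ = heap_.get();
    } else {
      data_ = inline_;  // small request: no allocation at all
    }
  }
  // data_ may point into this object, so copying it would be unsafe.
  SmallOrHeapArray(const SmallOrHeapArray&) = delete;
  SmallOrHeapArray& operator=(const SmallOrHeapArray&) = delete;

  T& operator[](std::size_t i) { return data_[i]; }
  const T& operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }
  bool onHeap() const { return heap_ != nullptr; }
  T* begin() { return data_; }
  T* end() { return data_ + size_; }

private:
  T inline_[InlineCapacity]{};  // always reserved inside the object
  std::unique_ptr<T[]> heap_;   // empty unless n > InlineCapacity
  T* data_ = nullptr;
  std::size_t size_;
};
```

Because the inline buffer has a compile-time size, the worst-case stack footprint is fixed and visible in review; the trade-off is that small requests still reserve the full inline capacity.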
> It seems we will never know if we are at boundary of failure until it fails.
This is correct for stack overflows in general. While the problem of large arrays on the stack is, strictly speaking, orthogonal to the question of whether we should allow VLAs to be used, their dynamic nature makes potentially large VLAs somewhat more difficult to catch in code review, PR tests, and IB tests than arrays whose size is fixed at compile time.
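To illustrate why compile-time-sized arrays are easier to police: their stack footprint is a constant expression, so it can be checked during compilation, whereas a VLA's size is known only at run time. In this minimal sketch, the 64 KiB budget and the names `kStackBudgetBytes`, `fitsStackBudget`, and `processFixed` are hypothetical choices for illustration, not a CMSSW policy.

```cpp
#include <array>
#include <cstddef>

// Hypothetical per-array stack budget, for illustration only.
constexpr std::size_t kStackBudgetBytes = 64 * 1024;

// True if an array of N elements of T fits within the budget.
template <typename T, std::size_t N>
constexpr bool fitsStackBudget() {
  return N * sizeof(T) <= kStackBudgetBytes;
}

float processFixed() {
  constexpr std::size_t kN = 1024;
  // Compilation fails here if kN is later increased beyond the budget.
  static_assert(fitsStackBudget<float, kN>(), "array too large for the stack");
  std::array<float, kN> buf{};
  buf[kN - 1] = 1.0f;
  return buf[kN - 1];
}

// With a run-time size n, no such check is possible: n * sizeof(float)
// is not a constant expression, so an oversized VLA `float buf[n];`
// can only show up at run time, typically as a stack overflow.
```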
An additional issue is that large VLAs that cause problems in multi-threaded processes may not show up in single-threaded PR tests, for example if worker threads get a different stack size limit than the main thread. I haven't checked if we impose the same stack size limit in both cases.
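One way to look at the two limits in question on a Linux/POSIX system, as a minimal sketch: `getrlimit(RLIMIT_STACK)` reports the soft limit that applies to the main thread, and a freshly initialized `pthread_attr_t` reports the default stack size a new pthread would receive. It says nothing about threads created by TBB, which can be given their own stack size; the helper names are hypothetical.

```cpp
#include <pthread.h>
#include <sys/resource.h>
#include <cstddef>
#include <cstdio>

// Soft stack limit of the process (applies to the main thread).
// Returns 0 if the query fails; RLIM_INFINITY means "unlimited".
rlim_t mainThreadStackLimit() {
  rlimit rl{};
  if (getrlimit(RLIMIT_STACK, &rl) != 0) return 0;
  return rl.rlim_cur;
}

// Default stack size that a newly created pthread would receive.
std::size_t defaultPthreadStackSize() {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) return 0;
  std::size_t size = 0;
  pthread_attr_getstacksize(&attr, &size);
  pthread_attr_destroy(&attr);
  return size;
}

void printStackLimits() {
  const rlim_t mainLimit = mainThreadStackLimit();
  if (mainLimit == RLIM_INFINITY) {
    std::printf("main thread stack limit: unlimited\n");
  } else {
    std::printf("main thread stack limit: %llu bytes\n",
                static_cast<unsigned long long>(mainLimit));
  }
  std::printf("default pthread stack size: %zu bytes\n", defaultPthreadStackSize());
}
```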
> I haven't checked if we impose the same stack size limit in both cases.
All TBB threads should use the same stack size:
- https://github.com/cms-sw/cmssw/blob/22f751f0592542ae94a797e3fc22294ddc4626cf/FWCore/Framework/bin/cmsRun.cpp#L237-L241
- https://github.com/cms-sw/cmssw/blob/22f751f0592542ae94a797e3fc22294ddc4626cf/FWCore/Concurrency/src/setNThreads.cc#L20
- https://github.com/cms-sw/cmssw/blob/22f751f0592542ae94a797e3fc22294ddc4626cf/FWCore/Concurrency/src/ThreadsController.cc#L20
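For comparison, this is roughly what "all worker threads use the same stack size" means at the plain pthread level: every thread is created from one attribute object carrying a single explicit stack size. It is a hypothetical POSIX analogue for illustration only; cmsRun configures its TBB threads through the code linked above (oneTBB offers `tbb::global_control` with the `thread_stack_size` parameter for this), not through a helper like `runWorkersWithStackSize`, whose name and stack-size values are assumptions.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

namespace {
// Hypothetical worker body: increments its own counter so the caller
// can confirm that the thread actually ran.
void* worker(void* arg) {
  auto* counter = static_cast<int*>(arg);
  *counter += 1;
  return nullptr;
}
}  // namespace

// Starts nThreads pthreads that all share one explicit stack size.
// Returns how many threads ran to completion (0 if the size is rejected).
int runWorkersWithStackSize(int nThreads, std::size_t stackBytes) {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) return 0;
  // Fails with EINVAL, e.g., if stackBytes is below PTHREAD_STACK_MIN.
  if (pthread_attr_setstacksize(&attr, stackBytes) != 0) {
    pthread_attr_destroy(&attr);
    return 0;
  }
  std::vector<pthread_t> threads(nThreads);
  std::vector<int> counters(nThreads, 0);
  int started = 0;
  for (int i = 0; i < nThreads; ++i) {
    if (pthread_create(&threads[i], &attr, worker, &counters[i]) != 0) break;
    ++started;
  }
  pthread_attr_destroy(&attr);  // does not affect threads already created
  int done = 0;
  for (int i = 0; i < started; ++i) {
    pthread_join(threads[i], nullptr);
    done += counters[i];
  }
  return done;
}
```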
This is a follow-up to https://github.com/cms-sw/cmssw/issues/44306, where we saw a crash that can be solved by moving away from variable-length arrays.
Just for fun, here is a table of all variable-length arrays in L1Trigger in CMSSW_14_1_0_pre3. I leave it to the experts of other subpackages to fix them, but hopefully this is a useful starting point.
- useFit
- useFitSL1
- useFitSL3
- activeTower
- idxMu
- muPtSorted
- idxEg
- egPtSorted
- idxTau
- tauPtSorted
- InvDeltaRSqLUT
- temp_InvDeltaRSq
- isSeed
- toRemove
- isSeed
- epbins_default
- epbins
- epbins_default
- epbins
- work
- halfsorted
- work
- tomerge
- OutTmp
- outTmp2
- out2
- out3
- ret
- dupMap
- noMerge
Originally posted by @aehart in https://github.com/cms-sw/cmssw/issues/44306#issuecomment-2101264168