Closed aehart closed 1 year ago
I've traced this issue to nbits_
not being set for many of the TrackletLUT
objects in the emulation. This causes width
to get a junk value, which in turn causes a segmentation violation when setw
is called.
This seems to be an issue for the following methods in TrackletLUT
, i.e., in each of these methods, TrackletLUT::writeTable
is called without nbits_
first being assigned a proper value, as far as I can tell:
initmatchcut
initTPregionlut
initteptlut
initProjectionBend
initBendMatch
@skinnari @tomalin, you're more familiar with who is responsible for these various LUTs. Could you help assign people to fix these methods?
I would also recommend initializing nbits_
to some junk value, like zero perhaps, and then raising an exception at the beginning of TrackletLUT::writeTable
if nbits_
is still equal to this junk value. This is to avoid this issue in the future. Either that or add a CI test where writeTable_
is true.
Why is it crashing now, and not previously? You're now using fw_synch_220523 . Was it running successfully with an earlier branch? If so, which one?
Why is it crashing now, and not previously? You're now using fw_synch_220523 . Was it running successfully with an earlier branch? If so, which one?
The last time I ran with writeTable_
set, I was using the fw_synch220523 tag. With this tag, TrackletLUT::writeTable
only uses `nbitswhen
positive_is true. There is no call to
setw` when writing the table:
https://github.com/cms-L1TK/cmssw/blob/306e80db0e0a95eb87ed5f2cd59493da8011336f/L1Trigger/TrackFindingTracklet/src/TrackletLUT.cc#L897
Now, I am trying to use the HEAD of the L1TK-dev-12_6_0pre5 branch, in order to do another FW synchronization. Now, TrackletLUT::writeTable
uses `nbitsto set a width that is then passed to
setw`:
https://github.com/cms-L1TK/cmssw/blob/f1951c72551d09202b7586773bbe3769250c12fa/L1Trigger/TrackFindingTracklet/src/TrackletLUT.cc#L1415
https://github.com/cms-L1TK/cmssw/blob/f1951c72551d09202b7586773bbe3769250c12fa/L1Trigger/TrackFindingTracklet/src/TrackletLUT.cc#L1425
So if nbits_
is uninitialized, then this width often winds up being extremely large, which, for some reason, causes a segmentation violation when passed to setw
.
@aehart I've reproduced your crash, and confirm your helpful diagnosis. It seems that people forgot to set "nbits" in any function where positive=false. The bug was introduced in https://github.com/cms-L1TK/cmssw/pull/189 . Although I'm named as the author of this PR, all I actually did was include and resolve conflicts between two closed PRs from @aryd and @bryates https://github.com/cms-L1TK/cmssw/pull/182 and https://github.com/cms-L1TK/cmssw/pull/185 respectively, both of which changed TrackletLUT.cc . The fatal line "int width = (nbits_ + 3) / 4;" was introduced in @aryd 's PR, so he is best placed to comment.
Unless Anders makes a better suggestion, then if, immediately after the line "int width = (nbits + 3) / 4;" one adds the line "if (not positive) width = 0;", this prevents the crash and I think restores the original setw() behavior for negative numbers.
I’m completely saturated with a deadline tonight for submitting material for a review. I should be able to take a look tomorrow.
I fixed one instance of this bug where bits_ was not set a month or so ago in a pull request so I’m a little confused about why we have this problem - but I have not had a chance to look.
-Anders
Anders Ryd @.**@.>
On Feb 27, 2023, at 6:38 PM, Ian Tomalin @.**@.>> wrote:
Unless Anders makes a better suggestion, then if, immediately after the line "int width = (nbits + 3) / 4;" one adds the line "if (not positive) width = 0;", this prevents the crash and I think restores the original setw() behavior for negative numbers.
— Reply to this email directly, view it on GitHubhttps://github.com/cms-L1TK/cmssw/issues/203#issuecomment-1447286549, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABI4LTJMJW57Z7I6JAPF3ELWZU3HDANCNFSM6AAAAAAULPPDJM. You are receiving this because you were mentioned.Message ID: @.***>
This issue is addressed by https://github.com/cms-L1TK/cmssw/pull/211 . Any further comments related to it should be made in the review of that PR.
A segmentation violation now occurs when following the instructions for generating test-vectors here: https://github.com/cms-L1TK/firmware-hls/tree/504b9711c7b620ffeb6d10fbe9715c113fb67527#full-configuration
The GNU debugger reports the following line in
trklet::TrackletLUT::writeTable
as the culprit: https://github.com/cms-L1TK/cmssw/blob/23a8a8cfaf3086866b503032b56d93578de8fcc1/L1Trigger/TrackFindingTracklet/src/TrackletLUT.cc#L1425There is nothing obviously wrong with the above line, and I suspect some object, like a TrackletLUT, is not being allocated properly, or has been prematurely deallocated. I have not done any further debugging though.