cms-L1TK / firmware-hls

HLS implementation of the tracklet pattern reco modules of the Hybrid tracking chain
15 stars 24 forks source link

Reverting changes to unsigned variables causing Future SW CI crash #311

Closed mlarson02 closed 10 months ago

mlarson02 commented 10 months ago

Reverts changes made in PR #304 that changes various looped over variables in TrackBuilder.h to signed, causing error: comparison is always false due to limited range of data type [-Werror=type-limits] in future SW CI. Because this warning has been flagged as an error centrally in CMSSW, the only current fix for this is to revert these variables to not being unsigned.

aehart commented 10 months ago

@mlarson02 This is strange, since -Werror=type-limits is one of the compiler flags added in #304: https://github.com/cms-L1TK/firmware-hls/blob/f5c736c44eeb7ae6c0153aaf1ebca8f7b9bcb746/project/env_hls.tcl#L19 The only thing I can figure is that the older version of GCC that HLS uses isn't smart enough to trace the template parameters that are set to zero to the for-loops where they are compared with unsigned integers.

Anyway, you saw the compiler error in the CI that I was trying to work around when I originally made the loop counters unsigned. I guess you don't see this in the future SW emulation because you include the -Wno-sign-compare flag here (is that still needed now?): https://gitlab.cern.ch/ejclemen/cmssw_trackfinding_hlsframework/-/blob/df59dbcae854a3b7fff13a4c3a27e0e06ea1fe31/TrackFindingTrackletHLS/plugins/BuildFile.xml#L15

I think a better solution to both compiler errors is to just convert the template parameters in TrackBuilder.h to signed integers. I took the liberty of pushing this to your branch, while preserving your commit history so far. We'll see if the CI likes it, but let me know if this is acceptable to CMSSW.

mlarson02 commented 10 months ago

@mlarson02 This is strange, since -Werror=type-limits is one of the compiler flags added in #304:

https://github.com/cms-L1TK/firmware-hls/blob/f5c736c44eeb7ae6c0153aaf1ebca8f7b9bcb746/project/env_hls.tcl#L19

The only thing I can figure is that the older version of GCC that HLS uses isn't smart enough to trace the template parameters that are set to zero to the for-loops where they are compared with unsigned integers. Anyway, you saw the compiler error in the CI that I was trying to work around when I originally made the loop counters unsigned. I guess you don't see this in the future SW emulation because you include the -Wno-sign-compare flag here (is that still needed now?): https://gitlab.cern.ch/ejclemen/cmssw_trackfinding_hlsframework/-/blob/df59dbcae854a3b7fff13a4c3a27e0e06ea1fe31/TrackFindingTrackletHLS/plugins/BuildFile.xml#L15

I think a better solution to both compiler errors is to just convert the template parameters in TrackBuilder.h to signed integers. I took the liberty of pushing this to your branch, while preserving your commit history so far. We'll see if the CI likes it, but let me know if this is acceptable to CMSSW.

@aehart Thanks for pushing those changes. Yeah, I don't think suppressing the HLS warnings as we are currently doing should still be necessary, I'm not sure why that was included in the BuildFile but I think ideally we wouldn't require it.

It seems using signed template parameters has resulted in the same error (ERROR: [SYNCHK 200-61] /home/gitlab-runner/builds/biTwR5NQ/3/cms-l1tk/firmware-hls/TrackletAlgorithm/TrackBuilder.h:314: unsupported memory access on variable 'diskStubWords.V' which is (or contains) an array with unknown size at compile time.) that I was trying to work around yesterday when casting the variables to signed in each required loop. Do you have any idea why this is caused and how we could possibly fix it without inducing the same compilation errors?

tomalin commented 10 months ago

The fix https://github.com/cms-L1TK/firmware-hls/pull/311/commits/f5c736c44eeb7ae6c0153aaf1ebca8f7b9bcb746 , which I expected to work, seems to fail when running Vivado SIM on the chain https://gitlab.cern.ch/cms-l1tk/firmware-hls/-/commit/2777aa061e6c550b71c3c82c7f9573bababee1b3 , with error https://gitlab.cern.ch/cms-l1tk/firmware-hls/-/jobs/34825473 in TrackBuilder.h at this line https://gitlab.cern.ch/cms-l1tk/firmware-hls/-/blob/2777aa061e6c550b71c3c82c7f9573bababee1b3/TrackletAlgorithm/TrackBuilder.h#L314 . Apparently since array diskStubWords was declared with indefinate array size, it doesn't like the range of index j being unknown at compile time. I suspect the solution is to replace in the "for" loop "j < int(NDiskStubs)" by "j < static_cast< int >(NDiskStubs)". (Though there's a chance you may need to have a separate line like "constexpr int myNum = static_cast< int >(NDiskStubs)" and then use myNum in the loop).

aehart commented 10 months ago

@mlarson02 @tomalin I think the reason the casting done in https://github.com/cms-L1TK/firmware-hls/commit/f5c736c44eeb7ae6c0153aaf1ebca8f7b9bcb746 and https://github.com/cms-L1TK/firmware-hls/commit/51a8ec1e89db55fefd65d8cdc360166b4c2d305f doesn't work is because HLS is somehow interpreting the bodies of loops before realizing the loops have zero iterations. This is very dumb, but it's my best guess as to why C-synthesis fails.

I just pushed a commit that seems to allow C-synthesis in these cases. Basically, I just added explicit checks to the loop conditions that the relevant template parameters be greater than zero. This avoids casting, and bafflingly, HLS accepts this in my testing. Let's see if the CI likes this…

mlarson02 commented 10 months ago

@mlarson02 @tomalin I think the reason the casting done in f5c736c and 51a8ec1 doesn't work is because HLS is somehow interpreting the bodies of loops before realizing the loops have zero iterations. This is very dumb, but it's my best guess as to why C-synthesis fails.

I just pushed a commit that seems to allow C-synthesis in these cases. Basically, I just added explicit checks to the loop conditions that the relevant template parameters be greater than zero. This avoids casting, and bafflingly, HLS accepts this in my testing. Let's see if the CI likes this…

@aehart Thanks for making these changes, the HLS CI and (when using this branch) Future SW CI are both passing. @tomalin Does this PR then look okay to be merged?

tomalin commented 10 months ago

Yes, though please add a comment to the code explaining the weird logic, otherwise someone is bound to "simplify" it back again in future. And please raise the silly compiler option issue with the SW group.

mlarson02 commented 10 months ago

I just added comments where necessary and created a thread with the SW group here https://cms-talk.web.cern.ch/t/problems-caused-by-werror-type-limits-compiler-flag/32918.