Closed tomeichlersmith closed 2 months ago
Canceled the PR Validation since I am just making sure this builds with warnings-as-errors.
For some reason, this only was shown on trunk
Is it because the branch Rory used was not rebased to the version of trunk that had my update for the CI?
That's why I think this wasn't caught while test-building that branch, but I still don't know why this wasn't caught in previous rounds of more strict building since this code has been there for awhile. :shrug:
Doesn't really matter, this is an easy fix.
crap, this just revealed more issues
https://github.com/LDMX-Software/ldmx-sw/actions/runs/10960598567/job/30435575518?pr=1470
the good news is that it builds and runs when not treating warnings-as-errors and so its not super urgent
Some of the new stuff is in HLS_arbitrary_Precision_Types
, this line https://github.com/LDMX-Software/ldmx-sw/blob/trunk/CMakeLists.txt#L37C36-L37C65 was supressing those in the past but I think with this addition
https://github.com/LDMX-Software/ldmx-sw/blob/5598eaa3a4129dda761cf8375b1e948cdc76a6d3/TrigScint/CMakeLists.txt#L32-L37
it's resurfucing again
I fixed up our internal stuff, all that's left is the HLS stuff.
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:2109:32: warning: The left operand of '<<' is a garbage value [clang-analyzer-core.UndefinedBinaryOperatorResult]
? ((((int64_t)VAL) << (excess_bits)) >> (excess_bits))
^
/home/tom/code/ldmx/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:293:14: note: Calling 'operator+<12, true, 12, true>'
float pe = outTrk.Pad1.Seed.Amp + outTrk.Pad1.Sec.Amp;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_int_base.h:1320:1: note: Calling constructor for 'ap_int_base<13, true>'
OP_BIN_AP(+, plus)
^
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_int_base.h:1310:37: note: expanded from macro 'OP_BIN_AP'
_AP_W2, _AP_S2>::Rty##_base lhs(op); \
^~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_int_base.h:179:10: note: Calling default constructor for 'ssdm_int_sim<13, true>'
INLINE ap_int_base(const ap_int_base<_AP_W2, _AP_S2>& op) {
^~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/ap_common.h:248:3: note: Calling default constructor for 'ap_private<13, true, true>'
ssdm_int_sim() {}
^~~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:1595:5: note: Calling 'ap_private::clearUnusedBits'
clearUnusedBits();
^~~~~~~~~~~~~~~~~
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:2108:9: note: '?' condition is true
_AP_S
^
/home/tom/code/ldmx/ldmx-sw/TrigScint/../Trigger/HLS_arbitrary_Precision_Types/include/etc/ap_private.h:2109:32: note: The left operand of '<<' is a garbage value
? ((((int64_t)VAL) << (excess_bits)) >> (excess_bits))
~~~ ^
which, as you point out @tvami , can probably be silenced by informing CMake that the HLS headers are not our problem.
So in the end, this is not just something we can hide under the rug, there is some issue in the input in
TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx
the track level seems fine
but either at the outTrk.Pad1
or outTrk.Pad1.Seed
or outTrk.Pad1.Seed.Amp
(same with the Sec
) there is something uninitialized. I didnt find it tho :(
This gave me the idea to just use {}
within the struct
declarations telling C++ to use value initialization which avoids nonsense (random memory) values. e.g.
I /think/ this got past the error on my local computer so I'm pushing it here to double check.
Edit: nevermind. This didn't work locally.
I was thinking if we could just set the defaults it would fix it, but given that this is C-style, I dont think we can do that. I think having
ap_int<12> mID{0}, bID{-1};
would actually solve the problem, but we cant do that, right?
I don't know if this code needs to be C-style or if that was just most convenient when trying to write the emulation.
OK, I tried, it doesnt work anyway... :(
I tried if getting rid of calcTCent
would resolve it, but then I get this (meaning that the issue roots in constructing the lookup table )
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:63:3: note: Loop condition is true. Entering loop body
for (int i = 0; i < NCENT; i++) {
^
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:64:5: note: Loop condition is true. Entering loop body
for (int j = 0; j < COMBO; j++) {
^
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:65:26: note: Calling 'operator-<12, true>'
LOOKUP[i][j][0] = (i - A[1] + A[0]);
^~~~~~~~
And honestly I dont understand what's happening here
ap_int<12> A[3] = {0, 0, 0};
ap_int<12> LOOKUP[NCENT][COMBO][2];
LOOKUP[i][j][0] = (i - A[1] + A[0]);
isnt A[1]
the same as A[0]
and both are zeros?
A[i] is an alignment matrix. The idea is that the three layers (Pad1, Pad2, and Pad3) may be translated w.r.t eachother and said translation may be determined by the vector A as we only care how they are misaligned in one axis (the axis of granularity of the TS). The factor (i - A[1] + A[0]) only becomes nontrivial if A[1] and A[0] are different indicating an initially misaligned state. This will be important when we first start checking LDMX because if there is alignment issues we wont see it without scanning A.
I see, thanks @rodwyer100 ! Would it be ok to initialize the LOOKUP table to all 0 before you fill in the real values?
ehh
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:62:3: note: Loop condition is true. Entering loop body
for (int i = 0; i < NCENT; ++i) {
^
for (int j = 0; j < COMBO; ++j) {
^
for (int k = 0; k < 2; ++k) {
^
/home/vamitamas/patch-float-for-loop-counter/ldmx-sw/TrigScint/src/TrigScint/TrigScintFirmwareTracker.cxx:65:31: note: Calling constructor for 'ap_int<12>'
LOOKUP[i][j][k] = ap_int<12>(0);
^~~~~~~~~~~~~
it doesnt like anything with ap_int<12>
constructor.
hey @tomeichlersmith @rodwyer100 @bryngemark
At this point I think we should bring the HLS submodule to be part of the ldmx-sw, and fix the VAL thing (I did it locally and it resolves it). The HLS repo was last touched 5 years ago, and it's not very big [funny thing that Shazhad from CMS did try to patch it a year ago -- the PR is still unmerged].
I also think it would have a natural place in Framework
or under Tools
, given that it's used both by Trigger
and TrigScint
It should be okay to initialize it with certain values. I didn't understand your most current comment. Did that not work?
Sorry now your second to most current. Edit: Ah I see. It doesn't like the ap_int<12> instantiation. I don't see why it wouldn't; Ill look at it but it should be allowed.
So something has been scrambled. I don't know which change did it, but I will need to inspect things before this gets commited. The track number distribution is very high; it is producing a high fake rate.
I will need to look event by event to see why this occurs.
Personally, I would rather keep a fork of HLS_arbitrary_precision and just redirect our submodule to our fork.
Similar to G4DarkBreM, I just think there is a possibility that this code would be used in other applications (LDMX or not) and so keeping it separate would be helpful.
rather keep a fork of HLS_arbitrary_precision
but it's such a small repo... having it as a submodule has the advantage that if it changes we can just update it whenever we think it's needed. But this did not and does not change even when it should (see the PR from Shahzad a year ago). And then for your argument about using in other applications: for outside ldmx I doubt people would take an ldmx-sw fork instead of having their own fork. And then for inside ldmx, isnt that an argument to have it in our software directly?
Yea.. you've convinced me
I'll put them in Tools and update the CMake
@rodwyer100 I am going to merge this even though it certainly contains the issues that you have observed just because there is a lot of code being introduced. I think your other PR https://github.com/LDMX-Software/ldmx-sw/pull/1473 can be a location for patches while you are introducing the hit stagger.
I figured out what happened. Its a small change thats needed which won't change what was done. I will include it in my hit branch alongside a rebased branch
replace this with a while loop instead to be more clear about the possibility of rounding errors affecting performance
For some reason, this only was shown on
trunk
after merging in unrelated (but in TrigScint?) code #1461 .Check List