Remove global variables

ariostas commented 6 months ago

This PR removes the remaining global variables. The standalone version works now, but I still need to work on the CMSSW side. I'll mark it as ready for review once everything works and there's a companion PR on the CMSSW repo.

GNiendorf commented 6 months ago

This PR will also remove the sizes defined in Constants.h right? Now that these are no longer globals. https://github.com/SegmentLinking/TrackLooper/issues/380

ariostas commented 6 months ago

This PR will also remove the sizes defined in Constants.h right? Now that these are no longer globals. https://github.com/SegmentLinking/TrackLooper/issues/380

Oh yeah, I wasn't planning on it, but we should do that on this PR too

ariostas commented 6 months ago

Oh yeah, I wasn't planning on it, but we should do that on this PR too

That turned out to be much more convoluted than I anticipated. I'll try again once this PR is more complete, but I might also defer it to a future PR.

ariostas commented 6 months ago

I think it should all be working now. I'll test it with the CI while I check if there's things that could be cleaned up a bit. Also, I'll check to see how much it needs to be changed to fix the issue @GNiendorf mentioned.

/run standalone /run cmssw 26

github-actions[bot] commented 6 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.7    324.4    126.5     50.5     99.1    546.4    135.0    159.3    108.4      2.5    1596.9    1005.8+/- 262.6     433.1   explicit_cache[s=4] (master)
   avg     45.0    325.1    125.4     49.6     97.6    506.2    134.4    158.9     99.7      2.4    1544.2     993.0+/- 259.2     420.0   explicit_cache[s=4] (this PR)

ariostas commented 6 months ago

After thinking about it a bit more, I think that a way to make this cleaner is to have the load functions construct the shared pointers instead of only filling them, which would make #380 much easier to address. Also, I should probably move the LSTESData.h header that I added in https://github.com/SegmentLinking/cmssw/pull/26 to here so we can pass around all this data more cleanly. I'll ask for opinions during the meeting today and I'll have it ready within the next couple of days.

github-actions[bot] commented 6 months ago

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

ariostas commented 6 months ago

This is PR is close to being ready for review. I did end up fixing #380, but it needed quite a few changes. I made things as const as possible. Also, I found out the out of bounds issue I mentioned above, which we should look into. I'll run the CI to make sure that things look fine. I still have to work on the CMSSW side.

/run standalone

ariostas commented 6 months ago

Thanks @GNiendorf for pointing out that dxdy_slope_ was a map, not a vector.

/run standalone

github-actions[bot] commented 6 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.5    324.2    123.5     47.4     95.7    547.0    133.4    156.6    105.0      2.0    1579.3     987.8+/- 261.4     429.7   explicit_cache[s=4] (master)
   avg     44.8    323.7    123.1     47.8     95.5    550.1    133.3    156.4    105.5      1.7    1581.8     986.9+/- 262.8     430.9   explicit_cache[s=4] (this PR)

ariostas commented 5 months ago

a few classes and methods will now appear as identical visible symbols in all sdl_ library backend builds. Is there some first principles justification that this is safe?

For now, should we keep using templated classes even for things that don't use any alpaka buffers or should we figure out how to move them into a non-alpaka part now? As far as I can tell, CMSSW only loads one library at a time, so there probably wouldn't be symbol conflicts, but I'm not sure if in more general scenarios there could be issues.

ariostas commented 5 months ago

I think for now it doesn't matter if we have potential symbol conflicts. When we move to CMSSW (hopefully pretty soon) it will be a lot easier to separate non-alpaka parts by moving them out of the alpaka directory.

slava77 commented 5 months ago

/run standalone /run cmssw 26

github-actions[bot] commented 5 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.4    324.2    126.0     49.5     97.5    542.9    135.2    157.5    104.8      2.8    1584.7     997.4+/- 260.2     435.6   explicit_cache[s=4] (master)
   avg     44.7    325.1    124.6     51.1    100.1    498.1    134.6    164.1    102.8      2.7    1547.9    1005.1+/- 264.6     423.4   explicit_cache[s=4] (this PR)

slava77 commented 5 months ago

/run standalone

@ariostas will the previous run cmssw actually complete or does the request/job get killed by the PR update?

ariostas commented 5 months ago

will the previous run cmssw actually complete or does the request/job get killed by the PR update?

It will complete, but let's run a new one to make sure that the new changes don't break anything.

/run cmssw 26

slava77 commented 5 months ago

Here is a timing comparison:

pLS column changed from 543 to 498 (a similar reduction as in a test on May 14 , but not on May 17); I'm curious if a rerun (after the more technical const update) will confirm the reduction.

github-actions[bot] commented 5 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.7    322.7    122.6     47.4     95.2    549.5    133.3    154.4    104.2      1.4    1575.4     981.2+/- 261.8     430.3   explicit_cache[s=4] (master)
   avg     43.7    322.5    122.2     48.2     98.5    505.2    133.4    156.1    101.4      2.5    1533.8     984.9+/- 262.8     420.5   explicit_cache[s=4] (this PR)

ariostas commented 5 months ago

There seems to be some issue with CUDA. I'll look into it.

ariostas commented 5 months ago

I moved some functions back into an unnamed namespace. I didn't realize that there was a reason for it. Now it also works with CUDA.

I ran a timing comparison on cgpu-1, but the pLS time is too small to notice a difference.

This PR
Total Timing Summary
Average time for map loading = 477.75 ms
Average time for input loading = 7560.01 ms
Average time for SDL::Event creation = 0.0114327 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     19.5      1.3      0.4      1.5      1.5      0.4      0.6      0.5      1.2      0.1      27.0       7.1+/-  1.4      29.2   explicit_cache[s=1]
   avg      7.0      1.4      0.6      2.0      1.9      0.4      1.0      0.7      1.7      0.2      16.9       9.4+/-  2.1       9.8   explicit_cache[s=2]
   avg      9.6      1.8      0.8      3.1      3.1      0.5      2.0      1.2      2.7      0.4      25.3      15.2+/-  3.2       7.0   explicit_cache[s=4]
   avg     15.2      2.3      1.2      4.3      4.2      0.6      3.1      1.9      3.8      0.5      37.2      21.3+/-  4.0       6.7   explicit_cache[s=6]
   avg     21.4      2.9      1.5      5.5      5.9      0.8      4.8      2.7      4.7      1.0      51.3      29.1+/-  6.8       6.8   explicit_cache[s=8]

master
Total Timing Summary
Average time for map loading = 445.219 ms
Average time for input loading = 7561.65 ms
Average time for SDL::Event creation = 0.0083333 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     20.4      1.3      0.4      1.5      1.5      0.4      0.6      0.5      1.2      0.1      27.9       7.1+/-  1.4      30.3   explicit_cache[s=1]
   avg      6.8      1.4      0.5      2.0      1.9      0.4      1.0      0.7      1.7      0.2      16.6       9.4+/-  2.0       9.6   explicit_cache[s=2]
   avg      9.6      1.8      0.8      3.1      3.0      0.5      2.0      1.2      2.6      0.4      25.0      14.9+/-  3.5       7.0   explicit_cache[s=4]
   avg     15.0      2.3      1.2      4.3      4.4      0.6      3.2      1.9      3.8      0.5      37.3      21.6+/-  4.7       6.7   explicit_cache[s=6]
   avg     21.5      3.0      1.5      5.7      5.7      0.8      4.5      2.5      5.0      0.9      51.0      28.7+/-  8.2       6.8   explicit_cache[s=8]

github-actions[bot] commented 5 months ago

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

The full set of validation and comparison plots can be found here.

ariostas commented 5 months ago

I also verified that it works with GPU in CMSSW, so it's all ready now.

SegmentLinking / TrackLooper

Remove global variables #401