Closed ariostas closed 5 months ago
This PR will also remove the sizes defined in Constants.h right? Now that these are no longer globals. https://github.com/SegmentLinking/TrackLooper/issues/380
This PR will also remove the sizes defined in Constants.h right? Now that these are no longer globals. https://github.com/SegmentLinking/TrackLooper/issues/380
Oh yeah, I wasn't planning on it, but we should do that on this PR too
Oh yeah, I wasn't planning on it, but we should do that on this PR too
That turned out to be much more convoluted than I anticipated. I'll try again once this PR is more complete, but I might also defer it to a future PR.
I think it should all be working now. I'll test it with the CI while I check if there's things that could be cleaned up a bit. Also, I'll check to see how much it needs to be changed to fix the issue @GNiendorf mentioned.
/run standalone /run cmssw 26
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.
The full set of validation and comparison plots can be found here.
Here is a timing comparison:
Evt Hits MD LS T3 T5 pLS pT5 pT3 TC Reset Event Short Rate
avg 44.7 324.4 126.5 50.5 99.1 546.4 135.0 159.3 108.4 2.5 1596.9 1005.8+/- 262.6 433.1 explicit_cache[s=4] (master)
avg 45.0 325.1 125.4 49.6 97.6 506.2 134.4 158.9 99.7 2.4 1544.2 993.0+/- 259.2 420.0 explicit_cache[s=4] (this PR)
After thinking about it a bit more, I think that a way to make this cleaner is to have the load functions construct the shared pointers instead of only filling them, which would make #380 much easier to address. Also, I should probably move the LSTESData.h
header that I added in https://github.com/SegmentLinking/cmssw/pull/26 to here so we can pass around all this data more cleanly. I'll ask for opinions during the meeting today and I'll have it ready within the next couple of days.
The PR was built and ran successfully with CMSSW. Here are some plots.
The full set of validation and comparison plots can be found here.
This is PR is close to being ready for review. I did end up fixing #380, but it needed quite a few changes. I made things as const as possible. Also, I found out the out of bounds issue I mentioned above, which we should look into. I'll run the CI to make sure that things look fine. I still have to work on the CMSSW side.
/run standalone
Thanks @GNiendorf for pointing out that dxdy_slope_
was a map, not a vector.
/run standalone
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.
The full set of validation and comparison plots can be found here.
Here is a timing comparison:
Evt Hits MD LS T3 T5 pLS pT5 pT3 TC Reset Event Short Rate
avg 44.5 324.2 123.5 47.4 95.7 547.0 133.4 156.6 105.0 2.0 1579.3 987.8+/- 261.4 429.7 explicit_cache[s=4] (master)
avg 44.8 323.7 123.1 47.8 95.5 550.1 133.3 156.4 105.5 1.7 1581.8 986.9+/- 262.8 430.9 explicit_cache[s=4] (this PR)
a few classes and methods will now appear as identical visible symbols in all sdl_ library backend builds. Is there some first principles justification that this is safe?
For now, should we keep using templated classes even for things that don't use any alpaka buffers or should we figure out how to move them into a non-alpaka part now? As far as I can tell, CMSSW only loads one library at a time, so there probably wouldn't be symbol conflicts, but I'm not sure if in more general scenarios there could be issues.
I think for now it doesn't matter if we have potential symbol conflicts. When we move to CMSSW (hopefully pretty soon) it will be a lot easier to separate non-alpaka parts by moving them out of the alpaka
directory.
/run standalone /run cmssw 26
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.
The full set of validation and comparison plots can be found here.
Here is a timing comparison:
Evt Hits MD LS T3 T5 pLS pT5 pT3 TC Reset Event Short Rate
avg 44.4 324.2 126.0 49.5 97.5 542.9 135.2 157.5 104.8 2.8 1584.7 997.4+/- 260.2 435.6 explicit_cache[s=4] (master)
avg 44.7 325.1 124.6 51.1 100.1 498.1 134.6 164.1 102.8 2.7 1547.9 1005.1+/- 264.6 423.4 explicit_cache[s=4] (this PR)
/run standalone
@ariostas will the previous run cmssw actually complete or does the request/job get killed by the PR update?
will the previous run cmssw actually complete or does the request/job get killed by the PR update?
It will complete, but let's run a new one to make sure that the new changes don't break anything.
/run cmssw 26
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.
The full set of validation and comparison plots can be found here.
Here is a timing comparison:
Evt Hits MD LS T3 T5 pLS pT5 pT3 TC Reset Event Short Rate
avg 44.7 322.7 122.6 47.4 95.2 549.5 133.3 154.4 104.2 1.4 1575.4 981.2+/- 261.8 430.3 explicit_cache[s=4] (master)
avg 43.7 322.5 122.2 48.2 98.5 505.2 133.4 156.1 101.4 2.5 1533.8 984.9+/- 262.8 420.5 explicit_cache[s=4] (this PR)
There seems to be some issue with CUDA. I'll look into it.
I moved some functions back into an unnamed namespace. I didn't realize that there was a reason for it. Now it also works with CUDA.
I ran a timing comparison on cgpu-1, but the pLS time is too small to notice a difference.
This PR
Total Timing Summary
Average time for map loading = 477.75 ms
Average time for input loading = 7560.01 ms
Average time for SDL::Event creation = 0.0114327 ms
Evt Hits MD LS T3 T5 pLS pT5 pT3 TC Reset Event Short Rate
avg 19.5 1.3 0.4 1.5 1.5 0.4 0.6 0.5 1.2 0.1 27.0 7.1+/- 1.4 29.2 explicit_cache[s=1]
avg 7.0 1.4 0.6 2.0 1.9 0.4 1.0 0.7 1.7 0.2 16.9 9.4+/- 2.1 9.8 explicit_cache[s=2]
avg 9.6 1.8 0.8 3.1 3.1 0.5 2.0 1.2 2.7 0.4 25.3 15.2+/- 3.2 7.0 explicit_cache[s=4]
avg 15.2 2.3 1.2 4.3 4.2 0.6 3.1 1.9 3.8 0.5 37.2 21.3+/- 4.0 6.7 explicit_cache[s=6]
avg 21.4 2.9 1.5 5.5 5.9 0.8 4.8 2.7 4.7 1.0 51.3 29.1+/- 6.8 6.8 explicit_cache[s=8]
master
Total Timing Summary
Average time for map loading = 445.219 ms
Average time for input loading = 7561.65 ms
Average time for SDL::Event creation = 0.0083333 ms
Evt Hits MD LS T3 T5 pLS pT5 pT3 TC Reset Event Short Rate
avg 20.4 1.3 0.4 1.5 1.5 0.4 0.6 0.5 1.2 0.1 27.9 7.1+/- 1.4 30.3 explicit_cache[s=1]
avg 6.8 1.4 0.5 2.0 1.9 0.4 1.0 0.7 1.7 0.2 16.6 9.4+/- 2.0 9.6 explicit_cache[s=2]
avg 9.6 1.8 0.8 3.1 3.0 0.5 2.0 1.2 2.6 0.4 25.0 14.9+/- 3.5 7.0 explicit_cache[s=4]
avg 15.0 2.3 1.2 4.3 4.4 0.6 3.2 1.9 3.8 0.5 37.3 21.6+/- 4.7 6.7 explicit_cache[s=6]
avg 21.5 3.0 1.5 5.7 5.7 0.8 4.5 2.5 5.0 0.9 51.0 28.7+/- 8.2 6.8 explicit_cache[s=8]
The PR was built and ran successfully with CMSSW. Here are some plots.
The full set of validation and comparison plots can be found here.
I also verified that it works with GPU in CMSSW, so it's all ready now.
This PR removes the remaining global variables. The standalone version works now, but I still need to work on the CMSSW side. I'll mark it as ready for review once everything works and there's a companion PR on the CMSSW repo.