Closed fwyzard closed 3 years ago
No impact on the computing performance when running the pixel-only profiling workflow: the throughput before and after agrees within the run-to-run spread.
Before:
Running 3 times over 10100 events with 2 jobs, each with 10 threads, 10 streams and 1 GPUs
1339.5 ± 0.3 ev/s (10000 events, 99.4% overlap)
1334.1 ± 0.3 ev/s (10000 events, 99.8% overlap)
1328.4 ± 0.3 ev/s (10000 events, 99.1% overlap)
--------------------
1334.0 ± 5.6 ev/s
After:
Running 3 times over 10100 events with 2 jobs, each with 10 threads, 10 streams and 1 GPUs
1342.0 ± 0.3 ev/s (10000 events, 99.6% overlap)
1338.0 ± 0.3 ev/s (10000 events, 99.5% overlap)
1332.8 ± 0.3 ev/s (10000 events, 99.6% overlap)
--------------------
1337.6 ± 4.6 ev/s
Use `std::clamp(...)` in device code now that CUDA supports C++17. Name reused constants in the vertex fitting and splitting.
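As a rough illustration of the kind of change described, the sketch below replaces a nested `std::min`/`std::max` with `std::clamp` and gives a reused literal a name. The function `zToBin` and the constant `kBinsPerUnit` are hypothetical names chosen for this example, not taken from the actual vertex code, and the snippet is host-only C++17. Compiling the same call in device code is assumed to depend on nvcc accepting constexpr standard-library functions there (for example via `--expt-relaxed-constexpr`).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical named constant standing in for a literal that was
// previously repeated at several call sites.
constexpr float kBinsPerUnit = 10.f;

// Hypothetical helper: convert a coordinate to an integer bin and
// restrict it to the range representable by an 8-bit signed integer.
inline int zToBin(float z) {
  int bin = static_cast<int>(z * kBinsPerUnit);
  // Pre-C++17 equivalent: std::min(std::max(bin, INT8_MIN), INT8_MAX)
  return std::clamp(bin, INT8_MIN, INT8_MAX);
}
```

`std::clamp(v, lo, hi)` returns `lo` when `v < lo`, `hi` when `hi < v`, and `v` otherwise, so the behaviour matches the nested min/max form as long as `lo <= hi`.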