Closed krasznaa closed 1 month ago
As mentioned, please increase the minimum vecmem version to 1.8.0.
Hmm, are you going to go through this for every EDM class?
This will also have lots of conflicts with #692 :crying_cat_face:
The good news is that when the proxy objects are implemented, most of the code will remain unchanged.
Some further developments are indeed underway...
All of you, hold onto your hats. :smile: If/once we settle on https://github.com/acts-project/vecmem/pull/296, these are the types of updates that we will need to do to switch from the current AoS to a new SoA EDM:
At the same time I looked at profiles of the throughput application a little as well. This was very educational. As it turns out, the small slowdown is not due to the kernels. It seems to be due to the code spending a little more time on memory copies. :thinking:
That's not great news, as apparently the `vecmem::edm` code is not quite as efficient wrt. CPU usage as I hoped. But at least the SoA layout doesn't seem to have much of an impact on clusterization after all. (Remember, even with the current AoS layout, since `traccc::cell` is tiny, the memory access pattern of clusterization is pretty efficient already.)
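For readers less familiar with the AoS/SoA terminology used in this thread, here is a minimal sketch of the two layouts in plain C++. The `cell_aos` struct, its member names and the helper functions are hypothetical and only serve as illustration; they are not the real `traccc::cell` or `vecmem::edm` definitions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified cell record; NOT the real traccc::cell.
struct cell_aos {
    unsigned int channel0;
    unsigned int channel1;
    float activation;
};

// AoS: a single vector holding whole records.
using cell_collection_aos = std::vector<cell_aos>;

// SoA: one vector per member; element i is spread across the arrays.
struct cell_collection_soa {
    std::vector<unsigned int> channel0;
    std::vector<unsigned int> channel1;
    std::vector<float> activation;

    std::size_t size() const { return activation.size(); }
};

// Convert an AoS collection into the SoA layout.
inline cell_collection_soa to_soa(const cell_collection_aos& in) {
    cell_collection_soa out;
    out.channel0.reserve(in.size());
    out.channel1.reserve(in.size());
    out.activation.reserve(in.size());
    for (const cell_aos& c : in) {
        out.channel0.push_back(c.channel0);
        out.channel1.push_back(c.channel1);
        out.activation.push_back(c.activation);
    }
    return out;
}

// AoS access: every read of "activation" steps over a whole record.
inline float total_activation(const cell_collection_aos& cells) {
    float sum = 0.f;
    for (const cell_aos& c : cells) {
        sum += c.activation;
    }
    return sum;
}

// SoA access: the same reduction reads one contiguous float array.
inline float total_activation(const cell_collection_soa& cells) {
    float sum = 0.f;
    for (float a : cells.activation) {
        sum += a;
    }
    return sum;
}
```

With a record as small as the one above, the stride between consecutive `activation` values in the AoS layout is only a few bytes, which fits the observation that a tiny cell type already gives a fairly efficient access pattern.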
The good news is that once the code finally starts working on all platforms with all compilers, this very latest version is finally delivering on the performance front. :smile:
[bash][Legolas]:traccc > ./out/build/cuda/bin/traccc_throughput_st_cuda --input-directory=tml_full/ttbar_mu200/ --input-events=10 --cold-run-events=100 --processed-events=1000
Running Single-threaded CUDA GPU throughput tests
>>> Detector Options <<<
Detector file : tml_detector/trackml-detector.csv
Material file :
Surface grid file :
Use detray::detector: no
Digitization file : tml_detector/default-geometric-config-generic.json
>>> Input Data Options <<<
Input data format : csv
Input directory : tml_full/ttbar_mu200/
Number of input events : 10
Number of input events to skip: 0
>>> Clusterization Options <<<
Threads per partition: 256
Target cells per thread: 8
Max cells per thread: 16
Scratch space size mult.: 256
>>> Track Seeding Options <<<
None
>>> Track Finding Options <<<
Max number of branches per seed: 10
Max number of branches per surface: 10
Track candidates range : 3:100
Minimum step length for the next surface: 0.5 [mm]
Maximum step counts for the next surface: 100
Maximum Chi2 : 30
Maximum branches per step: 10
Maximum number of skipped steps per candidates: 3
PDG Number: 13
>>> Track Propagation Options <<<
Navigation
----------------------------
Min. mask tolerance : 1e-05 [mm]
Max. mask tolerance : 1 [mm]
Mask tolerance scalor : 0.05
Path tolerance : 1 [um]
Overstep tolerance : -100 [um]
Search window : 0 x 0
Parameter Transport
----------------------------
Min. Stepsize : 0.0001 [mm]
Runge-Kutta tolerance : 0.0001 [mm]
Max. step updates : 10000
Stepsize constraint : 3.40282e+38 [mm]
Path limit : 5 [m]
Use Bethe energy loss : true
Do cov. transport : true
Use eloss gradient : false
Use B-field gradient : false
>>> Throughput Measurement Options <<<
Cold run event(s) : 100
Processed event(s): 1000
Log file :
WARNING: @traccc::io::csv::read_cells: 251 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000000-cells.csv
WARNING: @traccc::io::csv::read_cells: 305 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000001-cells.csv
WARNING: @traccc::io::csv::read_cells: 176 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000002-cells.csv
WARNING: @traccc::io::csv::read_cells: 200 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000003-cells.csv
WARNING: @traccc::io::csv::read_cells: 224 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000004-cells.csv
WARNING: @traccc::io::csv::read_cells: 170 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000005-cells.csv
WARNING: @traccc::io::csv::read_cells: 321 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000006-cells.csv
WARNING: @traccc::io::csv::read_cells: 322 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000007-cells.csv
WARNING: @traccc::io::csv::read_cells: 222 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000008-cells.csv
WARNING: @traccc::io::csv::read_cells: 118 duplicate cells found in /data/ssd-1tb/projects/traccc/traccc/data/tml_full/ttbar_mu200/event000000009-cells.csv
Using CUDA device: NVIDIA GeForce RTX 3080 [id: 0, bus: 1, device: 0]
Reconstructed track parameters: 0
Time totals:
File reading 4968 ms
Warm-up processing 358 ms
Event processing 2715 ms
Throughput:
Warm-up processing 3.58551 ms/event, 278.9 events/s
Event processing 2.71537 ms/event, 368.274 events/s
[bash][Legolas]:traccc >
Though I am a little bit afraid of this possibly being artificial, since the previous result was on `x86_64-ubuntu2204-gcc11-opt`, and these latest numbers are now on `x86_64-ubuntu2404-gcc13-opt`. (I upgraded my home PC during the weekend... :stuck_out_tongue:) Still, at least the hardware is still the same... :thinking:
Quality Gate failed
Failed conditions: 2 New Bugs (required ≤ 0); C Reliability Rating on New Code (required ≥ A)
Huhh... :thinking: What's your take on these errors @stephenswat?
SonarCloud actually makes a really valid point here about constraining universal references; I'd suggest we go ahead and implement them.
As long as you have a concrete idea of how to go about it, I'm happy to let you propose the improvement. :wink:
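As a point of reference, constraining a forwarding ("universal") reference typically means restricting which argument types the template accepts, so that it does not accidentally win overload resolution against the copy or move constructor. The following is only a sketch under assumptions: it uses C++17 and a hypothetical `named_item` class, and is not taken from traccc, vecmem, or the code that SonarCloud flagged.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical class, for illustration only.
class named_item {
public:
    // An unconstrained "template <typename T> named_item(T&&)" would be
    // a better match than the copy constructor for non-const lvalues of
    // type named_item. The constraint below removes this overload for
    // named_item itself and for anything std::string can't be built from.
    template <typename T,
              std::enable_if_t<
                  !std::is_same_v<std::decay_t<T>, named_item> &&
                      std::is_constructible_v<std::string, T&&>,
                  bool> = true>
    explicit named_item(T&& name) : m_name(std::forward<T>(name)) {}

    named_item(const named_item&) = default;
    named_item(named_item&&) = default;

    const std::string& name() const { return m_name; }

private:
    std::string m_name;
};
```

With C++20 available, the same restriction could be written as a `requires` clause instead of `std::enable_if_t`.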
Okay, I guess we need to get vecmem 1.10.0 and then we can put this in, right?
This is the next monster PR... Exchanging `traccc::cell_collection_types` and `traccc::cluster_container_types` with SoA versions.

To jump right to the chase: It doesn't bring any performance improvement. :frowning: This EDM change of course only affects clusterization. Which is already one of the fastest things that we run. Still if anything, I see an O(1%) performance drop during the TML $\mu$=200 throughput measurements with this update applied. :thinking:

On my RTX3080 I get the following with the current `main` branch:

While this PR produces the following:

(There is variation on these numbers, but the "new" code is always just a little slower. :frowning:)

About the code:
- … `traccc::edm::silicon_cell_collection` and `traccc::edm::silicon_cluster_collection` as the name of these containers. But I'm not too fond of these names either. So I'm open to suggestions.
- … `traccc::cell` closely.
- … `traccc::edm::silicon_cluster_collection` type, only used in the host code, is now a jagged vector of cell indices. As such, `traccc::host::measurement_creation_algorithm` had to change its interface slightly.

We'll have to do some profiling, but I suspect that the small performance drop comes from the fact that the PR's code always reads the cell data from global memory, whenever it needs it. Just loading some of the info into local registers in a couple of places will hopefully take us back to the previous performance. I just didn't want to complicate the code even further in this PR. :thinking:
This PR also closes #691.
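To make the "jagged vector of cell indices" idea from the description more concrete, here is a rough sketch in plain C++. All type and member names (`cells_soa`, `cluster_indices`, `weighted_position`) are hypothetical stand-ins, not the actual traccc or vecmem types. The loop also reads each value once into a local variable and reuses it, which is the general pattern hinted at by "loading some of the info into local registers".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical SoA cell collection; member names are illustrative only.
struct cells_soa {
    std::vector<unsigned int> channel0;
    std::vector<unsigned int> channel1;
    std::vector<float> activation;
};

// "Jagged vector": one inner vector of cell indices per cluster.
// Clusters refer to cells by index instead of copying them.
using cluster_indices = std::vector<std::vector<std::size_t>>;

struct position {
    float x;
    float y;
};

// Activation-weighted mean channel position of a single cluster.
inline position weighted_position(const cells_soa& cells,
                                  const std::vector<std::size_t>& cluster) {
    float sum_w = 0.f;
    float sum_x = 0.f;
    float sum_y = 0.f;
    for (std::size_t idx : cluster) {
        // Read each value once into a local variable, then reuse it.
        const float w = cells.activation[idx];
        const float x = static_cast<float>(cells.channel0[idx]);
        const float y = static_cast<float>(cells.channel1[idx]);
        sum_w += w;
        sum_x += w * x;
        sum_y += w * y;
    }
    if (sum_w == 0.f) {
        return {0.f, 0.f};  // Empty or zero-weight cluster.
    }
    return {sum_x / sum_w, sum_y / sum_w};
}
```

Storing indices rather than cell copies keeps the cluster collection small and leaves the cell data in a single place.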