Closed fwyzard closed 2 years ago
The cleanup could be split to a separate PR.
Done.
The src/alpaka vs src/alpakatest changes should be split to a separate PR.
Both src/alpaka and src/alpakatest are fully supported in this PR.
The alpaka library changes should be merged upstream.
All changes have been merged upstream (post 0.9.0)
@waredjeb @tonydp03 FYI
Now supports both alpaka and alpakatest.
The src/.../Makefile changes should now be more robust, and all tests build.
The only pending issue AFAICT is the integration upstream of https://github.com/alpaka-group/alpaka/pull/1685 .
As a side note, specifying more than one device (e.g. alpaka --serial --cuda) currently runs both sets of modules (the CPU-serial ones and the CUDA ones) on each event.
I think it would be useful to let the framework pick a single different "device" for each event. For example in round robin, or in round robin with a different number of slots per device type, etc.
I agree this could be an interesting mode of operation to try out at some point. Maybe open an issue about it? I suspect it won't be straightforward, so figuring out a reasonable approach in the mock framework could take some time (and this is something that could be looked into after the Alpaka integration into CMSSW has been finished, for the first round at least, right?).
I also thought it would be complicated, then I slept (not much) over it, and had an idea for a simple implementation this morning, and it turned out that a static assignment of the "event streams" to different backends is not too bad: https://github.com/fwyzard/pixeltrack-standalone/commit/a40e22221f663c54e61a9ee9384c2ed4b3cc2420 .
Each backend can now take an optional weight, and if more than one backend is specified the number of streams will be split among the backends roughly according to their weights:
./alpaka --maxEvents 10000 --numberOfStreams 16 --numberOfThreads 8 --serial 0.2 --cuda 0.5 --hip 0.3 --validation
Found 1 device:
- AMD Ryzen 9 5900X 12-Core Processor
Found 1 device:
- NVIDIA GeForce GTX 1080 Ti
Found 1 device:
- Radeon Pro WX 9100
Processing 10000 events, with 16 concurrent events (5 on rocm_async, 3 on serial_sync, 8 on cuda_async) and 8 threads.
CountValidator: all 2997 events passed validation
Average relative track difference 0.000920349 (all within tolerance)
Processed 10000 events in 9.212266e+00 seconds, throughput 1085.51 events/s, CPU usage per thread: 93.8%
(looks like the CountValidator may need some update)
Actually the module itself is fine - the reason is that endJob() is only called for "stream 0", and so only one of the different instantiations gets called.
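To illustrate why the reported count is only partial, here is a minimal sketch. It is not the framework code: MockValidator and reportedCount are hypothetical names, and the round-robin distribution of events over streams is an assumption made only to keep the example deterministic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a per-backend validator module
// (not the real CountValidator).
struct MockValidator {
  int checked = 0;
  void analyze() { ++checked; }            // called once per event
  int endJob() const { return checked; }   // the real module prints a summary
};

// backendOfStream[s] is the index of the backend that stream s is bound to.
// Events are handed to streams round-robin (an assumption of this sketch).
// Returns what is reported when endJob() is called only for stream 0.
int reportedCount(int events, std::vector<int> const& backendOfStream, int backends) {
  assert(!backendOfStream.empty() && backends > 0);
  std::vector<MockValidator> validators(backends);  // one instance per backend
  int const streams = static_cast<int>(backendOfStream.size());
  for (int e = 0; e < events; ++e) {
    validators[backendOfStream[e % streams]].analyze();
  }
  // Only the instance bound to stream 0 gets to report its count.
  return validators[backendOfStream[0]].endJob();
}
```

With 8 streams, 3 of them on backend 0, and 16 events, the sketch reports 6 events rather than 16: the events seen by the other backend's instance are never summed in.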
Let alpaka and alpakatest support serial, TBB, CUDA and ROCm at the same time.
Each backend can take an optional weight, and if more than one backend is specified the number of streams will be split among the backends roughly according to their weights.
Note that the CountValidator prints only a partial count because endJob() is only called for "stream 0", and so only one of the different instantiations of ALPAKA_ACCELERATOR_NAMESPACE::CountValidator gets called.
Static splitting of event streams across multiple backends: each event stream is associated to a different backend, according to the optional weight specified on the command line.
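One possible way to turn the weights into a per-backend number of streams is sketched below. This is not the code from the commit: splitStreams is a hypothetical helper, and the largest-remainder rounding is an assumption. It does reproduce the 3 / 8 / 5 split reported for 16 streams with weights 0.2 / 0.5 / 0.3.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: split `streams` event streams among backends in
// proportion to `weights`, using the largest-remainder method.
std::vector<int> splitStreams(int streams, std::vector<double> const& weights) {
  assert(streams >= 0 && !weights.empty());
  double const total = std::accumulate(weights.begin(), weights.end(), 0.0);
  assert(total > 0.);

  std::vector<int> counts(weights.size());
  std::vector<double> remainders(weights.size());
  int assigned = 0;
  for (std::size_t i = 0; i < weights.size(); ++i) {
    double const share = streams * weights[i] / total;  // ideal, fractional share
    counts[i] = static_cast<int>(std::floor(share));
    remainders[i] = share - counts[i];
    assigned += counts[i];
  }

  // Hand the leftover streams to the backends with the largest remainders.
  std::vector<std::size_t> order(weights.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return remainders[a] > remainders[b];
  });
  for (std::size_t k = 0; assigned < streams; ++k, ++assigned) {
    ++counts[order[k % order.size()]];
  }
  return counts;
}
```

For example, splitStreams(16, {0.2, 0.5, 0.3}) gives 3, 8 and 5 streams for the three backends, in the order the weights were passed.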
Split the compilation by backend:
- ALPAKA_..._ENABLED macros are only defined one at a time;
- ALPAKA_..._PRESENT macros to identify all backends for which support is being compiled;
Add forward declaration for alpaka templates and types (thanks to Matti for the idea).
Add explicit instantiation definitions and declarations to the initialisation code, and move it to the AlpakaCore "portable" library.
Use new pinned host memory functionality, introduced in the latest alpaka update.
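For readers less familiar with explicit instantiation, the general C++ pattern is sketched below. The names (SerialTag, CudaTag, initialise) are hypothetical placeholders rather than the actual AlpakaCore code, and everything is kept in one block only so that the example is self-contained.

```cpp
#include <string>

// Hypothetical backend tags, standing in for the real accelerator types.
struct SerialTag { static char const* name() { return "serial"; } };
struct CudaTag { static char const* name() { return "cuda"; } };

// --- would live in a header shared by all backends ---
template <typename TTag>
std::string initialise() {
  return std::string("initialised ") + TTag::name();
}

// Explicit instantiation declarations: tell every translation unit that
// includes the header not to instantiate these specialisations itself.
extern template std::string initialise<SerialTag>();
extern template std::string initialise<CudaTag>();

// --- would live in one source file per backend ---
// Explicit instantiation definitions: the code is generated exactly here,
// so each backend can be built with its own compiler and flags.
template std::string initialise<SerialTag>();
template std::string initialise<CudaTag>();
```

Callers simply use initialise<SerialTag>() and link against the library that holds the definitions.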
Update alpaka to the fwyzard/develop private branch, pending integration upstream. Relevant changes include:
Autogenerate plugins.txt from the content of the plugins' shared libraries.