@fwyzard This is the prototype I mentioned earlier (and apparently failed to open in draft mode...).
Rebased on top of master to fix conflicts in `src/alpakatest/Makefile`.
Could this be extended to better handle multiple backends with the same memory space?
Currently we define a backend with
In principle we should have different execution options for the same memory space: CPU sync vs TBB sync, CUDA sync vs CUDA async, etc.
Do you think the approach researched here could be used to have a single data product (both in terms of data format type, and of underlying memory buffer/SoA) shared among different execution cases?
One concrete example would be having the CPU serial implementation for every module, and the TBB (serial) only for some modules where the extra parallelism makes sense.
> Could this be extended to better handle multiple backends with the same memory space? ... Do you think the approach researched here could be used to have a single data product (both in terms of data format type, and of underlying memory buffer/SoA) shared among different execution cases?
I think this approach would allow such an extension. There would certainly be many details to work out (like how to make the framework sufficiently aware of memory and execution spaces, including supporting multiple devices of the same type, in a generic way). But I'd expect the user-facing interfaces to stay mostly the same.
I also have CUDA managed memory / SYCL shared memory in mind (for platforms that have truly unified memory), in which case it would be nice if the downstream, Alpaka-independent consumers could use the data product wrapped in `edm::Product<T>` (as it is called here) directly, after a proper synchronization. With the `edm::Product<T>` class template being part of the framework, we could peek in there (like with `edm::View`).
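For concreteness, a downstream consumer under this idea could look roughly like the sketch below. The `TrackSoA` type, the `wait()`/`data()` interface of the wrapper, and the module itself are made up for illustration; they are not part of this prototype.

```cpp
// Hypothetical Alpaka-independent consumer of a product living in unified memory.
// Only edm::Product<T> is discussed in this thread; wait(), data(), TrackSoA and the
// module are invented for the example.
class HistogramProducer : public edm::EDProducer {
public:
  explicit HistogramProducer(edm::ProductRegistry& reg)
      : tracksToken_(reg.consumes<edm::Product<TrackSoA>>()) {}

  void produce(edm::Event& event, edm::EventSetup const&) override {
    edm::Product<TrackSoA> const& wrapped = event.get(tracksToken_);
    wrapped.wait();                           // hypothetical: block until the producer's asynchronous work has finished
    TrackSoA const& tracks = wrapped.data();  // directly usable thanks to unified memory
    // ... fill histograms from tracks ...
  }

private:
  edm::EDGetTokenT<edm::Product<TrackSoA>> tracksToken_;
};
```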
Of course, for any of this "using data products of one memory space in many backends" to work at all, the data product the EDProducer appears to produce must be exactly the same type in all the backends for which this "sharing" is done (but IIUC you also wrote that).
For the Serial/TBB backends, using the same product types should, in principle, be trivial (and therefore the setup should be straightforward if the TBB backend uses a synchronous queue).
OK, so we are thinking about:
At least for debugging, it might be useful to also support:
I'm starting to see why alpaka keeps the three concepts almost orthogonal...
Made effectively obsolete by https://github.com/cms-sw/cmssw/pull/39428
This PR prototypes the Alpaka EDModule API, taking inspiration from https://github.com/cms-patatrack/pixeltrack-standalone/pull/224 and https://github.com/cms-patatrack/pixeltrack-standalone/pull/256. A major idea tested here was to see how far the system could be implemented with just forward-declared Alpaka device, queue, and event types, in order to minimize the set of source files that need to be compiled with the device compiler (I first crafted this prototype before the `ALPAKA_HOST_ONLY` macro).

The first commit extends the build rules by adding a new category of source files that need to be compiled for each Alpaka backend, but can be compiled with the host compiler. This functionality might be beneficial on a wider scope than this PR alone (so I could open a separate PR with only it). Here I took the approach of using a new file extension, `.acc` ("a" for e.g. "accelerated"), for the files that need to be compiled with the device compiler; the `.cc` files can be compiled with the host compiler. I'm not advocating for this particular choice, as I'm not very fond of it, but I needed something to get on with the prototype.

I don't think we should apply this PR as is, but rather identify the constructs that would be useful, pick those, and improve the rest.
One idea here was to hide the `cms::alpakatools::Product<T>` from users (having to explicitly interact with the `ScopedContext` to get the `T` is annoying). In addition, for the CPU serial backend (synchronous, operates in regular host memory) the `Product<T>` wrapper is not used (because it is not really needed). In this way the downstream code could use the data products from the Serial backend directly. For developers the setup would look like the following (see the sketch after these lists):

- products in the backend's own memory space are consumed with `edm::EDGetTokenT<T>` and produced with `edm::EDPutTokenT<T>` (i.e. they look like normal products)
- products in host memory are consumed with `edm::EDGetTokenT<edm::Host<T>>` and produced with `edm::EDPutTokenT<edm::Host<T>>`
- `edm::Host<T>` is just a "tag", not an actual product type.

Internally this setup works such that for the CPU Serial backend the `edm::Host<...>` part is ignored, and for other backends

- `edm::EDGetTokenT<T>` is mapped to `edm::EDGetTokenT<edm::Product<T>>`
- `edm::EDGetTokenT<edm::Host<T>>` is mapped to `edm::EDGetTokenT<T>`.
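For illustration, a developer-facing module under this scheme could look roughly like the sketch below. The data types (`InputSoA`, `HostConfig`, `OutputSoA`), the `runAlgoAsync()` call, and the exact constructor/`produce()` signatures are made-up placeholders; only the token types follow the description above.

```cpp
// Sketch of a developer-facing Alpaka EDProducer under the token scheme described above.
// Framework headers are omitted; all data types and signatures are illustrative.
#include <utility>

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  class ExampleProducer : public EDProducer {
  public:
    explicit ExampleProducer(edm::ProductRegistry& reg)
        : inputToken_(reg.consumes<InputSoA>()),                // product in this backend's memory space
          configToken_(reg.consumes<edm::Host<HostConfig>>()),  // product explicitly in host memory
          outputToken_(reg.produces<OutputSoA>()) {}            // produced in this backend's memory space

    void produce(Event& event, Context& ctx) override {
      // The tokens yield the bare T; the wrapping into/out of edm::Product<T>
      // (and the Queue bookkeeping) happens inside Event/EDContext.
      InputSoA const& input = event.get(inputToken_);
      HostConfig const& config = event.get(configToken_);

      OutputSoA output = runAlgoAsync(ctx.queue(), input, config);  // hypothetical algorithm call
      event.put(outputToken_, std::move(output));
    }

  private:
    edm::EDGetTokenT<InputSoA> inputToken_;
    edm::EDGetTokenT<edm::Host<HostConfig>> configToken_;
    edm::EDPutTokenT<OutputSoA> outputToken_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```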
For this setup to work, an `ALPAKA_ACCELERATOR_NAMESPACE::Event` class is defined to be used in the EDModules instead of `edm::Event`. It wraps the `edm::Event`, and implements the aforementioned mapping logic (on the getting and putting side) with a set of helper classes that are specialized for the backends. The `ALPAKA_ACCELERATOR_NAMESPACE::EDProducer(ExternalWork)` class implements the (reverse) mapping logic for the `consumes()` and `produces()` side.

The `cms::alpakatools::Product<TQueue, T>` is transformed into an `edm::Product<T>` that can hold arbitrary metadata via type erasure (currently `std::any`, for demonstration purposes). For Alpaka EDModules an `ALPAKA_ACCELERATOR_NAMESPACE::ProductMetadata` class is defined for this metadata purpose. These classes also took over some of the functionality of `ScopedContext` that seems to work better there in this abstraction model (the `kokkos` version actually has a similar structure here).
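A minimal sketch of the type-erasure idea, with member names that are illustrative rather than the actual prototype code:

```cpp
#include <any>
#include <utility>

namespace edm {
  // Framework-level wrapper: knows nothing about Alpaka. Backend-specific state
  // (queue, event, ...) is stored behind std::any and recovered by backend code,
  // e.g. as ALPAKA_ACCELERATOR_NAMESPACE::ProductMetadata in this prototype.
  template <typename T>
  class Product {
  public:
    Product(T data, std::any metadata)
        : data_(std::move(data)), metadata_(std::move(metadata)) {}

    // Payload access for code that has already ensured proper synchronization.
    T const& data() const { return data_; }

    // Backend code casts the metadata back to its concrete type.
    template <typename M>
    M const& metadata() const { return std::any_cast<M const&>(metadata_); }

  private:
    T data_;
    std::any metadata_;
  };
}  // namespace edm
```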
The `ScopedContext` class structure is completely reorganized, and is now fully hidden from the developers. There is now an `ALPAKA_ACCELERATOR_NAMESPACE::impl::FwkContextBase` base class for the common functionality between ED modules and ES modules (although the latter are not exercised in this prototype, so this is what I believe to be the common functionality). The `ALPAKA_ACCELERATOR_NAMESPACE::EDContext` class derives from `FwkContextBase` and adds ED-specific functionality. I guess the `FwkContextBase` and `EDContext` could also be implemented as templates instead of placing them into `ALPAKA_ACCELERATOR_NAMESPACE` (they are hidden from developers anyway).

A third context class, `ALPAKA_ACCELERATOR_NAMESPACE::Context`, is defined to be given to the developers (via an `EDModule::produce()` argument). It gives access to the `Queue` object. Internally it also signals to the `FwkContextBase` when the `Queue` has been asked for by the developer, so that if the EDModule accesses its input products for the first time after that point, it won't try to re-use the `Queue` from the input product (because the initially assigned `Queue` is already being used). This `Context` class can later be extended e.g. along the lines of https://github.com/cms-patatrack/pixeltrack-standalone/pull/256.
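A rough structural sketch of how the three context classes relate; only the class names and the Queue re-use rule come from the description above, the members shown are guesses (and the backend's `Queue` alias is assumed to be available):

```cpp
#include <memory>
#include <utility>

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  namespace impl {
    // Common functionality for ED and ES modules: owns or adopts the Queue.
    class FwkContextBase {
    public:
      Queue& queue() {
        queueUsed_ = true;  // from now on, input products must not hand over their Queue
        return *queue_;
      }

    protected:
      bool queueUsed() const { return queueUsed_; }
      void setQueue(std::shared_ptr<Queue> queue) { queue_ = std::move(queue); }

    private:
      std::shared_ptr<Queue> queue_;
      bool queueUsed_ = false;
    };
  }  // namespace impl

  // ED-specific context: works together with Event to implement the token/product
  // mapping, including the decision whether an input product's Queue can be re-used.
  class EDContext : public impl::FwkContextBase {
    // getting/putting helpers omitted in this sketch
  };

  // The class handed to the developer in produce(); it only exposes the Queue.
  class Context {
  public:
    explicit Context(impl::FwkContextBase& ctx) : ctx_(ctx) {}
    Queue& queue() { return ctx_.queue(); }

  private:
    impl::FwkContextBase& ctx_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```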
One additional piece that would reduce the number of places where `edm::Host<T>` appears in user code, but is not prototyped here, would be automating the (mainly device-to-host) transfers. As long as the type `T` can be arbitrary, the framework needs to be told how to transfer that type between two memory spaces (e.g. something along the lines of a plugin factory for functions), but at least these transfers would not have to be expressed in the configuration anymore.
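Purely as an illustration of the "plugin factory for functions" direction (nothing like this is implemented here), such a registry could be shaped roughly as follows; the `TransferToHostRegistry` name, the type-name key, and the `std::any`-based signature are all invented for the example:

```cpp
#include <any>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

namespace ALPAKA_ACCELERATOR_NAMESPACE {
  // Hypothetical registry of device-to-host transfer functions, keyed by the
  // device-side type name. Each plugin would register the functions for its own
  // types; the framework would invoke them to provide edm::Host<T> products
  // without explicit transfer modules in the configuration.
  class TransferToHostRegistry {
  public:
    using TransferFunction =
        std::function<std::any(std::any const& deviceProduct, Queue& queue)>;

    static TransferToHostRegistry& instance() {
      static TransferToHostRegistry registry;
      return registry;
    }

    void add(std::string typeName, TransferFunction func) {
      transfers_.emplace(std::move(typeName), std::move(func));
    }

    std::any transferToHost(std::string const& typeName, std::any const& deviceProduct, Queue& queue) const {
      return transfers_.at(typeName)(deviceProduct, queue);
    }

  private:
    std::unordered_map<std::string, TransferFunction> transfers_;
  };
}  // namespace ALPAKA_ACCELERATOR_NAMESPACE
```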