Open psychocoderHPC opened 4 years ago
Thanks a lot for the inspiring concept work. A simpler frontend would help not only newcomers; it makes coding more productive and readable. Code examples would hopefully also fit on a few slides then ;-)
Alpaka buffer: Perhaps it is sufficient to pass the buffer instead of the iterator, because the buffer already has all the information. But I see the idea that you can manage the accelerator-dependent access methods that way. I am not sure what the kernel interface should look like; here is an example of how SYCL deals with it:
    buffer mybuffer_d(mybuffer_h);
    queue myQueue;
    command_group(myQueue, [&]()
    {
        // Data accessors
        auto a = mybuffer_d.get_access<access::read>();
        // Kernel
        parallel_for(count, kernel_functor([ = ](id<> item) {
            int i = item.get_global(0);
            // ... do something with a[i]
        }));
    });
For the record, SYCL is fully C++-compliant, and with a C++ library you could map SYCL code to an OpenMP backend using an ordinary C++ compiler (IIRC Codeplay had an OpenMP version, but at a very early stage). The question is whether we can simply move the relevant runtime polymorphism to compile-time polymorphism, and what the API would look like then.
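To make the runtime-versus-compile-time question concrete, here is a minimal sketch of an accessor whose access mode is a template parameter, so a write through a read accessor is rejected by the compiler rather than at runtime. All names (`Access`, `Accessor`, `Buffer`, `get_access`) are hypothetical and only mimic the shape of the SYCL snippet above; this is neither SYCL nor alpaka API, and it models host memory only.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is neither SYCL nor alpaka API.
enum class Access { read, write };

template <typename T, Access Mode>
class Accessor {
public:
    // Read accessors hand out const references, so writing through them
    // fails to compile instead of failing at runtime.
    using Ref = std::conditional_t<Mode == Access::read, const T&, T&>;

    Accessor(T* data, std::size_t size) : data_(data), size_(size) {}

    Ref operator[](std::size_t i) const {
        assert(i < size_);  // bounds check in debug builds only
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    T* data_;
    std::size_t size_;
};

template <typename T>
class Buffer {
public:
    explicit Buffer(std::vector<T> host) : storage_(std::move(host)) {}

    // The access mode is chosen at compile time via the template argument.
    template <Access Mode>
    Accessor<T, Mode> get_access() {
        return Accessor<T, Mode>(storage_.data(), storage_.size());
    }

private:
    std::vector<T> storage_;  // assumption: host memory stands in for device memory
};
```

With this shape, `buf.get_access<Access::read>()[0] = 1;` would not compile, while the same statement with `Access::write` would. Whether a real backend can express all of its access rules this way is an open question.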
(Btw, besides the integration of the buffer concept, we also have to consider multiple platform levels, because the backend can be heterogeneous itself. This means that you traverse not only the devices but also the platforms, as OpenCL and SYCL do. This is a separate issue though.)
First, the accelerator and memory objects/types should be tackled, I guess. If implicitness is added, then the question is whether the simpler API should be a separate layer on top of Alpaka instead of only refactoring existing code. A separate layer would also make it easier to support legacy code for as long as possible, but creates more code. There are changes, like the buffer or kernel interface, that involve refactoring the Alpaka core though.
We probably have to evolve through multiple designs, but I would love to see this come alive, because especially with the C++17 features it should be doable for Alpaka to become a modern, productive interface like SYCL (or even better and more performant).
So how would you like to proceed? I guess we first need the whole design and its pitfalls before we can implement the actual thing.
My 2 cents on this topic.
My understanding is that Alpaka deliberately introduces its API as free functions, and not in the object-oriented style, so that e.g. putting a task into a queue looks like
simple::queue::enqueue(queue, kernel);
and not like
queue.enqueue(kernel);
(using the simple API from this issue, but the same holds for existing Alpaka). The free-function style makes it harder to pin down the Queue interface for a user, and simply involves more typing and more errors from using the wrong namespace, compared to the object-oriented one. I think the function style has a potential advantage: it can substitute default parameters more easily. Theoretically, there may be a default queue for each device, CUDA-style, and if the queue is not specified, the default one will be used. This would make things easier for new users and for simple examples where only one queue is needed. However, Alpaka currently does not use this at all, and I guess it does not fit its explicit, specify-each-detail style of interfaces. I assume there were discussions on this during the original development that I am simply not aware of, and that the matter is more complicated than I just wrote; I am just sharing my thoughts.
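The default-queue idea can be sketched with two overloads of a free `enqueue` function: one that takes the queue explicitly and one that falls back to an implicit queue. Everything here (`Device`, `Queue`, `defaultQueue`, `enqueue`) is a hypothetical stand-in rather than alpaka API, and the sketch assumes a single device and a blocking queue to stay short.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins, not alpaka types.
struct Device {
    int id;
};

class Queue {
public:
    explicit Queue(Device dev) : dev_(dev) {}

    // Blocking queue: runs the task immediately and counts it.
    void run(const std::function<void()>& task) {
        assert(static_cast<bool>(task));  // an empty task is a caller error
        task();
        ++executed_;
    }

    int executed() const { return executed_; }
    Device device() const { return dev_; }

private:
    Device dev_;
    int executed_ = 0;
};

// Implicit queue, similar in spirit to the CUDA default stream.
// Simplification: only one device is assumed, so one static queue suffices.
inline Queue& defaultQueue() {
    static Queue q{Device{0}};
    return q;
}

// Explicit form: the caller names the queue.
inline void enqueue(Queue& queue, const std::function<void()>& task) {
    queue.run(task);
}

// Convenience form: no queue given, so the default queue is used.
inline void enqueue(const std::function<void()>& task) {
    enqueue(defaultQueue(), task);
}
```

A beginner example could then call `enqueue([]{ /* work */ });` without ever naming a queue, while existing explicit code keeps using `enqueue(queue, task)`. A member-function API could offer the same convenience, but only through some other entry point, since there is no object to call it on.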
If (big if) we are basing this on the SYCL API, is there any reason to not just turn alpaka into an actual implementation of SYCL? Minus OpenCL interoperability because that would require a working OpenCL implementation on the executing system.
We discussed this before in an issue and it's definitely possible, just a matter of priorities of the project and available resources.
On simplifying namespaces in alpaka: https://github.com/alpaka-group/alpaka/issues/1034
On the Idx type: https://github.com/alpaka-group/alpaka/issues/1035
+1 for using object-oriented style. Especially:
queue.enqueue(kernel);
device.allocate<T>(100);
This morning I wasted an hour trying to write a class method that creates a kernel and enqueues it. classInstance.runKernel(queue) can't be defined, because classInstance needs to know Acc and not just alpaka::Dev<Acc>. Apparently, there is no way to get Acc or Dev from an alpaka::Queue queue, since alpaka::Queue doesn't exist. Honestly, why not define alpaka::Queue for each accelerator by template specialization? Then at least type matching on the function argument would work. The examples don't document the expected idioms very well.
I agree with the general sentiment. And judging by this topic, I guess most alpaka contributors do.
I think the main difference is not between the queue.enqueue(kernel) and enqueue(queue, kernel) styles. If Queue were a concrete type (not a template taking Acc and property types), those would not be that different. Or, in the sense of the Interface principle, both would be part of that imaginary class Queue.
I feel a larger issue is that there is no concrete Queue class. It is kind of a concept, but not really, and we are in C++14. Most of alpaka's abstraction classes are in this state. Since almost nothing in alpaka is a concrete class, alpaka pushes user code interacting with it to take one alpaka type as a template parameter (e.g. Acc or Queue) and derive the rest from it when necessary, the same way our examples start with an Acc type definition and derive the rest from there.
I believe there is a way to convert between Acc, Dev and Queue types via existing traits. Dev / DevType should work fine for Queue types as input. Acc / AccType gives the Acc type for a given Dev type. I think if some combination does not work, that would be a bug, not a lack of support in principle.
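The "take one type as a template parameter and derive the rest" idiom can be illustrated with a small self-contained mock. `MockDev`, `MockQueue`, `DevType`, `AccType` and `Simulation` below are defined in the snippet itself; they only imitate the trait names mentioned above, and the real alpaka traits may differ in detail (in particular, this mock assumes each device type maps to exactly one accelerator type).

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Mock accelerator tags; not alpaka types.
struct CpuAcc { static const char* name() { return "cpu"; } };
struct GpuAcc { static const char* name() { return "gpu"; } };

// Mock device and queue templates, each parameterized on the layer below.
template <typename TAcc> struct MockDev {};
template <typename TDev> struct MockQueue { TDev dev; };

// Trait: which device type does a queue belong to?
template <typename T> struct DevType;
template <typename TDev>
struct DevType<MockQueue<TDev>> { using type = TDev; };

// Trait: which accelerator type does a device or a queue belong to?
template <typename T> struct AccType;
template <typename TAcc>
struct AccType<MockDev<TAcc>> { using type = TAcc; };
template <typename TDev>
struct AccType<MockQueue<TDev>> { using type = typename AccType<TDev>::type; };

class Simulation {
public:
    // Accepts any queue type and derives Acc from it through the traits,
    // so the class itself does not need Acc as a template parameter.
    template <typename TQueue>
    std::string runKernel(TQueue&) const {
        using Acc = typename AccType<TQueue>::type;
        return std::string("ran on ") + Acc::name();
    }
};
```

Under these assumptions, a method like `classInstance.runKernel(queue)` becomes a member template, and the accelerator type is recovered inside the method instead of being passed in separately.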
With C++20 concepts on the horizon I'm now slightly in favour of the current API design. Once those can be used in alpaka I believe we can remove a lot of internal code without hurting alpaka's feature set.
Motivation
Alpaka gives users high flexibility and freedom for their implementations. Alpaka is explicit everywhere and can therefore be controlled at a fine granularity. But IMO, for 90% of users, simple usage, at least to get started with a library, is very important.
Proposal
Alpaka is, from a user's point of view, very hard to use. Therefore I would like to take the first step and propose an example of our vector add written against a pseudo API named simple.
Example
Additional option
Even though I show here a self-assembled interface based on the current usage of alpaka, we should think about deriving an interface based on the SYCL standard. I have also thought a lot about creating a SYCL frontend with alpaka as the backend, but the SYCL API relies heavily on runtime polymorphism (which is maybe removed by the SYCL compiler).
CC-ing: @ComputationalRadiationPhysics/alpaka-developers @ComputationalRadiationPhysics/alpaka-maintainers