alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

[RFC] move some of the CMS utilities built on top of Alpaka into Alpaka itself - or a separate library ? #1928

Open fwyzard opened 1 year ago

fwyzard commented 1 year ago

In CMS we developed various functionality that is general enough and might be interesting to share.

Some are caching solutions we developed to improve the efficiency of the original CUDA code, that were later adapted to work with Alpaka:

Some could be device code implemented in CUDA/Alpaka to perform generic computations, like a prefix scan, or containers like fixed-capacity vectors and maps.

I would be happy to contribute into Alpaka itself the components that would fit there.

For the others, I was thinking about starting a new library that would work on top of Alpaka. I even have a name for it: since the functionality was originally inspired by "cub", I would call it "cria" (according to Wikipedia, a cria is a juvenile llama, alpaca, vicuña, or guanaco, just as a cub is the young of a lion, tiger, bear, etc.).

Ideas? Comments? Suggestions?

j-stephan commented 1 year ago

We have wanted a specialized memory allocation library outside of alpaka for a while now. Does that fit what you are envisioning for cria? I like the name.

Regarding queues and events: I think they'd fit into mainline alpaka, but I'll hand these points over to @psychocoderHPC who has more experience in CUDA/HIP.

psychocoderHPC commented 1 year ago

For this issue, I would answer with what I wrote in the other issue (https://github.com/alpaka-group/alpaka/issues/1927#issuecomment-1462245172): currently, we have a hard time deciding what should be part of alpaka and what should be in an external library.

Both approaches have advantages and disadvantages for packaging, usage, testing, and maintainability. We should discuss this in the corresponding tickets to collect the pros and cons, but I think we can only come to a solution in a VC. I definitely do not want to decide on my own, because open-source development is a democracy.

bernhardmgruber commented 1 year ago

For easy accessibility, I generally favor integrating functionality into alpaka, even if it is technically built on top of the core functionality. A decisive factor for me is usually whether the new functionality introduces major additional dependencies or significantly increases unit-test or compile times. For all the suggestions you have made, I do not see a blocker in this regard.

Caches for queues and events seem like a small addition. I imagine those to be pooled objects for reuse. So basically a std::vector<alpaka::Event>, created with a certain size, from which we can get (recycled) events and put used ones back. If you don't intend to add a global pool, but just the class implementing the functionality, then I am happy to see this added to alpaka. Maybe just in a separate directory under include/alpaka.

Caches for buffers are a harder case for me, since I don't fully understand whether you recycle alpaka buffers, or just allocate one big buffer and implement a custom heap manager on top of it. If it's the latter, then this feature is like mallocMC, which is currently also a separate library.

Additional algorithms are in principle planned for the vikunja library. However, as already suggested, I would personally prefer to have vikunja (which is basically a handful of files) just be a part of alpaka, for the sake of easier handling on the user side.

For additional data structures, there is potential overlap with LLAMA, although I would not put specific data structures into LLAMA. LLAMA is about generic data layouts. So I don't know where I would put "containers like fixed-capacity vectors and maps". I guess it depends a bit on how specialized they are.

Finally, I don't care that much what we decide. I am super happy @fwyzard is contributing and I think there is great potential for reuse by other alpaka users.

My suggestion to move forward is to have a look at the pieces of actual code that @fwyzard wants to contribute and then decide.

fwyzard commented 1 year ago

> Caches for queues and events seem like a small addition. I imagine those to be pooled objects for reuse. So basically a std::vector<alpaka::Event>, created with a certain size, from which we can get (recycled) events and put used ones back. If you don't intend to add a global pool, but just the class implementing the functionality, then I am happy to see this added to alpaka. Maybe just in a separate directory under include/alpaka.

Unfortunately, std::vector is not a thread-safe data structure, so for the caching of Queues and Events we rely on TBB's concurrent data structures to provide thread safety. This adds an extra dependency compared to Alpaka itself, where TBB is just an optional backend.

Boost kind of provides a lock-free concurrent queue, but it's mostly undocumented, so I'm not particularly eager to use it (even though I tested it a couple of years ago and it seemed to work).

> Caches for buffers is a harder case for me, since I don't fully understand whether you recycle alpaka buffers, or just allocate a big buffer and implement a custom heap manager. If it's the latter, than this feature is like mallocMC, which is currently also a separate library.

We cache the Alpaka buffers.

fwyzard commented 1 year ago

> My suggestion to move forward is to have a look at the pieces of actual code that @fwyzard wants to contribute and then decide.

Sure!

Not sure if I can join next week, but I should be able to join on the 21st.

j-stephan commented 1 year ago

> Currently, we have a hard time deciding what should be part of alpaka and what should be in an external library.

My two cents: