[EPIC] Roadmap for cuda/memory_resource

jrhemstad commented 6 months ago

cuda::mr is intended to be the future of heterogeneous memory allocation in CUDA C++. It is inspired heavily by lessons learned in RMM and our experience with the device_memory_resource* and friends. cuda::mr does not seek to replace RMM, but instead to distill and standardize the best parts of RMM into a more central location. Furthermore, RMM is already in the process of rebasing on top of the cuda::mr interface.

What we have today is the cuda/memory_resource header, which provides:

- [async_]resource concepts
- Property system
- [async_]resource_ref polymorphic type

In essence, this just provides the top-level interface for memory allocation and defining properties of the allocated memory.
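To make the idea of a non-owning polymorphic reference concrete, here is a minimal host-only sketch. It is not the cuda::mr implementation: `host_resource`, `basic_resource_ref`, and `scratch_round_trip` are hypothetical names, the backing store is plain `std::malloc`, and the property system is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical concrete resource backed by malloc (NOT a cuda::mr type).
struct host_resource {
  std::size_t live_allocations = 0;

  void* allocate(std::size_t bytes, std::size_t /*alignment*/) {
    void* p = std::malloc(bytes);
    if (p == nullptr) throw std::bad_alloc{};
    ++live_allocations;
    return p;
  }

  void deallocate(void* p, std::size_t /*bytes*/, std::size_t /*alignment*/) noexcept {
    std::free(p);
    --live_allocations;
  }
};

// Hypothetical non-owning, type-erased reference to any object that has
// allocate/deallocate members with the signatures used above.
class basic_resource_ref {
 public:
  template <class Resource,
            class = std::enable_if_t<
                !std::is_same<std::decay_t<Resource>, basic_resource_ref>::value>>
  basic_resource_ref(Resource& r) noexcept
      : object_(&r),
        allocate_([](void* o, std::size_t bytes, std::size_t align) -> void* {
          return static_cast<Resource*>(o)->allocate(bytes, align);
        }),
        deallocate_([](void* o, void* p, std::size_t bytes, std::size_t align) {
          static_cast<Resource*>(o)->deallocate(p, bytes, align);
        }) {}

  void* allocate(std::size_t bytes,
                 std::size_t align = alignof(std::max_align_t)) {
    return allocate_(object_, bytes, align);
  }

  void deallocate(void* p, std::size_t bytes,
                  std::size_t align = alignof(std::max_align_t)) {
    deallocate_(object_, p, bytes, align);
  }

 private:
  void* object_;  // not owned: the referenced resource must outlive this ref
  void* (*allocate_)(void*, std::size_t, std::size_t);
  void (*deallocate_)(void*, void*, std::size_t, std::size_t);
};

// A function that accepts any resource through the type-erased reference.
inline bool scratch_round_trip(basic_resource_ref mr, std::size_t bytes) {
  void* p = mr.allocate(bytes);
  const bool ok = (p != nullptr);
  mr.deallocate(p, bytes);
  return ok;
}
```

Callers can pass a concrete resource directly, e.g. `scratch_round_trip(my_host_resource, 256)`, and the function itself does not need to be a template.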

### Implementation plan
- [ ] https://github.com/NVIDIA/cccl/issues/2128
- [ ] https://github.com/NVIDIA/cccl/issues/2129
- [ ] https://github.com/NVIDIA/cccl/issues/2130
- [ ] https://github.com/NVIDIA/cccl/issues/2131
- [ ] https://github.com/NVIDIA/cccl/issues/2143
- [ ] https://github.com/NVIDIA/cccl/issues/2132
- [ ] Concrete types that satisfy the C++ allocator requirements

### Misc
- [ ] Add NVTX annotations to all memory resources
- [ ] https://github.com/NVIDIA/cccl/issues/2313

### Concrete types that satisfy the C++ allocator requirements
- [ ] A `cuda::mr::allocator<T, Properties...>` capable of preserving concrete type of the resource (no type-erasure)
- [ ] A `cuda::mr::polymorphic_allocator<T, Properties...>` constructible from a `resource_ref<Properties...>`
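As a rough illustration of the first item, the sketch below wraps a resource in a type that meets the C++ allocator requirements while keeping the concrete resource type as a template parameter. All names (`counting_resource`, `resource_allocator`, `allocator_example`) are hypothetical, the resource is a malloc-backed host stand-in, and the `Properties...` parameter pack from the proposal is omitted. A polymorphic variant would presumably store a `resource_ref` instead of a `Resource*`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical host-side resource that tracks bytes in use (NOT a cuda::mr type).
struct counting_resource {
  std::size_t bytes_in_use = 0;

  void* allocate(std::size_t bytes, std::size_t /*alignment*/) {
    void* p = std::malloc(bytes);  // malloc alignment suffices for this demo
    if (p == nullptr) throw std::bad_alloc{};
    bytes_in_use += bytes;
    return p;
  }

  void deallocate(void* p, std::size_t bytes, std::size_t /*alignment*/) noexcept {
    std::free(p);
    bytes_in_use -= bytes;
  }
};

// Hypothetical allocator that preserves the concrete resource type.
// It holds a non-owning pointer, so the resource must outlive every container
// using this allocator (one of the lifetime questions raised in this issue).
template <class T, class Resource>
class resource_allocator {
 public:
  using value_type = T;

  explicit resource_allocator(Resource& r) noexcept : resource_(&r) {}

  // Converting constructor needed for allocator rebinding.
  template <class U>
  resource_allocator(const resource_allocator<U, Resource>& other) noexcept
      : resource_(other.resource()) {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(resource_->allocate(n * sizeof(T), alignof(T)));
  }

  void deallocate(T* p, std::size_t n) noexcept {
    resource_->deallocate(p, n * sizeof(T), alignof(T));
  }

  Resource* resource() const noexcept { return resource_; }

  friend bool operator==(const resource_allocator& a,
                         const resource_allocator& b) noexcept {
    return a.resource_ == b.resource_;
  }
  friend bool operator!=(const resource_allocator& a,
                         const resource_allocator& b) noexcept {
    return !(a == b);
  }

 private:
  Resource* resource_;
};

// Usage: a std::vector whose storage comes from the resource.
inline bool allocator_example(counting_resource& res) {
  using alloc_t = resource_allocator<int, counting_resource>;
  alloc_t alloc{res};
  std::vector<int, alloc_t> v(alloc);
  v.assign(8, 42);
  return v.size() == 8 && res.bytes_in_use >= 8 * sizeof(int);
}
```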

Questions we'll need to answer along the way:

- What lifetime semantics do we want to use for resources + allocators + data structures?
  - RMM took a very relaxed approach of using non-owning references everywhere, but this is worth reconsidering (see https://github.com/rapidsai/rmm/issues/1492).
- Do all data structures only take allocators? Or just resources? Both?
  - In RMM, we took an approach of only constructing from resource_refs directly, but this was mostly for expediency and convenience, so it is worth reconsidering.

miscco commented 6 months ago

@harrism You might be interested in this

vyasr commented 6 months ago

Is there a long-term plan to pull more of the concrete implementations from rmm into CCCL? That seems like the best way to broaden adoption and usage of these allocators and would satisfy some of the new features mentioned above IIRC.

miscco commented 6 months ago

@vyasr Yes, I believe we want to pull some of the foundational features into CCCL. Definitely not all, but some.

jrhemstad commented 6 months ago

> Is there a long-term plan to pull more of the concrete implementations from rmm into CCCL? That seems like the best way to broaden adoption and usage of these allocators and would satisfy some of the new features mentioned above IIRC.

Yes, that is what we mean by "Concrete types that satisfy the resource and async_resource concepts"

fbusato commented 6 months ago

Our RFEs:

- deallocate/deallocate_async functions should accept const void* to skip const_cast<T*>() on the user side
- Allow cuda::mr::* functions in device code
- Clarify (or fix) the expected behavior of allocate() and deallocate() for async_resource
  - Personal thought: remove the _async() version of the API and add the stream to allocate/deallocate
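To illustrate the first request, here is a small host-only sketch contrasting a `deallocate` that takes `void*` with one that takes `const void*`. Both resource types and `release_both` are hypothetical and malloc-backed, and the signatures are simplified (no alignment argument). The sketch only shows where the `const_cast` ends up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical resource whose deallocate takes void* (simplified signature).
struct current_style_resource {
  int frees = 0;
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate(void* p, std::size_t /*bytes*/) noexcept {
    std::free(p);
    ++frees;
  }
};

// Hypothetical resource with the requested const void* signature.
struct requested_style_resource {
  int frees = 0;
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate(const void* p, std::size_t /*bytes*/) noexcept {
    std::free(const_cast<void*>(p));  // the cast moves into the resource
    ++frees;
  }
};

// User code that only holds pointers to const data.
inline int release_both() {
  current_style_resource cur;
  requested_style_resource req;
  const std::size_t bytes = 4 * sizeof(float);

  const float* a = static_cast<const float*>(cur.allocate(bytes));
  const float* b = static_cast<const float*>(req.allocate(bytes));

  cur.deallocate(const_cast<float*>(a), bytes);  // caller must cast away const
  req.deallocate(b, bytes);                      // no cast on the user side

  return cur.frees + req.frees;
}
```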

jrhemstad commented 6 months ago

> Clarify (or fix) the expected behavior of allocate() deallocate() for async_resource

Can you elaborate on what you mean? allocate() and deallocate() are expected to always be synchronous.

fbusato commented 6 months ago

> Can you elaborate on what you mean? allocate() and deallocate() are expected to always be synchronous.

Yes, but what is their purpose if the code uses an async_resource with the _async() API? They look redundant and confusing in this case.

jrhemstad commented 6 months ago

> Yes, but what is their purpose if the code uses an async_resource with _async() API. They look redundant and confusing in this case

The thinking is that the async_resource concept is a strict superset of the resource concept. This way, if you have an async_resource object, you can still conveniently pass it to a function that expects a resource.
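A minimal host-only sketch of that superset relationship, using C++17 detection traits in place of the real concepts. Everything here is hypothetical: `stream_handle` stands in for a CUDA stream, the resources are malloc-backed, and `is_resource` / `is_async_resource` are simplified stand-ins rather than the cuda::mr definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

struct stream_handle { int id; };  // hypothetical stand-in for a CUDA stream

// Hypothetical resource with only the synchronous interface.
struct sync_only_resource {
  void* allocate(std::size_t bytes, std::size_t) { return std::malloc(bytes); }
  void deallocate(void* p, std::size_t, std::size_t) noexcept { std::free(p); }
};

// Hypothetical resource with both synchronous and stream-ordered interfaces.
struct async_capable_resource {
  void* allocate(std::size_t bytes, std::size_t) { return std::malloc(bytes); }
  void deallocate(void* p, std::size_t, std::size_t) noexcept { std::free(p); }
  void* allocate_async(std::size_t bytes, std::size_t, stream_handle) {
    return std::malloc(bytes);  // a real resource would enqueue on the stream
  }
  void deallocate_async(void* p, std::size_t, std::size_t, stream_handle) noexcept {
    std::free(p);
  }
};

// Simplified "resource" check: allocate + deallocate must be callable.
template <class R, class = void>
struct is_resource : std::false_type {};
template <class R>
struct is_resource<R, std::void_t<
    decltype(std::declval<R&>().allocate(std::size_t{}, std::size_t{})),
    decltype(std::declval<R&>().deallocate(static_cast<void*>(nullptr),
                                           std::size_t{}, std::size_t{}))>>
    : std::true_type {};

template <class R, class = void>
struct has_async_members : std::false_type {};
template <class R>
struct has_async_members<R, std::void_t<
    decltype(std::declval<R&>().allocate_async(std::size_t{}, std::size_t{},
                                               std::declval<stream_handle>())),
    decltype(std::declval<R&>().deallocate_async(static_cast<void*>(nullptr),
                                                 std::size_t{}, std::size_t{},
                                                 std::declval<stream_handle>()))>>
    : std::true_type {};

// "async_resource" = everything a resource needs, plus the _async members.
template <class R>
struct is_async_resource
    : std::bool_constant<is_resource<R>::value && has_async_members<R>::value> {};

// Generic code written against the synchronous interface only.
template <class Resource>
bool round_trip(Resource& r, std::size_t bytes) {
  static_assert(is_resource<Resource>::value, "Resource must model resource");
  void* p = r.allocate(bytes, alignof(std::max_align_t));
  const bool ok = (p != nullptr);
  r.deallocate(p, bytes, alignof(std::max_align_t));
  return ok;
}
```

Because `async_capable_resource` also provides `allocate` and `deallocate`, it can be passed to `round_trip` unchanged, which is the convenience described above.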

fbusato commented 6 months ago

OK, I didn't interpret async_resource as a superset of the resource concept. In that case, can we please just clarify this point in the docs?

NVIDIA / cccl

[EPIC] Roadmap for cuda/memory_resource #1502