NVIDIA / MatX

An efficient C++17 GPU numerical computing library with Python-like syntax
https://nvidia.github.io/MatX
BSD 3-Clause "New" or "Revised" License

[FEA] Add custom allocator interface #48

Open · cliffburdick opened 2 years ago

cliffburdick commented 2 years ago

**Is your feature request related to a problem? Please describe.**
MatX currently takes raw non-owned pointers, smart unowned pointers, and otherwise self-allocates owned pointers. While this allows for many different options, it does not allow users to supply their own memory allocators for MatX to call.

**Describe the solution you'd like**
Allow functionality similar to xtensor; a sketch of that style of allocator hook is below.

**Describe alternatives you've considered**
Above
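
For context, xtensor-style containers take a std::allocator-compatible type as their allocator hook. A minimal sketch of such an allocator backed by CUDA managed memory (the `managed_allocator` name is illustrative, not MatX or xtensor API):

```cpp
#include <cstddef>
#include <new>
#include <cuda_runtime_api.h>

// std::allocator-compatible allocator over CUDA managed memory.
template <typename T>
struct managed_allocator {
  using value_type = T;

  managed_allocator() = default;
  template <typename U>
  managed_allocator(const managed_allocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    void* p = nullptr;
    if (cudaMallocManaged(&p, n * sizeof(T)) != cudaSuccess) {
      throw std::bad_alloc{};
    }
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { cudaFree(p); }
};
```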


cliffburdick commented 2 years ago

Requirements for this feature are:

  1. Users can pass in a raw pointer that has already been allocated, along with a custom allocator, and the allocator's deallocate function will be called when the tensor goes out of scope.
  2. Users can pass in a custom allocator, and both allocation and deallocation will be performed by MatX with that allocator.
  3. Both cases above must support stream-ordered and non-stream-ordered semantics (sketched below).
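
As a purely hypothetical sketch of what these requirements could look like (none of these names exist in MatX today; `matx_allocator` and the allocator-taking `make_tensor` overloads are assumptions):

```cpp
#include <cstddef>
#include <cuda_runtime_api.h>

// Hypothetical allocator interface covering both requirements.
struct matx_allocator {
  virtual ~matx_allocator() = default;

  // Non-stream-ordered (synchronous) semantics.
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;

  // Stream-ordered semantics.
  virtual void* allocate_async(std::size_t bytes, cudaStream_t stream) = 0;
  virtual void deallocate_async(void* ptr, std::size_t bytes,
                                cudaStream_t stream) = 0;
};

// Case 1: the user allocates `ptr` themselves; the tensor only calls
// alloc.deallocate()/deallocate_async() when it goes out of scope:
//   auto t1 = make_tensor<float>(ptr, {N}, alloc);
//
// Case 2: MatX performs both allocation and deallocation through `alloc`:
//   auto t2 = make_tensor<float>({N}, alloc);
```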

This will also allow future dynamic creation of tensors by using the underlying allocator when a copy is performed.

Some API functions perform asynchronous allocations as part of normal operation, but these do not need to call the custom allocator, since that memory is managed purely internally by MatX and the caching layer.

There are at least three popular allocators we want to support: RMM, Thrust, and libcu++. The libcu++ memory resource API has not been finalized yet, so we are waiting to see how similar it will be to RMM's. Either way, we will need some kind of wrapper or adapter due to the API differences.
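
Assuming the hypothetical `matx_allocator` interface sketched in the previous comment, an adapter over RMM might look like the following (RMM's `device_memory_resource::allocate`/`deallocate` take an optional `rmm::cuda_stream_view`):

```cpp
#include <cstddef>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

// Adapter from RMM onto the hypothetical matx_allocator interface above.
struct rmm_adapter : matx_allocator {
  explicit rmm_adapter(rmm::mr::device_memory_resource* mr) : mr_(mr) {}

  void* allocate(std::size_t bytes) override {
    return mr_->allocate(bytes);  // defaults to the default stream view
  }
  void deallocate(void* ptr, std::size_t bytes) override {
    mr_->deallocate(ptr, bytes);
  }
  void* allocate_async(std::size_t bytes, cudaStream_t stream) override {
    return mr_->allocate(bytes, rmm::cuda_stream_view{stream});
  }
  void deallocate_async(void* ptr, std::size_t bytes,
                        cudaStream_t stream) override {
    mr_->deallocate(ptr, bytes, rmm::cuda_stream_view{stream});
  }

 private:
  rmm::mr::device_memory_resource* mr_;
};
```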

leofang commented 2 years ago
> 3. Both cases above must support stream-ordered and non-stream-ordered semantics

I assume this means users can pass in either a stream-ordered or a non-stream-ordered allocator? Then, if both are set, which one should MatX pick? And what should the allocator getter/setter APIs look like? (We can discuss this offline, as I am also working on this for other projects 🙂)

cliffburdick commented 2 years ago

This allocator will be used whenever a new tensor is created, and for simplicity some users won't want to specify a stream or even know what one is; the default allocator (cudaMalloc) is a good example of this. More advanced users will want a stream-ordered allocator bound to a tensor instance. That means whenever a tensor is deep-copied, you'd expect the copy to happen in the specified stream, and when the tensor goes out of scope, the free is stream-ordered.
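
A minimal sketch of those two lifetimes using the raw CUDA runtime (`cudaMallocAsync`/`cudaFreeAsync` require CUDA 11.2+); the `stream_buffer` type is illustrative, not MatX API:

```cpp
#include <cstddef>
#include <cuda_runtime_api.h>

// RAII buffer whose free follows the same ordering as its allocation.
struct stream_buffer {
  void* ptr = nullptr;
  std::size_t bytes = 0;
  cudaStream_t stream = nullptr;  // nullptr => non-stream-ordered path

  explicit stream_buffer(std::size_t n, cudaStream_t s = nullptr)
      : bytes(n), stream(s) {
    if (stream) cudaMallocAsync(&ptr, bytes, stream);  // stream-ordered
    else        cudaMalloc(&ptr, bytes);               // synchronous default
  }

  ~stream_buffer() {
    if (stream) cudaFreeAsync(ptr, stream);  // freed in stream order
    else        cudaFree(ptr);
  }
};
```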

We should definitely connect offline.

cliffburdick commented 1 year ago

Since libcudacxx's implementation of the memory allocator has been deferred, it's worthwhile at this point to use RMM as our de facto memory allocator instead of the custom one we currently have. This would also give us more flexibility by using pool and arena allocators.
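
For illustration, RMM's pool resource can wrap the plain CUDA resource and be installed as the device default. These are real RMM types; wiring them into MatX tensors is the part this issue would add:

```cpp
#include <cstddef>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main() {
  // Upstream resource: plain cudaMalloc/cudaFree.
  rmm::mr::cuda_memory_resource upstream;

  // Pool that suballocates from a 256 MiB initial reservation.
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool{
      &upstream, std::size_t{1} << 28};

  // All allocations routed through RMM on this device now use the pool.
  rmm::mr::set_current_device_resource(&pool);
  return 0;
}
```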