cliffburdick opened 2 years ago
Requirements for this feature are:
This will also allow future dynamic creation of tensors by using the underlying allocator when a copy is performed.
Some API functions perform asynchronous allocations as part of normal operation, but these do not need to call the custom allocator since they are managed purely internally by MatX and the caching layer.
At this time there are at least three popular allocators we want to support: RMM, Thrust, and libcu++. The libcu++ memory resource API has not been finalized yet, so we are waiting to see how similar it will be to RMM. Either way, we will need some kind of wrapper or adapter due to the API differences.
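To make the adapter idea concrete, here is a minimal sketch. All names (`matx_memory_resource`, `host_resource`) are hypothetical, not MatX's actual API: a single abstract interface that RMM-, Thrust-, or libcu++-style resources could each be wrapped behind, hiding their signature differences from MatX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical adapter interface: MatX would call only this, and each
// third-party allocator gets a thin wrapper implementing it.
struct matx_memory_resource {
  virtual ~matx_memory_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// Stand-in adapter over the host heap so the sketch runs anywhere; a real
// adapter would forward to e.g. an RMM device_memory_resource instead.
struct host_resource final : matx_memory_resource {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* p, std::size_t) override { std::free(p); }
};
```

Tensor creation would then take a `matx_memory_resource*` (or a default), so swapping allocators never touches tensor code.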
- Both cases above must support stream-oriented and non-stream-oriented semantics
I assume this means users can pass in either a stream-ordered or non-ordered allocator? Then, if both are set, which one should MatX pick? What should the allocator getter/setter APIs look like? (We can discuss this offline as I am also working on this for other projects 🙂)
This allocator will be used whenever a new tensor is created, and for simplicity some users won't want to specify a stream or even know what one is; the default allocator (cudaMalloc) is a good example of this. More advanced users would want a stream-ordered allocator bound to a tensor instance. This means that whenever a tensor is deep-copied you'd expect the copy to happen in the specified stream, and when a tensor goes out of scope the free is stream-ordered.

We should definitely connect offline.
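The two usage modes described above can be sketched as overloads on one resource type, with the tensor remembering its allocator and stream so the free at end of scope is stream-ordered. This is a host-runnable illustration only: `stream_t` stands in for `cudaStream_t`, and every name here is hypothetical rather than MatX's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for cudaStream_t so the sketch runs on the host.
using stream_t = int;

struct stream_resource {
  // Stream-ordered overloads, mirroring cudaMallocAsync/cudaFreeAsync.
  void* allocate(std::size_t bytes, stream_t) { return std::malloc(bytes); }
  void  deallocate(void* p, std::size_t, stream_t) { std::free(p); }
  // Non-stream overloads for users who never specify a stream (cudaMalloc-style).
  void* allocate(std::size_t bytes) { return allocate(bytes, stream_t{0}); }
  void  deallocate(void* p, std::size_t bytes) { deallocate(p, bytes, stream_t{0}); }
};

// A buffer that is bound to an allocator and a stream at construction, so
// the free when it goes out of scope happens in that stream (RAII).
struct owned_buffer {
  stream_resource* mr;
  stream_t s;
  std::size_t n;
  void* p;
  owned_buffer(stream_resource* mr_, stream_t s_, std::size_t n_)
      : mr(mr_), s(s_), n(n_), p(mr_->allocate(n_, s_)) {}
  ~owned_buffer() { mr->deallocate(p, n, s); }  // stream-ordered free
};
```

A deep copy of such a buffer would use the same bound stream for its allocation and memcpy, which is the behavior described above.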
Since libcudacxx's implementation of the memory allocator has been deferred, it's worthwhile at this point to use RMM as our de facto memory allocator instead of the custom one we currently have. This would also give us more flexibility by using pool and arena allocators.
Is your feature request related to a problem? Please describe. MatX currently takes raw non-owned pointers, non-owned smart pointers, and otherwise self-allocates owned pointers. While this allows for many different options, it does not let users supply their own memory allocators for MatX to call.
Describe the solution you'd like Allow functionality similar to xtensor, whose containers accept a user-supplied allocator type.
Describe alternatives you've considered Above
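For reference, the xtensor-style approach means accepting any standard-conforming allocator as a template parameter. The counting allocator below is purely illustrative (not from xtensor or MatX); it shows the minimal interface such a user-supplied allocator needs, demonstrated here with `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Minimal standard-conforming allocator that tracks how many bytes it has
// handed out. Illustrative only; any allocator with this shape would work.
template <class T>
struct counting_allocator {
  using value_type = T;
  static inline std::size_t bytes_allocated = 0;  // running total across instances
  counting_allocator() = default;
  template <class U>
  counting_allocator(const counting_allocator<U>&) {}  // rebind-conversion
  T* allocate(std::size_t n) {
    bytes_allocated += n * sizeof(T);
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
```

Because the allocator is stateless, `std::allocator_traits` treats all instances as equal, so no comparison operators are needed.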