Apologies, this seems to have become rather long. It seemed like such a simple idea at the start!
There are four main parts to this idea:
1. Allow DeviceBox<[T]>, making DeviceBuffer just an alias that could be deprecated in the future.
2. Make the interface safer by using MaybeUninit for uninitialized/zeroed allocations on the device.
3. Add an Alloc generic parameter to DeviceBox, allowing for various new types of allocation.
4. Bonus: add support for async allocations.
I think this proposal is entirely backwards compatible, though it does introduce some methods that are very similar to existing ones, e.g. new_uninit vs. uninitialized, new_zeroed vs. zeroed.
DeviceAllocator
// new `alloc` module
pub trait DeviceAllocator {
    type Ptr;

    fn allocate(&self, size: usize) -> CudaResult<Self::Ptr>;

    // Having a separate method allows for asynchronous zeroing.
    fn allocate_zeroed(&self, size: usize) -> CudaResult<Self::Ptr>;

    fn deallocate(&self, ptr: Self::Ptr) -> CudaResult<()>;
}

// Uses `cudaMalloc`, `cudaFree`.
pub struct Global; // TODO better name?

impl DeviceAllocator for Global {
    type Ptr = DevicePointer<u8>;
    ...
}

// Other allocators might include:
// `Unified`, `HostPinned`, `Pitched`, `Async`, `MemoryPool`, etc.

pub struct DeviceBox<T, A: DeviceAllocator = Global> {
    ptr: A::Ptr,
    alloc: A,
    // `T` is not stored directly, so a marker is needed for the struct to compile.
    _marker: std::marker::PhantomData<T>,
}

impl<T, A: DeviceAllocator> DeviceBox<T, A> {
    pub fn new_in(x: T, alloc: A) -> DeviceBox<T, A>;
}
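To make the allocator parameter more concrete, here is a minimal host-side sketch of how `new_in` and `Drop` might interact with a `DeviceAllocator`. Nothing in it touches CUDA: `HostMock`, `live_allocations`, `copy_to_host` on this type, the `String` error type, the `Ptr: Copy` bound, and `new_in` returning a `CudaResult` are all assumptions made so the example compiles and runs on its own; they are not part of the proposal.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};
use std::cell::RefCell;
use std::collections::HashMap;
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Stand-in for the crate's `CudaResult`; the `String` error is an assumption.
pub type CudaResult<T> = Result<T, String>;

pub trait DeviceAllocator {
    /// `Copy` is an extra bound (assumption) so `Drop` can hand the pointer back.
    type Ptr: Copy;
    fn allocate(&self, size: usize) -> CudaResult<Self::Ptr>;
    fn allocate_zeroed(&self, size: usize) -> CudaResult<Self::Ptr>;
    fn deallocate(&self, ptr: Self::Ptr) -> CudaResult<()>;
}

/// Every mock allocation uses this alignment, loosely mimicking a device allocator.
const ALIGN: usize = 16;

/// Hypothetical allocator backed by host memory; it records live allocations.
#[derive(Default)]
pub struct HostMock {
    live: RefCell<HashMap<usize, Layout>>,
}

impl HostMock {
    pub fn live_allocations(&self) -> usize {
        self.live.borrow().len()
    }

    fn raw_alloc(&self, size: usize, zeroed: bool) -> CudaResult<*mut u8> {
        // Zero-sized layouts must not be passed to `alloc`, so round up to 1 byte.
        let layout = Layout::from_size_align(size.max(1), ALIGN).map_err(|e| e.to_string())?;
        let ptr = unsafe { if zeroed { alloc_zeroed(layout) } else { alloc(layout) } };
        if ptr.is_null() {
            return Err("out of memory".to_string());
        }
        self.live.borrow_mut().insert(ptr as usize, layout);
        Ok(ptr)
    }
}

impl<'a> DeviceAllocator for &'a HostMock {
    type Ptr = *mut u8;

    fn allocate(&self, size: usize) -> CudaResult<*mut u8> {
        self.raw_alloc(size, false)
    }

    fn allocate_zeroed(&self, size: usize) -> CudaResult<*mut u8> {
        self.raw_alloc(size, true)
    }

    fn deallocate(&self, ptr: *mut u8) -> CudaResult<()> {
        let layout = self
            .live
            .borrow_mut()
            .remove(&(ptr as usize))
            .ok_or_else(|| "unknown pointer".to_string())?;
        // The layout is the one recorded when this pointer was allocated.
        unsafe { dealloc(ptr, layout) };
        Ok(())
    }
}

pub struct DeviceBox<T, A: DeviceAllocator> {
    ptr: A::Ptr,
    alloc: A,
    _marker: PhantomData<T>,
}

impl<T: Copy, A: DeviceAllocator<Ptr = *mut u8>> DeviceBox<T, A> {
    /// Allocates through `alloc` and stores `x`; the write stands in for a host-to-device copy.
    pub fn new_in(x: T, alloc: A) -> CudaResult<Self> {
        assert!(align_of::<T>() <= ALIGN, "mock allocator only guarantees 16-byte alignment");
        let ptr = alloc.allocate(size_of::<T>())?;
        unsafe { ptr.cast::<T>().write(x) };
        Ok(DeviceBox { ptr, alloc, _marker: PhantomData })
    }

    /// Reads the value back; stands in for a device-to-host copy.
    pub fn copy_to_host(&self) -> T {
        unsafe { self.ptr.cast::<T>().read() }
    }
}

impl<T, A: DeviceAllocator> Drop for DeviceBox<T, A> {
    fn drop(&mut self) {
        // The box returns its memory to the allocator it was created with.
        let _ = self.alloc.deallocate(self.ptr);
    }
}
```

With this sketch, `DeviceBox::new_in(42u32, &mock)` registers one live allocation in `mock`, and dropping the box releases it again. Storing the allocator inside the box is what lets `Drop` find the right `deallocate` without any global state.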
MaybeUninit
impl<T> DeviceBox<T, Global> {
    ...
    // Note that these methods are safe.
    pub fn new_uninit() -> DeviceBox<MaybeUninit<T>, Global>;
    pub fn new_zeroed() -> DeviceBox<MaybeUninit<T>, Global>;
}

impl<T, A: DeviceAllocator> DeviceBox<T, A> {
    ...
    pub fn new_uninit_in(alloc: A) -> DeviceBox<MaybeUninit<T>, A>;
    pub fn new_zeroed_in(alloc: A) -> DeviceBox<MaybeUninit<T>, A>;
}

impl<T, A: DeviceAllocator> DeviceBox<MaybeUninit<T>, A> {
    // Unsafe: the caller promises that the device memory now holds a valid `T`.
    pub unsafe fn assume_init(self) -> DeviceBox<T, A>;

    // Use this for kernel outputs, then `assume_init` after the kernel is complete.
    pub unsafe fn as_uninit_device_pointer(&mut self) -> DevicePointer<T>;
}
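The intended flow for kernel outputs (allocate uninitialized, let the kernel write, then `assume_init`) could look roughly like the sketch below. It is a host-only stand-in: this `DeviceBox` wraps ordinary heap memory, a raw `*mut T` plays the role of `DevicePointer<T>`, and `fake_kernel`, `copy_to_host` and `run_kernel_output_flow` are hypothetical helpers that exist only so the example runs without a GPU. The allocator parameter is left out to keep it short.

```rust
use std::mem::{self, MaybeUninit};

/// Host-memory stand-in for a device allocation (assumption, not the real type).
pub struct DeviceBox<T> {
    ptr: *mut T,
}

impl<T> DeviceBox<T> {
    /// Safe: the contents are typed as `MaybeUninit<T>`, so they cannot be read as a `T`.
    pub fn new_uninit() -> DeviceBox<MaybeUninit<T>> {
        DeviceBox { ptr: Box::into_raw(Box::new(MaybeUninit::<T>::uninit())) }
    }

    /// Safe for the same reason; all-zero bytes need not be a valid `T`.
    pub fn new_zeroed() -> DeviceBox<MaybeUninit<T>> {
        DeviceBox { ptr: Box::into_raw(Box::new(MaybeUninit::<T>::zeroed())) }
    }

    /// Stands in for a device-to-host copy.
    pub fn copy_to_host(&self) -> T
    where
        T: Copy,
    {
        unsafe { self.ptr.read() }
    }
}

impl<T> DeviceBox<MaybeUninit<T>> {
    /// Caller must guarantee that the memory now holds a valid `T`.
    pub unsafe fn assume_init(self) -> DeviceBox<T> {
        let ptr = self.ptr.cast::<T>();
        // Ownership of the allocation moves to the returned box.
        mem::forget(self);
        DeviceBox { ptr }
    }

    /// Pointer to hand to a kernel that will write the output value.
    pub unsafe fn as_uninit_device_pointer(&mut self) -> *mut T {
        self.ptr.cast::<T>()
    }
}

impl<T> Drop for DeviceBox<T> {
    fn drop(&mut self) {
        // `MaybeUninit<T>` and `T` share a layout, so this frees the original allocation.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

/// Hypothetical "kernel": writes its result through the output pointer.
unsafe fn fake_kernel(out: *mut u32, value: u32) {
    // SAFETY: the caller passes a pointer to writable memory for one `u32`.
    unsafe { out.write(value) };
}

/// Allocates an uninitialized output, lets the "kernel" fill it, then reads it back.
pub fn run_kernel_output_flow() -> u32 {
    let mut out = DeviceBox::<u32>::new_uninit();
    unsafe {
        let p = out.as_uninit_device_pointer();
        fake_kernel(p, 7);
        // The kernel has completed, so the value is initialized.
        out.assume_init().copy_to_host()
    }
}
```

The point of the design is that the only `unsafe` steps are the ones where the caller actually makes a promise: handing out a writable pointer and asserting that the memory is initialized. Allocation itself stays safe because the type system keeps the contents wrapped in `MaybeUninit` until then.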