Unfortunately, allocating from a region on the CPU requires the region to reside on the CPU. When the region currently lives on the GPU, this can mean copying the entire region back just to increment the allocation pointer, and then immediately sending the region back to the GPU for the next kernel. We could instead add a couple of small kernels that perform the increment on the GPU, avoiding these large transfers.
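As a rough illustration of the GPU-side increment, the CUDA sketch below bumps a region's allocation pointer with a single-thread kernel, so only one word of result crosses the bus instead of the whole region. The `Region` layout, the `region_alloc` kernel, and the `CHECK` macro are hypothetical names introduced for this example; the actual region representation may differ.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstddef>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                   \
        }                                                              \
    } while (0)

// Assumed region header, kept in device memory next to the payload.
struct Region {
    unsigned char* base;  // start of the payload (device pointer)
    size_t used;          // bytes already allocated
    size_t capacity;      // total payload bytes
};

constexpr size_t kAllocFailed = ~size_t(0);

// Single-thread kernel: bump the allocation pointer on the GPU.
// Writes the offset of the new block to *out, or kAllocFailed if it does not fit.
__global__ void region_alloc(Region* r, size_t bytes, size_t align, size_t* out) {
    size_t start = (r->used + align - 1) / align * align;  // round up to alignment
    if (start > r->capacity || bytes > r->capacity - start) {
        *out = kAllocFailed;
        return;
    }
    r->used = start + bytes;
    *out = start;
}

int main() {
    const size_t capacity = 1 << 20;  // 1 MiB payload that stays on the GPU
    unsigned char* payload = nullptr;
    Region* d_region = nullptr;
    size_t* d_offset = nullptr;
    CHECK(cudaMalloc(&payload, capacity));
    CHECK(cudaMalloc(&d_region, sizeof(Region)));
    CHECK(cudaMalloc(&d_offset, sizeof(size_t)));

    // Upload only the small header; the payload is never copied to the host.
    Region h_region{payload, 0, capacity};
    CHECK(cudaMemcpy(d_region, &h_region, sizeof(Region), cudaMemcpyHostToDevice));

    // Two allocations performed entirely on the device.
    region_alloc<<<1, 1>>>(d_region, 100, 16, d_offset);
    region_alloc<<<1, 1>>>(d_region, 40, 16, d_offset);
    CHECK(cudaGetLastError());

    // Copy back a single size_t rather than the 1 MiB region.
    size_t h_offset = 0;
    CHECK(cudaMemcpy(&h_offset, d_offset, sizeof(size_t), cudaMemcpyDeviceToHost));
    printf("second block starts at offset %zu\n", h_offset);  // prints 112

    CHECK(cudaFree(d_offset));
    CHECK(cudaFree(d_region));
    CHECK(cudaFree(payload));
    return 0;
}
```

If the kernel that consumes the new block can read the offset directly from device memory, even the final one-word copy can be skipped. The trade-off is an extra kernel launch per allocation, which should still be far cheaper than a round trip of the full region.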