Open mxmlnkn opened 8 years ago
This would speed up the library indeed. The main question is, after allocating the memory (probably the best place for this would be the task queue), should we pass individual pointers to each buffer or one large memory block? I guess the first option would be the harder one to implement, but both versions should be benchmarked. That said, looking at the benchmarks, the mallocs may not be that much of a performance issue, and even if they were, the stream parallelism should cover it.
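To make the two options concrete, the calling conventions could look roughly like this. This is a hypothetical sketch; the parameter names and types are invented for illustration and are not the actual shrink-wrap API:

```cuda
// Option 1: the task queue hands out individual, purpose-named buffers.
// Harder to implement: every intermediate buffer needs its own slot in
// the queue, and the signature changes whenever a buffer is added.
void cudaShrinkWrapBatch( float* dpData, float* dpMask /* , ... */ );

// Option 2: the task queue hands out one large block and the callee
// carves it up internally using offsets. Simpler interface, but the
// callee must know the layout and the total size must be precomputed.
void cudaShrinkWrapBatch( void* dpWorkspace, size_t workspaceBytes /* , ... */ );
```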
We could save some cudaMallocs and cudaFrees if `taskQueue.cu` did the cudaMalloc in the initializer call where it also creates the work thread list. The pointers to the memory locations, or the one large memory location, could then be given to shrink wrap, which in the current version calls cudaMalloc and cudaFree each time. It would make `cudaShrinkWrap` harder to call, so I would prefer to copy-paste it to `cudaShrinkWrapBatch`, which could be called by the former after it allocates the needed memory.
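The proposed split could be sketched as follows. Only `cudaShrinkWrap` and `cudaShrinkWrapBatch` are names from this issue; the struct, the helper, and all parameters are assumptions made for illustration:

```cuda
#include <cuda_runtime.h>

// Hypothetical bundle of pre-allocated device buffers, owned by the
// task queue and created once in its initializer next to the work
// thread list setup.
struct ShrinkWrapBuffers
{
    float* dpData;
    float* dpMask;
    size_t nElements;
};

ShrinkWrapBuffers allocateShrinkWrapBuffers( size_t nElements )
{
    ShrinkWrapBuffers buffers;
    buffers.nElements = nElements;
    cudaMalloc( (void**) &buffers.dpData, nElements * sizeof( float ) );
    cudaMalloc( (void**) &buffers.dpMask, nElements * sizeof( float ) );
    return buffers;
}

// The batch version works purely on memory handed to it and therefore
// never calls cudaMalloc or cudaFree itself (declaration only).
void cudaShrinkWrapBatch( ShrinkWrapBuffers const& buffers /* , ... */ );

// The old entry point keeps its easy-to-call signature: it allocates,
// delegates to the batch version, then frees. Repeated callers (e.g.
// the task queue) skip this wrapper and reuse their buffers instead.
void cudaShrinkWrap( size_t nElements /* , ... */ )
{
    ShrinkWrapBuffers buffers = allocateShrinkWrapBuffers( nElements );
    cudaShrinkWrapBatch( buffers /* , ... */ );
    cudaFree( buffers.dpData );
    cudaFree( buffers.dpMask );
}
```

With this shape the per-call allocation cost is paid only by callers of the convenience wrapper, while the task queue amortizes one allocation over many tasks.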