ComputationalRadiationPhysics / imresh

Shrink-Wrap Phase Reconstruction Algorithm
MIT License

Manually manage GPU memory #41

Open mxmlnkn opened 8 years ago

mxmlnkn commented 8 years ago

We could save some cudaMalloc and cudaFree calls if taskQueue.cu did the cudaMalloc in the initializer call where it also creates the work thread list. The pointers to the memory locations, or the one large memory location, could then be passed to shrink wrap, which in the current version calls cudaMalloc and cudaFree on every invocation.

This would make cudaShrinkWrap harder to call, so I would prefer to copy it to a cudaShrinkWrapBatch variant, which the former could then call after allocating the needed memory.
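The split described above can be sketched as follows. This is a hypothetical host-only illustration, not imresh's real API: the `DeviceBuffers` struct, the function signatures, and the allocation counter are assumptions, and `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree` so the pattern runs without a GPU.

```cpp
#include <cassert>
#include <cstdlib>

static int gAllocCount = 0;  // counts "device" allocations, for illustration only

// Hypothetical bundle of the per-image device buffers shrink wrap needs.
struct DeviceBuffers {
    void*  intensity;  // would be device pointers in the real library
    void*  mask;
    size_t bytes;

    explicit DeviceBuffers(size_t n) : bytes(n) {
        intensity = std::malloc(n);  // cudaMalloc(&intensity, n) in CUDA
        mask      = std::malloc(n);
        gAllocCount += 2;
    }
    ~DeviceBuffers() {
        std::free(intensity);        // cudaFree(...) in CUDA
        std::free(mask);
    }
};

// Batch variant: assumes memory was already allocated by the caller
// (e.g. the task queue) and only runs the iterations.
void cudaShrinkWrapBatch(DeviceBuffers& buf, int /*iterations*/) {
    assert(buf.intensity && buf.mask);  // reconstruction work would happen here
}

// Convenience variant keeps the old, easy-to-call behavior:
// allocate, delegate, free.
void cudaShrinkWrap(size_t bytes, int iterations) {
    DeviceBuffers buf(bytes);
    cudaShrinkWrapBatch(buf, iterations);
}
```

With this split, the task queue can hold one `DeviceBuffers` per worker and call `cudaShrinkWrapBatch` repeatedly, so the allocation count grows with the number of workers rather than the number of processed images.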

Ferruck commented 8 years ago

This would indeed speed up the library. The main question is: after allocating the memory (the task queue is probably the best place for this), should we pass the individual pointers or the one large memory block down to the shrink-wrap routine?

I guess the first option would be the harder one to implement, but both versions should be benchmarked.
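The "one large block" option amounts to carving aligned sub-buffers out of a single allocation. A minimal sketch, with assumed names and buffer counts; the 256-byte alignment mirrors the alignment cudaMalloc itself guarantees:

```cpp
#include <cstddef>

constexpr size_t kAlign = 256;  // assumption: match cudaMalloc's 256 B alignment

// Round a size up to the next multiple of kAlign.
constexpr size_t alignUp(size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

// Hypothetical per-image views shrink wrap would work on
// (three buffers here is illustrative, not imresh's real layout).
struct ShrinkWrapViews {
    char* intensity;
    char* mask;
    char* scratch;
};

// How many bytes the single block must hold for one image.
size_t requiredBytes(size_t imageBytes) {
    return 3 * alignUp(imageBytes);
}

// Partition one preallocated block into the aligned sub-buffers.
ShrinkWrapViews partition(char* block, size_t imageBytes) {
    size_t stride = alignUp(imageBytes);
    return { block, block + stride, block + 2 * stride };
}
```

The trade-off is roughly: one block means a single cudaMalloc and simple lifetime management, but the partitioning logic must stay in sync with what shrink wrap expects; separate pointers are self-describing but multiply the allocation calls this issue is trying to avoid.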

mxmlnkn commented 8 years ago

Looking at the benchmarks, the mallocs may not be that much of a performance issue, and even if they were, the stream parallelism should cover the latency.