Help needed in improving reconstruction speed

roshtha commented 4 years ago

Expected Behavior

Following TIGRE's suggestion for improving reconstruction speed (skip the check for available GPU memory and assign a fixed value for the available memory instead) should reduce the reconstruction time.

Actual Behavior

The reconstruction time did not improve considerably.

Code to reproduce the problem (If applicable)

I run the OS_SART algorithm in the TIGRE toolkit for DBT reconstruction and want to reduce the reconstruction time. Following TIGRE's further-tuning advice, I commented out the call to checkFreeMemory in all .cu files and set mem_GPU_global to 95% of the GPU's actual free memory:

// checkFreeMemory(deviceCount,&mem_GPU_global);
// Total memory: 2.1475e+09; available memory: 1706500000; 95% of available memory: 1621175000
mem_GPU_global = 1621175000;

For a projection data set of size 3582x2792x16:
Time to complete OS_SART (with checkFreeMemory): 31.2 s
Time to complete OS_SART (without checkFreeMemory): 29.8 s

Also would like to know about other possible approaches for improving the reconstruction speed with the toolkit?

Specifications

MATLAB version: Matlab R2015b
OS: Windows 10
CUDA version: 10.2

Thanks.

AnderBiguri commented 4 years ago

Ah, unfortunately, there is really not much more you can do software-wise (otherwise I would have coded it in!). TIGRE's GPU code is basically as fast as code gets (maybe ASTRA toolbox can be a bit faster for smaller image sizes, but for anything above 512^3 TIGRE is measurably faster).

There are 2 ways of making the code theoretically faster.

1 - Recode the entire algorithm fully in CUDA. This only works if your GPU memory is big enough to hold approximately 3x the image memory plus 2x the projection memory. I have done this in the past for another algorithm and achieved much faster reconstructions, but the gain depends on the parameters and memory sizes. This is not a trivial task; it takes a decent number of hours.

2 - Buy better or faster hardware. A better GPU, or multiple GPUs, will cut the time by a lot (depending on the sizes, etc.).

We are quite at the edge of computational speed here; there is really not much more that we can do (as far as I am aware) to make this faster from the algorithm perspective.

AnderBiguri commented 4 years ago

Maybe you can try a faster-converging algorithm. In theory, CGLS should reach a solution in fewer iterations than OS-SART; in practice, it's not always a good solution.

roshtha commented 4 years ago

Thanks @AnderBiguri for the information.
