JinghaoLu / MIN1PIPE

A MINiscope 1-photon-based Calcium Imaging Signal Extraction PIPEline.
GNU General Public License v3.0

GPU memory usage optimisation #31

Open plodocus opened 4 years ago

plodocus commented 4 years ago

Hi Jinghao,

I did some GPU memory profiling on https://github.com/JinghaoLu/MIN1PIPE/blob/32084136b3d67cb77f94c1cd3e26d85dc9712f0a/utilities/movement_correction/inter_section.m#L89. If `tmp` is a 480-by-752-by-5 double (around 14 MB), I get a peak GPU memory usage of 623 MB, i.e. about 45 times the size of a single matrix. I tested with smaller matrices as well, and it seems MATLAB has a fixed GPU overhead of about 200 MB; beyond that, every 1 MB in `tmp` increases maximum GPU memory usage by about 25 MB. This was all tested without a parallel pool.
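For reference, the kind of measurement described above can be approximated by sampling `gpuDevice().AvailableMemory` around the call under test. This is only a sketch of one plausible approach (the matrix size is the one from the issue, the workload is a placeholder, and `AvailableMemory` reports instantaneous rather than true peak usage, so a real peak measurement would need polling):

```matlab
% Hedged sketch: estimate GPU memory consumed by a step under test.
% The workload line is a placeholder, not MIN1PIPE code.
g    = gpuDevice;
base = g.AvailableMemory;                 % free bytes before the step
tmp  = gpuArray(rand(480, 752, 5));       % ~14 MB of doubles on the GPU
% ... run the registration step under test here ...
wait(g);                                  % make sure all kernels finished
used = base - g.AvailableMemory;          % bytes consumed at this point
fprintf('GPU memory in use: %.0f MB\n', used / 2^20);
```

Note that this only captures memory still allocated at the sample point; transient allocations inside library calls can push the true peak higher.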

Is there some way to improve GPU memory usage? The increased GPU memory load really limits the number of parallel workers.

For example, how much of the image padding is actually needed? I didn't go through all the functions called by lk_logdemons_unit(), but gradient_fast(), for example, only seems to need a zero-padding margin of one element on each side of the image.
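To illustrate the point about padding: a centered-difference gradient only reads one pixel beyond each border, so in principle a one-element zero pad would suffice. A minimal sketch (using the built-in `gradient` and `padarray` from the Image Processing Toolbox as stand-ins for gradient_fast(); not the actual MIN1PIPE code):

```matlab
% Hedged sketch: a one-pixel zero margin is enough for a
% centered-difference gradient; larger pads only cost memory.
img    = rand(480, 752, 'gpuArray');
padded = padarray(img, [1 1], 0, 'both');  % 482-by-754, one-pixel margin
[gx, gy] = gradient(padded);               % stand-in for gradient_fast()
gx = gx(2:end-1, 2:end-1);                 % crop back to the original size
gy = gy(2:end-1, 2:end-1);
```

If other callees in lk_logdemons_unit() genuinely need a larger margin, the pad size could at least be set per-callee rather than globally.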

Best wishes, Daniel

JinghaoLu commented 4 years ago

This is tricky. The core GPU-using parts are logdemons_unit and the logdemons function it calls.

logdemons_unit runs an iteration loop over logdemons, and every iteration saves the gpuArray "image_output" together with the x-dimension and y-dimension deformation matrices. That is 3 × num_iter times the input frame size. I do not think this can be reduced further unless a better algorithm appears, so the only thing that can potentially be improved is logdemons. Maybe profile it first and then we can take a look.
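The 3 × num_iter estimate above can be made concrete with a back-of-envelope calculation. A sketch, using the frame size from the issue and an illustrative iteration count (num_iter here is an assumption, not a value taken from the code):

```matlab
% Hedged estimate: GPU bytes retained if image_output plus the x/y
% deformation fields are kept for every iteration. Illustrative numbers.
frameBytes = 480 * 752 * 8;        % one double-precision frame
numIter    = 20;                   % hypothetical iteration count
estBytes   = 3 * numIter * frameBytes;
fprintf('~%.0f MB retained across iterations\n', estBytes / 2^20);
```

If only the latest iterate is actually needed downstream, overwriting the three arrays in place each iteration (rather than accumulating them) would drop this term to a constant 3 × frame size, which may be worth checking during profiling.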