flatironinstitute / CaImAn

Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization.
https://caiman.readthedocs.io
GNU General Public License v2.0

spatial_update with ellipse crashes for large 4D recording #1065

Open · oterocoronel opened this issue 1 year ago

oterocoronel commented 1 year ago

Your setup:

  1. Operating System (Linux, MacOS, Windows): Linux
  2. Hardware type (x86, ARM..) and RAM: 256 GB RAM
  3. Caiman version (e.g. 1.9.12): 1.9.13
  4. How you installed Caiman (pure conda, conda + compile, colab, ..): conda
  5. Details:

I am trying to update the spatial components I got after initializing with greedy_roi. My recording is 2660 frames of size ~700x600x15 (x, y, z), and I get ~11,000 good ROIs. I want to use the 'ellipse' mode, since it has given me better results than the 'dilate' mode. This works fine with ~9,000 ROIs, but crashes consistently with ~11,000 ROIs. When it crashes, both the Jupyter notebook and the terminal close, so I don't get an error code. The last thing logged is:

170429 [spatial.py:update_spatial_components():185] [1049503] Computing support of spatial components

It seems like I should have enough memory to spare: my recording is 70 GB, my total RAM is 256 GB, and memory usage only reaches ~60% before the crash:

[Image: memory usage before the crash]

I've used the 'dilate' method a few times with ~15,000 ROIs and it seems to work just fine. I've noticed that with 'dilate' the memory usage is consistently lower (it never goes beyond 10-20%), while with 'ellipse' it slowly climbs to ~40% over time and then jumps even higher.

My guess is that at some point it reshapes the results and has to allocate an array of size nROIs x frame_size (700x600x15), possibly similar to #1060, but in this case it is hard for me to track down the actual root of the issue.
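
A rough back-of-the-envelope calculation (a sketch using the numbers above; that the intermediate is actually materialized densely is only my guess) shows why an array of that size would not fit:

    # Hypothetical sizing for a dense nROIs x frame_size intermediate
    d = 700 * 600 * 15           # pixels per volume, ~6.3e6
    nr = 11_000                  # number of ROIs
    print(d * nr / 1e9)          # ~69 GB if stored as bool (1 byte per element)
    print(d * nr * 8 / 1e9)      # ~554 GB if stored as float64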

Any thoughts? Thanks!!

oterocoronel commented 1 year ago

Tracked down the issue to line 909 in spatial.py:

dist_indicator = scipy.sparse.coo_matrix((np.asarray(dist_indicator)).squeeze().T)
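
For context, a toy-scale sketch of what that line does (the shapes, and my assumption that dist_indicator is a list of per-ROI boolean masks, are illustrative): np.asarray stacks every mask into one dense array before the sparse conversion even starts, so the dense intermediate has to fit in RAM all at once.

    import numpy as np
    import scipy.sparse

    # Toy shapes; at the real scale (nr ~ 11_000, d = 700*600*15) the dense
    # intermediate alone would be ~69 GB
    nr, d = 100, 1_000
    dist_indicator = [np.zeros((1, d), dtype=bool) for _ in range(nr)]

    dense = np.asarray(dist_indicator).squeeze()   # shape (nr, d), all in RAM
    sparse = scipy.sparse.coo_matrix(dense.T)      # (d, nr), only now sparse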

oterocoronel commented 1 year ago

As suspected, it was indeed a memory issue related to the creation of the sparse matrix. I was able to solve it by replacing some of the code for the 'ellipse' method with what is already implemented for the 'dilate' method. The idea is to create a sparse matrix for each ROI as it is processed, instead of densely stacking them all first. In spatial.py I replaced line 455 with this:

    # Ellipse support for this single component, converted to a sparse (d, 1)
    # column right away so no dense per-ROI mask is returned to the driver
    dist_indicator_i = np.sqrt(np.sum([old_div((dist_cm * V[:, k]) ** 2, dkk[k])
                                       for k in range(len(dkk))], 0))[:, None] <= dist
    dist_indicator_i_sparse = scipy.sparse.coo_matrix(dist_indicator_i)
    return dist_indicator_i_sparse
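
With this change each worker returns a (d, 1) sparse matrix holding only the coordinates of its component's True pixels, so peak memory scales with the total number of support pixels rather than with nROIs x frame_size.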

And also replaced lines 898 to 909 with this:

          # Run construct_ellipse_parallel per component, serially or via dview
          if dview is None:
              parallel_result = list(map(construct_ellipse_parallel, pars))
          else:
              if 'multiprocessing' in str(type(dview)):
                  parallel_result = dview.map_async(
                      construct_ellipse_parallel, pars).get(4294967)
              else:
                  parallel_result = dview.map_sync(construct_ellipse_parallel, pars)
          # Assemble the (d, nr) support matrix column by column in CSC form,
          # as the 'dilate' path already does, so no dense intermediate is built
          indptr = [0]
          indices: List = []
          data = []
          Coor = dict()
          for res in parallel_result:
              indptr.append(indptr[-1] + len(res.row))   # one CSC column per ROI
              indices.extend(res.row)                    # row indices of True pixels
              data.extend(len(res.row) * [True])

          dist_indicator = csc_matrix((data, indices, indptr), shape=(d, nr))
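
A toy-scale sanity check of this assembly pattern (the shapes and random masks are illustrative, not from the real run) confirms it matches stacking the sparse columns directly:

    import numpy as np
    import scipy.sparse
    from scipy.sparse import csc_matrix

    # Tiny stand-ins for the real shapes (d = 700*600*15, nr ~ 11_000)
    d, nr = 10, 3
    rng = np.random.default_rng(0)
    columns = [scipy.sparse.coo_matrix(rng.random((d, 1)) < 0.3) for _ in range(nr)]

    indptr = [0]
    indices = []
    data = []
    for res in columns:
        indptr.append(indptr[-1] + len(res.row))   # column boundaries
        indices.extend(res.row)                    # row indices of True pixels
        data.extend(len(res.row) * [True])

    dist_indicator = csc_matrix((data, indices, indptr), shape=(d, nr))
    assert (dist_indicator.toarray() == scipy.sparse.hstack(columns).toarray()).all()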

Memory usage during 'ellipse' mode is now much lower (similar to what I was seeing for 'dilate'), the memory-usage spike is gone, and the code no longer crashes. The extracted ROIs look good.

[Image: extracted ROIs]

EricThomson commented 1 year ago

Thanks a lot for the detailed analysis. This is something we will need to look at more closely in the near future.