"Index Error" in CNMF step when running many videos in sequence

jtalley24 commented 6 years ago

Windows 10, Python 3, Jupyter Notebook
The pipeline works reasonably well for analyzing up to ~10 short videos (15-30 sec) of 480x752 in sequence, but fails when analyzing more. It runs successfully through motion correction but fails on the CNMF step, yielding the error below:
`---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\map_reduce.py", line 154, in cnmf_patches
    cnm = cnm.fit(images)
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\cnmf.py", line 457, in fit
    Y, sn=sn, options_total=options, **options['init_params'])
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\initialization.py", line 358, in initialize_components
    sn=sn, nb=nb, ssub=ssub, ssub_B=ssub_B, init_iter=init_iter)
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\initialization.py", line 1031, in greedyROI_corr
    swap_dim=True, save_video=save_video, video_name=video_name)
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\initialization.py", line 1488, in init_neurons_corr_pnr
    center[:, num_neurons] = [c, r]
IndexError: index 0 is out of bounds for axis 1 with size 0
"""

The above exception was the direct cause of the following exception:

IndexError                                Traceback (most recent call last)

in ()
     26 del_duplicates=True, # whether to remove duplicates from initialization
     27 border_pix=bord_px) # number of pixels to not consider in the borders
---> 28 cnm.fit(Y)

~\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\cnmf.py in fit(self, images)
    569 gnb=self.gnb, border_pix=self.border_pix,
    570 low_rank_background=self.low_rank_background,
--> 571 del_duplicates=self.del_duplicates)
    572
    573 # options = CNMFSetParms(Y, self.n_processes, p=self.p, gSig=self.gSig, K=A.shape[

~\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\map_reduce.py in run_CNMF_patches(file_name, shape, options, rf, stride, gnb, dview, memory_fact, border_pix, low_rank_background, del_duplicates)
    264 if dview is not None:
    265 if 'multiprocessing' in str(type(dview)):
--> 266 file_res = dview.map_async(cnmf_patches, args_in).get(4294967)
    267 else:
    268 try:

~\Anaconda3\envs\caiman\lib\multiprocessing\pool.py in get(self, timeout)
    642 return self._value
    643 else:
--> 644 raise self._value
    645
    646 def _set(self, i, obj):

IndexError: index 0 is out of bounds for axis 1 with size 0`

Are these errors due to the pipeline being unable to handle many videos, or is something else going wrong here? It is not a memory error or an overflow error, which is what I would have expected if the cause were overwhelming the processor.

epnev commented 6 years ago

@jtalley24 Does the error persist with dview=None? It's important that you do this as described in the instructions.

@j-friedrich The error message boils down to:

packages\caiman\source_extraction\cnmf\initialization.py", line 1488, in init_neurons_corr_pnr
center[:, num_neurons] = [c, r]
IndexError: index 0 is out of bounds for axis 1 with size 0

Is it related to #346 ?

jtalley24 commented 6 years ago

Yes, same exact error with dview=None

I don't think it is related to that issue, as this is a different type of error occurring in a different step and is not related to parallel processing.

Does the index error imply that num_neurons = 0? Surely it finds a nonzero number of components/ROIs, as it had no trouble doing so with fewer videos?
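A minimal NumPy sketch of what the message itself says. It assumes, hypothetically (the shapes are not taken from the CaImAn source), that `center` is preallocated with one column per allowed neuron. The failing index is 0, so the very first neuron is being written, and axis 1 has size 0, so the array has no columns to write into. Read that way, the message would point to room for zero neurons rather than to zero neurons found.

```python
import numpy as np

# Hypothetical stand-in for the preallocated array in init_neurons_corr_pnr:
# 2 rows (one per coordinate) and zero columns, i.e. room for no neurons.
max_number = 0
center = np.zeros((2, max_number))

num_neurons = 0   # index of the first neuron to be stored
c, r = 10, 20     # made-up seed pixel coordinates

message = None
try:
    center[:, num_neurons] = [c, r]
except IndexError as err:
    message = str(err)

print(message)
```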

j-friedrich commented 6 years ago

Most likely the same problem as in #346. Check out branch issue323 and let me know whether the change there fixes it.

jtalley24 commented 6 years ago

@j-friedrich I tried applying this fix today (setting continue_searching=False) and received the same Index error, on the same step... ideas?

epnev commented 6 years ago

@jtalley24 Did you use the updating instructions and branch issue323?

jtalley24 commented 6 years ago

@epnev @j-friedrich After multiple complete uninstalls and reinstalls of the CaImAn pipeline and applying the branch issue323 fix, my colleague (@lumin223) and I consistently receive the same "Value error" on the CNMF step on two separate machines. The error occurs both when we concatenate many short videos (>10 videos of 15-30 sec each) and when we run one long video (8-10 min). This is puzzling: we are presumably not the only ones who have tried concatenated analysis of a day's worth of behavior trials, or analysis of one long continuous video, yet we get this error with the most up-to-date version of CaImAn on two different computers. I have copied the log below:

`---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

in ()
     26 del_duplicates=True, # whether to remove duplicates from initialization
     27 border_pix=bord_px) # number of pixels to not consider in the borders
---> 28 cnm.fit(Y)

~\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\cnmf.py in fit(self, images)
    569 gnb=self.gnb, border_pix=self.border_pix,
    570 low_rank_background=self.low_rank_background,
--> 571 del_duplicates=self.del_duplicates)
    572
    573 # options = CNMFSetParms(Y, self.n_processes, p=self.p, gSig=self.gSig, K=A.shape[

~\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\map_reduce.py in run_CNMF_patches(file_name, shape, options, rf, stride, gnb, dview, memory_fact, border_pix, low_rank_background, del_duplicates)
    432 f = None
    433 elif low_rank_background is None:
--> 434 b = Im.dot(B_tot)
    435 f = scipy.sparse.csr_matrix(F_tot)
    436 print("Leaving background components intact")

~\Anaconda3\envs\caiman\lib\site-packages\scipy\sparse\base.py in dot(self, other)
    359
    360 """
--> 361 return self * other
    362
    363 def power(self, n, dtype=None):

~\Anaconda3\envs\caiman\lib\site-packages\scipy\sparse\base.py in __mul__(self, other)
    477 if self.shape[1] != other.shape[0]:
    478 raise ValueError('dimension mismatch')
--> 479 return self._mul_sparse_matrix(other)
    480
    481 # If it's a list or whatever, treat it like a matrix

~\Anaconda3\envs\caiman\lib\site-packages\scipy\sparse\compressed.py in _mul_sparse_matrix(self, other)
    480
    481 major_axis = self._swap((M,N))[0]
--> 482 other = self.__class__(other) # convert to this format
    483
    484 idx_dtype = get_index_dtype((self.indptr, self.indices,

~\Anaconda3\envs\caiman\lib\site-packages\scipy\sparse\compressed.py in __init__(self, arg1, shape, dtype, copy)
     30 arg1 = arg1.copy()
     31 else:
---> 32 arg1 = arg1.asformat(self.format)
     33 self._set_self(arg1)
     34

~\Anaconda3\envs\caiman\lib\site-packages\scipy\sparse\base.py in asformat(self, format, copy)
    324 raise ValueError('Format {} is unknown.'.format(format))
    325 else:
--> 326 return convert_method(copy=copy)
    327
    328 ###################################################################

~\Anaconda3\envs\caiman\lib\site-packages\scipy\sparse\csc.py in tocsr(self, copy)
    140 maxval=max(self.nnz, N))
    141 indptr = np.empty(M + 1, dtype=idx_dtype)
--> 142 indices = np.empty(self.nnz, dtype=idx_dtype)
    143 data = np.empty(self.nnz, dtype=upcast(self.dtype))
    144

ValueError: negative dimensions are not allowed`

epnev commented 6 years ago

@jtalley24 can you answer the following clarifying questions:

Is the original "index out of bounds" error resolved with the patch in branch issue323?
Does the "Value error" you report here occur if you try the analysis on smaller files?

lumin223 commented 6 years ago

@epnev

The index error no longer shows up in branch issue323, but the value error appeared there.
With the smaller file (<1 GB, 3000 frames), the whole analysis ran fine. With a bigger file (>1.5 GB, >4000 frames), the value error came up.

I tried:

downsampling (changed ssub, tsub)
changing splits_els, splits_rig

and the value error was still there.

epnev commented 6 years ago

Thanks.

@j-friedrich can you take a look? It seems that the original error (also related to #346) is gone but a new one has appeared.

j-friedrich commented 6 years ago

@jtalley24 @lumin223 Can you please try on 64-bit Linux, if you haven't already? Seems you are using 32-bit Windows and the number of nonzero elements in B_tot is greater than the maximal integer representable with 32 bits, thus B_tot.nnz ends up being negative raising the error.
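A small sketch of the mechanism described above, not of CaImAn's or SciPy's actual internals: a count that no longer fits in a signed 32-bit integer wraps around to a negative value, and asking NumPy to allocate an array of negative length produces the same message as in the log. The helper `as_int32` and the count `true_nnz` are made up for illustration.

```python
import numpy as np

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(value):
    """Return value reinterpreted as a signed 32-bit two's-complement integer."""
    return (value + 2**31) % 2**32 - 2**31


true_nnz = INT32_MAX + 1000       # made-up count, just past the 32-bit limit
wrapped_nnz = as_int32(true_nnz)  # what a 32-bit counter would report instead

message = None
try:
    # Allocating with a negative length fails the same way as in the traceback.
    np.empty(wrapped_nnz, dtype=np.int32)
except ValueError as err:
    message = str(err)

print(wrapped_nnz, message)
```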

Could you also try whether changing line 424 in map_reduce.py to Im = scipy.sparse.csc_matrix( (1. / mask, (np.arange(d), np.arange(d))), dtype=np.float32) solves the issue by avoiding conversion of B_tot from csc to csr when evaluating Im.dot(B_tot).
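To illustrate the idea on toy data (the `mask` and `B_tot` values below are invented, and this says nothing about whether the change is sufficient for the real data): when the diagonal scaling matrix is built as CSC and multiplied with a CSC matrix, both operands already share a format and the product comes back as CSC.

```python
import numpy as np
import scipy.sparse

d = 4
mask = np.array([1.0, 2.0, 4.0, 8.0])  # toy per-pixel weights

# Diagonal matrix with 1/mask on the diagonal, stored as CSC as suggested above.
Im = scipy.sparse.csc_matrix(
    (1.0 / mask, (np.arange(d), np.arange(d))), dtype=np.float32)

# Toy stand-in for B_tot, also stored as CSC.
B_dense = np.arange(1, 9, dtype=np.float32).reshape(d, 2)
B_tot = scipy.sparse.csc_matrix(B_dense)

b = Im.dot(B_tot)

print(Im.format, B_tot.format, b.format)
print(b.toarray())  # each row of B_tot divided by the matching mask entry
```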

lumin223 commented 6 years ago

@j-friedrich Thanks for the suggestion! I tried changing the code in map_reduce.py on Windows 10 (Python 3.6, 64 GB memory, 6 cores). It resulted in the error 'Kernel Restarting: the kernel appears to have died. It will restart automatically'.

I am now trying to run the code on Ubuntu in a Jupyter notebook (this PC has 32 GB RAM and 8 cores). There was a memory error at the beginning, but I increased swap to 22 GB and that fixed it. It is now on the step "Set CNMF parameters and run it", which has been running for 2 hours on a 3-minute video and is still going. The last message is: (4978, 480, 752) using 15 processes using 4000 pixels using 5000 block_size (80, 80). It has stayed there for 2 more hours, and I am not sure whether it is still working.

All of these results are from branch issue323. Would it be better to run with the master branch?
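For scale, the array shape in the last log message can be turned into a rough memory footprint. This is only back-of-the-envelope arithmetic and assumes the movie is held as 32-bit floats; the helper name `movie_size_bytes` is made up for illustration.

```python
def movie_size_bytes(n_frames, height, width, bytes_per_value=4):
    """Size of a dense movie array, assuming bytes_per_value bytes per pixel."""
    return n_frames * height * width * bytes_per_value


size = movie_size_bytes(4978, 480, 752)  # shape taken from the log message
size_gib = size / 2**30

print(f"{size} bytes = {size_gib:.2f} GiB")
```

Under that assumption a single in-memory copy is roughly 6.7 GiB, before any working copies made by parallel processes.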

j-friedrich commented 6 years ago

The restarted kernel is probably due to insufficient memory as well. Instead of increasing swap, I suggest you reduce the number of processes from 15 to e.g. 8 or less in order to trade speed for memory.

Have you set the environment variables MKL_NUM_THREADS and OPENBLAS_NUM_THREADS to 1? Leaving them unset severely impacts computing time when processing in patches.
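One way to do this from inside a notebook is sketched below. It assumes the variables are set in the first cell, before NumPy, SciPy or CaImAn are imported, since the numerical libraries typically read them when they are loaded; exporting them in the shell before starting Jupyter is an equivalent alternative.

```python
import os

THREAD_VARS = ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")

# Set these before the numerical libraries are first imported in this process.
for name in THREAD_VARS:
    os.environ[name] = "1"

thread_settings = {name: os.environ[name] for name in THREAD_VARS}
print(thread_settings)
```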

Does the error persist with dview=None?

lumin223 commented 6 years ago

@j-friedrich Thank you. Over the last 2 days I tried everything you suggested (Ubuntu 18.04.1 LTS, 32 GB RAM):

Decreased the number of processes (15 to 7) and made sure MKL_NUM_THREADS and OPENBLAS_NUM_THREADS are set to 1: in this case I get the same memory error. I also read another post on Gitter about splits_rig and splits_els, so I increased both to 40. I also noticed that the value error occurs when swap memory is full during the CNMF-E step. At the beginning swap was empty, and it filled up over time. I thought that adding more swap space might help, but you recommended against it. What is the reason?
dview=None: it causes the kernel to die.

I wonder whether this happens only to us. Have you ever seen an error like ours?

And a quick question: what is the optimal value for n_processes? Does it depend on the number of CPU cores, or on the amount of RAM?

lumin223 commented 6 years ago

@j-friedrich I tested with a larger swap, and it kills the kernel. Now I know what you meant.

j-friedrich commented 6 years ago

The number of processes specifies how many patches are processed in parallel, so a higher number decreases computing time but increases RAM usage. If there's plenty of RAM, the optimal number of processes equals the number of CPU cores. In your case RAM is very limited, hence you need to choose a smaller number. Setting dview=None switches off parallel processing, so that the number of processes is de facto 1. Don't you get a more meaningful error message with dview=None than merely 'kernel dead'? If it is due to insufficient memory, you would need to decrease the size of the patches (parameter rf).
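The trade-off can be written down as a rough rule of thumb. The sketch below is not part of CaImAn: `choose_n_processes`, the per-process memory figure and the reserve are all hypothetical, and the per-process footprint in particular would have to be measured for a given patch size.

```python
import os


def choose_n_processes(ram_gb, per_process_gb, n_cores=None, reserve_gb=4.0):
    """Pick a process count limited by both CPU cores and available RAM.

    ram_gb:         total RAM of the machine
    per_process_gb: assumed peak memory of one worker (must be measured)
    reserve_gb:     RAM left for the OS and the parent process
    """
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    by_memory = int((ram_gb - reserve_gb) // per_process_gb)
    # Never exceed the core count, never go below a single process.
    return max(1, min(n_cores, by_memory))


# With plenty of RAM the core count is the limit; with little RAM, memory is.
print(choose_n_processes(ram_gb=256, per_process_gb=6, n_cores=8))
print(choose_n_processes(ram_gb=32, per_process_gb=6, n_cores=8))
```

If even a single process does not fit, the remaining lever mentioned above is a smaller patch size (rf), which this sketch does not model.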

agiovann commented 6 years ago

@lumin223 do you have any update on this issue? Could you close if solved?

lumin223 commented 6 years ago

@agiovann Hi, thanks for checking in. We need to try this on a more powerful computer. We will post an update here.

lumin223 commented 6 years ago

It seems fixed! Sorry for the late update. I think having more RAM is better.

j-friedrich commented 6 years ago

Glad to hear. Thanks for letting us know.

flatironinstitute / CaImAn

"Index Error" in CNMF step when running many videos in sequence #348