@jtalley24 Does the error persist with `dview=None`? It's important that you do this as described in the instructions.
@j-friedrich The error message boils down to:

```
packages\caiman\source_extraction\cnmf\initialization.py", line 1488, in init_neurons_corr_pnr
    center[:, num_neurons] = [c, r]
IndexError: index 0 is out of bounds for axis 1 with size 0
```
Is it related to #346?
Yes, the exact same error with `dview=None`.
I don't think it is related to that issue, as this is a different type of error occurring in a different step and is not related to parallel processing.
Does the index error imply that num_neurons = 0? Surely it finds a nonzero number of components/ROIs, as it had no trouble doing so with fewer videos?
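For reference, my reading of that failing line as a minimal numpy sketch (the shapes are my guess from the error message, not taken from the CaImAn source):

```python
import numpy as np

# Guess at what init_neurons_corr_pnr does: `center` is preallocated with one
# column per neuron it expects to find. If that expected count is 0, the array
# has shape (2, 0) and the very first assignment fails.
expected_neurons = 0                      # hypothetical value
center = np.zeros((2, expected_neurons))  # shape (2, 0)

num_neurons = 0                           # counter for neurons found so far
c, r = 10, 20                             # hypothetical column/row of a seed pixel
center[:, num_neurons] = [c, r]           # IndexError: index 0 is out of bounds
                                          # for axis 1 with size 0
```

If I read it right, the error says the array was sized for zero neurons up front, rather than that zero components were found during the search.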
Most likely the same problem as in #346. Check out branch issue323 and let me know whether the change there fixes it.
@j-friedrich I tried applying this fix today (setting `continue_searching=False`) and received the same IndexError, on the same step... ideas?
@jtalley24 Did you use the updating instructions and branch issue323?
@epnev @j-friedrich After multiple complete uninstalls/reinstalls of the CaImAn pipeline and applying the branch issue323 fix, my colleague (@lumin223) and I are consistently receiving the same ValueError on the CNMF step on two separate machines. The error occurs both when we concatenate many short videos (>10, 15-30 sec each) and when we try to run one long video (8-10 min) for analysis. This is puzzling, because I imagine we are not the only ones who have tried concatenated analysis of a day's worth of behavior trials, or analysis of one long continuous video, yet we receive this same error using the most up-to-date version of CaImAn on two different computers. I have copied the log below:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
```
@jtalley24 can you answer the following clarifying questions:
@epnev
I tried,
Thanks.
@j-friedrich can you take a look? It seems that the original error (also related to #346) is gone but a new one has appeared.
@jtalley24 @lumin223
Can you please try on 64-bit Linux, if you haven't already?
It seems you are using 32-bit Windows, and the number of nonzero elements in `B_tot` is greater than the maximal integer representable with 32 bits, so `B_tot.nnz` ends up being negative, raising the error.
Could you also try whether changing line 424 in map_reduce.py to
`Im = scipy.sparse.csc_matrix((1. / mask, (np.arange(d), np.arange(d))), dtype=np.float32)`
solves the issue, by avoiding the conversion of `B_tot` from csc to csr when evaluating `Im.dot(B_tot)`.
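If you want to confirm that this is really what's happening, something like the following should show it (just a diagnostic sketch, not part of CaImAn; it assumes you can get hold of `B_tot`, e.g. in a debugger):

```python
import numpy as np

def diagnose_nnz_overflow(B_tot):
    """Diagnostic sketch: for CSC/CSR matrices the stored-element count is
    read from indptr[-1], so with 32-bit index arrays it wraps around and
    nnz is reported as negative once the true count exceeds 2**31 - 1."""
    print("index dtype :", B_tot.indptr.dtype)   # int32 is the risky case
    print("reported nnz:", B_tot.nnz)            # negative => overflowed
    return B_tot.indptr.dtype == np.int32 and B_tot.nnz < 0
```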
@j-friedrich Thanks for the suggestion! I tried changing that line in map_reduce.py on Windows 10 (Python 3.6, 64 GB memory, 6 cores). It resulted in the error 'Kernel Restarting: the kernel appears to have died. It will restart automatically.'

Now I am trying to run the code on Ubuntu in a Jupyter notebook (this PC has 32 GB RAM, 8 cores). There was a memory error at the beginning, but I increased swap memory up to 22 GB and the memory error went away. It is now on the step "Set CNMF parameters and run it"; it has been running for 2 hours on a 3-minute video. The last message was '(4978, 480, 752) using 15 processes using 4000 pixels using 5000 block_size (80, 80)', and it has lasted 2 more hours. I am not sure whether it is still working or not.

All of these results come from branch issue323. Would it work better if we ran with the master branch?
The restarted kernel is probably due to insufficient memory as well. Instead of increasing swap, I suggest you reduce the number of processes from 15 to e.g. 8 or fewer, in order to trade speed for memory.
Have you set the environment variables MKL_NUM_THREADS and OPENBLAS_NUM_THREADS to 1? Not doing so severely impacts computing time when processing in patches.
Does the error persist with `dview=None`?
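Note that the variables have to be set before numpy (and hence caiman) is imported. One way to do it at the very top of the notebook (just a sketch of the usual approach; adjust to how you launch things):

```python
import os

# Must run before numpy/caiman are imported, otherwise MKL/OpenBLAS have
# already chosen their thread counts and the setting has no effect.
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np   # noqa: E402  (imported after setting the variables)
import caiman as cm  # noqa: E402
```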
@j-friedrich Thank you. For the last 2 days I tried everything you suggested (Ubuntu 18.04.1 LTS, 32 GB RAM).
I decreased the number of processes (15 to 7) and made sure MKL_NUM_THREADS and OPENBLAS_NUM_THREADS were set to 1; in this case I get the same memory error. I also read another post on gitter about splits_rig and splits_els, so I increased those to 40. I also noticed that the ValueError breaks out when swap memory is full during the CNMF-E step: at the beginning swap memory was empty, and it grew over time. I thought that allocating more swap space might help, but you didn't recommend doing that. Is there a reason?
With `dview=None`, the kernel dies.
I wonder whether this happens only to us. Have you ever seen an error like this?
And a quick question: what is the optimal number for `n_processes`? Does it depend on the number of CPU cores, or on the size of the RAM?
@j-friedrich I tested with a bigger swap and it makes the kernel die. Now I know what you meant.
The number of processes specifies how many patches are processed in parallel, thus a higher number decreases computing time but increases RAM usage. If there's plenty of RAM, the optimal number of processes equals the number of CPU cores. In your case RAM is very limited, hence you need to choose a smaller number. Setting `dview=None` switches off parallel processing, so the number of processes is de facto 1. Don't you get a more meaningful error message with `dview=None` than merely 'kernel dead'? If it was due to insufficient memory, you would need to decrease the size of the patches (parameter `rf`).
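Roughly what that looks like in the notebook (the `setup_cluster` call is the one from the standard demo pipeline; the specific numbers here are only examples for a 32 GB machine):

```python
import caiman as cm

# Fewer processes -> fewer patches held in RAM at once (slower, but lighter).
c, dview, n_processes = cm.cluster.setup_cluster(
    backend='local', n_processes=4, single_thread=False)

# Or switch off parallel processing entirely to get a readable traceback:
# dview = None

# If memory is still the bottleneck, shrink the patches instead,
# e.g. rf=25 gives roughly 50x50-pixel patches.
```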
@lumin223 do you have any update on this issue? Could you close if solved?
@agiovann Hi, thanks for checking in. We need to try with a more powerful computer. We will post an update here to share.
It seems fixed! Sorry for the late update. I think having more RAM is better.
Glad to hear. Thanks for letting us know.
Windows 10, Python 3, Jupyter Notebook
The pipeline works reasonably well for analyzing up to ~10 short videos (15-30 sec) of 480x752 in sequence, but fails when analyzing more. It runs successfully through motion correction but fails on the CNMF step, yielding the error below:
```
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\map_reduce.py", line 154, in cnmf_patches
    cnm = cnm.fit(images)
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\cnmf.py", line 457, in fit
    Y, sn=sn, options_total=options, **options['init_params'])
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\initialization.py", line 358, in initialize_components
    sn=sn, nb=nb, ssub=ssub, ssub_B=ssub_B, init_iter=init_iter)
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\initialization.py", line 1031, in greedyROI_corr
    swap_dim=True, save_video=save_video, video_name=video_name)
  File "C:\Users\FuccilloLab\Anaconda3\envs\caiman\lib\site-packages\caiman\source_extraction\cnmf\initialization.py", line 1488, in init_neurons_corr_pnr
    center[:, num_neurons] = [c, r]
IndexError: index 0 is out of bounds for axis 1 with size 0
"""

The above exception was the direct cause of the following exception:

IndexError                                Traceback (most recent call last)
```