Running the CNMF demo pipeline with p=2 causes the pipeline to hang on my system

loftusa commented 1 year ago

Hi all,

I ran through the demo pipeline here with a few changed parameters.

I am running on a remote system with 32 cores and 100GB of memory and a GPU. When I run

# %%capture
#%% RUN CNMF ON PATCHES
# First extract spatial and temporal components on patches and combine them
# for this step deconvolution is turned off (p=0). If you want to have
# deconvolution within each patch change params.patch['p_patch'] to a
# nonzero value
cnm = cnmf.CNMF(n_processes, params=opts, dview=dview)
cnm = cnm.fit(images)

and deconvolution was turned on instead (e.g., set p=2), the run hung for about 3 hours before I killed it. If this is intentional, there should likely be an error message preventing the pipeline to run at this point.

I tried messing around with p_patch as well, but that was with p=2 still.

Here is my opts:

CNMFParams:

data:

{'caiman_version': '1.9.15',
 'decay_time': 1.8,
 'dims': (657, 657),
 'dxy': (1, 1),
 'fnames': ['/home/jovyan/work/notebooks/try_cnmf_exp/C11.tif'],
 'fr': 10,
 'last_commit': 'FILE-1686357314',
 'mmap_C': None,
 'mmap_F': None,
 'var_name_hdf5': 'mov'}

spatial_params:

{'block_size_spat': 5000,
 'dist': 3,
 'expandCore': array([[0, 0, 1, 0, 0],
       [0, 1, 1, 1, 0],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 0],
       [0, 0, 1, 0, 0]]),
 'extract_cc': True,
 'maxthr': 0.1,
 'medw': None,
 'method_exp': 'dilate',
 'method_ls': 'lasso_lars',
 'n_pixels_per_process': 13924,
 'nb': 1,
 'normalize_yyt_one': True,
 'nrgthr': 0.9999,
 'num_blocks_per_run_spat': 20,
 'se': None,
 'ss': None,
 'thr_method': 'nrg',
 'update_background_components': True}

temporal_params:

{'ITER': 2,
 'bas_nonneg': False,
 'block_size_temp': 5000,
 'fudge_factor': 0.96,
 'lags': 5,
 'memory_efficient': False,
 'method_deconvolution': 'oasis',
 'nb': 1,
 'noise_method': 'mean',
 'noise_range': [0.25, 0.5],
 'num_blocks_per_run_temp': 20,
 'optimize_g': False,
 'p': 0,
 's_min': None,
 'solvers': ['ECOS', 'SCS'],
 'verbosity': False}

init_params:

{'K': 8,
 'SC_kernel': 'heat',
 'SC_nnn': 20,
 'SC_normalize': True,
 'SC_sigma': 1,
 'SC_thr': 0,
 'SC_use_NN': False,
 'alpha_snmf': 100,
 'center_psf': False,
 'gSig': [3, 3],
 'gSiz': (7, 7),
 'init_iter': 2,
 'kernel': None,
 'lambda_gnmf': 1,
 'maxIter': 5,
 'max_iter_snmf': 500,
 'method_init': 'greedy_roi',
 'min_corr': 0.85,
 'min_pnr': 20,
 'nIter': 5,
 'nb': 1,
 'normalize_init': True,
 'options_local_NMF': None,
 'perc_baseline_snmf': 20,
 'ring_size_factor': 1.5,
 'rolling_length': 100,
 'rolling_sum': True,
 'seed_method': 'auto',
 'sigma_smooth_snmf': (0.5, 0.5, 0.5),
 'ssub': 1,
 'ssub_B': 2,
 'tsub': 1}

preprocess_params:

{'check_nan': True,
 'compute_g': False,
 'include_noise': False,
 'lags': 5,
 'max_num_samples_fft': 3072,
 'n_pixels_per_process': 13924,
 'noise_method': 'mean',
 'noise_range': [0.25, 0.5],
 'p': 2,
 'pixels': None,
 'sn': None}

patch_params:

{'border_pix': 0,
 'del_duplicates': False,
 'in_memory': True,
 'low_rank_background': True,
 'memory_fact': 1,
 'n_processes': 31,
 'nb_patch': 1,
 'only_init': True,
 'p_patch': 0,
 'p_ssub': 2,
 'p_tsub': 2,
 'remove_very_bad_comps': False,
 'rf': 40,
 'skip_refinement': False,
 'stride': 20}

online:

{'N_samples_exceptionality': 18,
 'W_update_factor': 1,
 'batch_update_suff_stat': False,
 'dist_shape_update': False,
 'ds_factor': 1,
 'epochs': 1,
 'expected_comps': 500,
 'full_XXt': False,
 'init_batch': 3005,
 'init_method': 'bare',
 'iters_shape': 5,
 'max_comp_update_shape': inf,
 'max_num_added': 5,
 'max_shifts_online': 10,
 'min_SNR': 2.5,
 'min_num_trial': 5,
 'minibatch_shape': 100,
 'minibatch_suff_stat': 5,
 'motion_correct': True,
 'movie_name_online': '/home/jovyan/work/notebooks/try_cnmf_exp/online_movie.mp4',
 'n_refit': 0,
 'normalize': False,
 'num_times_comp_updated': inf,
 'opencv_codec': 'H264',
 'path_to_model': '/home/jovyan/caiman_data/model/cnn_model_online.h5',
 'ring_CNN': False,
 'rval_thr': 0.88,
 'save_online_movie': False,
 'show_movie': False,
 'simultaneously': False,
 'sniper_mode': False,
 'stop_detection': False,
 'test_both': False,
 'thresh_CNN_noisy': 0.5,
 'thresh_fitness_delta': -50,
 'thresh_fitness_raw': -91.46966899101643,
 'thresh_overlap': 0.5,
 'update_freq': 200,
 'update_num_comps': True,
 'use_corr_img': False,
 'use_dense': True,
 'use_peak_max': True}

quality:

{'SNR_lowest': 0.5,
 'cnn_lowest': 0.1,
 'gSig_range': None,
 'max_ecc': 3,
 'min_SNR': 2.5,
 'min_cnn_thr': 0.99,
 'rval_lowest': -1,
 'rval_thr': 0.88,
 'use_cnn': True,
 'use_ecc': False}

merging:

{'do_merge': True,
 'max_merge_area': None,
 'merge_parallel': False,
 'merge_thr': 0.7}

motion:

{'border_nan': 'copy',
 'gSig_filt': None,
 'indices': (slice(None, None, None), slice(None, None, None)),
 'is3D': False,
 'max_deviation_rigid': 3,
 'max_shifts': (5, 5),
 'min_mov': None,
 'niter_rig': 1,
 'nonneg_movie': True,
 'num_frames_split': 80,
 'num_splits_to_process_els': None,
 'num_splits_to_process_rig': None,
 'overlaps': (24, 24),
 'pw_rigid': False,
 'shifts_opencv': True,
 'splits_els': 37,
 'splits_rig': 37,
 'strides': (48, 48),
 'upsample_factor_grid': 4,
 'use_cuda': False}

ring_CNN:

{'loss_fn': 'pct',
 'lr': 0.001,
 'lr_scheduler': None,
 'max_epochs': 100,
 'n_channels': 2,
 'path_to_model': None,
 'patience': 3,
 'pct': 0.01,
 'remove_activity': False,
 'reuse_model': False,
 'use_add': False,
 'use_bias': False,
 'width': 5}

kushalkolar commented 1 year ago

You might be running out of RAM, try reducing n_processes when you create the dview. This reduces RAM usage as well because fewer patches will run simultaneously.

loftusa commented 1 year ago

@kushalkolar I tried reducing to 32, and then 16 processes on a system with 72 CPUs and 100GB of RAM, using CNMF. Still hangs just as much.

EricThomson commented 1 year ago

one weird thing I'm noticing is we have p twice in our params. This is bad. It's in temporal and preprocessing, and in your temporal (which is more the canonical field) it is currently set to 0.

This is asking for trouble though (on our part not yours, but by association now yours :smile:)

I would test it first with both P's set to the same value, and set p_patch to 0 because that's a p most people don't mess with.

Then set p to 0 and p_patch to positive val and see what happens.

If it still hangs after all this (and I'd do all this with a short demo movie because I assume you have some megamovie), then I'd set dview to None and run with logging level set to DEBUG and see what's going on in the various permutations that work, and don't work, to find where the one(s) that don't work go awry.

EricThomson commented 1 year ago

Note I will try to reproduce this today and edit this reply with result.

Edit: I wasn't able to reproduce the problem. IN the demo notebook, I had temporal.p and proprocess.p set to 2. I also tried with the first set to 0 and second to 2, and my code ran.

For debugging, I recommend following the steps I suggested above (starting with a small demo movie to see if this is a memory issue). How big is your movie (how much memory, and dims?). Does it run on this movie when you use default params?

loftusa commented 1 year ago

Hi @EricThomson !

So, yesterday I did some time-profiling to get to the bottom of this.

Using your demo pipeline, p=0, with your demo data, runtime was the following:

Using your demo pipeline, p=1:

For p=2:

Then I switched to my own data (657x657, 601 frames). When I used only the demo pipeline, with no other settings changed, and only swapped in my own data, we ran -- albeit slowly.

Each of our full TIF stacks is actually 3005 frames, but they include five recording periods. So I think the original problem was that, trying to run on the entire 3005 at once, it took so long that it seemed like it was hanging.

The actual pipeline we are developing includes running separately on each of the five recording periods individually, then using multisession_registration to combine them. Here are runtimes when I split the full 3005 frames into 5x601 frames, and then just ran the full pipeline with a for-loop on each 601-length movie. This is with n_processes=64 and ssub, tsub both set to 2 (so I was downsampling this time):

You can see that the first two movies each took 232 seconds and 163 seconds respectively (which I think is very reasonable), but then we had the third, fourth and fifth movies take 1,315, 2,241, and 10,705 seconds respectively (roughly 3 hours).

Each of the movies should be exactly the same: 601 frames at 657x657 resolution.

Here are the parameters I was using for the last pipeline:

# CHANGED VALUES

# dataset dependent parameters
fr = 10                             # imaging rate in frames per second
decay_time = 1.8                    # length of a typical transient in seconds

# motion correction parameters
strides = (48, 48)          # start a new patch for pw-rigid motion correction every x pixels
overlaps = (24, 24)         # overlap between pathes (size of patch strides+overlaps)
max_shifts = (5, 5)         # maximum allowed rigid shifts (in pixels)
max_deviation_rigid = 3     # maximum shifts deviation allowed for patch with respect to rigid shifts
pw_rigid = False            # flag for performing non-rigid motion correction

# parameters for source extraction and deconvolution
p = 2                       # order of the autoregressive system
gnb = 1                     # number of global background components
merge_thr = 0.85            # merging threshold, max correlation allowed
rf = 30                     # half-size of the patches in pixels. e.g., if rf=25, patches are 50x50
stride_cnmf = 12             # amount of overlap between the patches in pixels
K = 10                       # number of components per patch
gSig = [4, 4]               # expected half size of neurons in pixels
gSiz = tuple(4*np.array(gSig)+1)
method_init = 'greedy_roi'  # initialization method (if analyzing dendritic data using 'sparse_nmf')
ssub = 2                    # spatial subsampling during initialization
tsub = 2                    # temporal subsampling during intialization

# parameters for component evaluation
min_SNR = 2.0               # signal to noise ratio for accepting a component
rval_thr = 0.85              # space correlation threshold for accepting a component
cnn_thr = 0.99              # threshold for CNN based classifier
cnn_lowest = 0.1 # neurons with cnn probability lower than this value are rejected
min_size = np.pi*(gSig[0]/1.5)**2
max_size = np.pi*(gSig[0]*1.5)**2

# parameters for multiprocessing
n_processes = 71

loftusa commented 1 year ago

Oh, lastly, I also tried my own data with the for-loop, with p=0 (deconvolution turned off). I had a fairly reasonable 18 minute runtime (values are in units of seconds)

EricThomson commented 10 months ago

@loftusa is this issue still a problem, or are things ok? I figured it was clearly a memory problem, not sure how you fixed it if you did! 😄

EricThomson commented 10 months ago

Closing due to inactivity.

flatironinstitute / CaImAn

Running the CNMF demo pipeline with p=2 causes the pipeline to hang on my system #1110