catalystneuro / mease-lab-to-nwb


Slow bin conversion and postprocessing #42

Closed: ross-folkard closed this issue 2 years ago

ross-folkard commented 3 years ago

Hi, @luiztauffer, @alejoe91

I'm trying to concatenate some files, ~80 GB in total (about 20,000 seconds of data).

When sorting the data with n_jobs_bin and chunk_mb values other than the defaults, the computer appears to crash (it becomes unresponsive, though I can hear it working). When I used the default parameters, it took about 27 hours to get 63% of the way through the bin conversion.

I moved two hour-long files (~20 GB in total) to the local drive, concatenated them, and ran them through the pipeline with these parameters:

kilosort2_5 = dict(NT=None, car=True, chunk_mb=500, detect_threshold=3.5, freq_min=600, keep_good_only=False, minFR=0.1, minfr_goodchannels=0.1, nPCs=3, n_jobs_bin=1, nfilt_factor=4, ntbuff=64, preclust_threshold=8, projection_threshold=[10, 2], sigmaMask=30)

This worked, but it took ~10 hours to spike sort with kilosort2_5 alone.

The post-processing has been running now for nearly 4 hours, and is extracting waveforms at chunk 240 out of 438.

Seeing as this is only about a quarter of the data I would ideally like to put through the pipeline as a single NWB file, do you have any suggestions on how I could speed this up? (For comparison, our colleague wrote a function that merges and converts the smrx files to bin files at ~20x the speed.)

[screenshots attached]

alejoe91 commented 3 years ago

@rf13734 are you filtering the data as well? 80 GB of data will take time to process. You can also use your colleague's code to concatenate and write to bin, and then load the binary data in SI directly. As discussed before, the process can be sped up by using more n_jobs_bin for KS and more n_jobs for postprocessing.
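For reference, here is a minimal sketch of where the postprocessing n_jobs knob might go, assuming the old split spikeinterface packages (spiketoolkit) that this pipeline uses; the function signature and the values below are assumptions and may need adjusting for the installed version.

```python
import spiketoolkit as st

# Rough sketch (assumption: spiketoolkit's postprocessing accepts
# n_jobs/chunk_mb): extract waveforms in parallel after sorting.
waveforms = st.postprocessing.get_unit_waveforms(
    recording,      # the concatenated recording
    sorting,        # the kilosort2_5 output
    n_jobs=4,       # parallel workers (assumption: 4; tune to your CPU)
    chunk_mb=500,   # chunk size in MB handled per job
)
```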

ross-folkard commented 3 years ago

@alejoe91 Thanks for the quick response! I will implement all the suggestions! I haven't got screenshot evidence, but the computer started to get very buggy whenever I increased n_jobs_bin (e.g. I couldn't open my folders on Windows). If it happens again I'll let you know. Thanks again!

alejoe91 commented 3 years ago

Try to use more jobs and a smaller chunk_mb (like 250).
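As a concrete illustration of that suggestion, the parameter dict from the first post might be adjusted as below; the n_jobs_bin value is an assumption and should be tuned to the number of cores on your machine.

```python
# Hypothetical adjustment of the earlier kilosort2_5 parameters:
# more binary-writing jobs and a smaller chunk size.
kilosort2_5 = dict(
    NT=None, car=True,
    chunk_mb=250,           # smaller chunks, as suggested above
    n_jobs_bin=4,           # more parallel jobs (assumption: 4)
    detect_threshold=3.5, freq_min=600, keep_good_only=False,
    minFR=0.1, minfr_goodchannels=0.1, nPCs=3, nfilt_factor=4,
    ntbuff=64, preclust_threshold=8, projection_threshold=[10, 2],
    sigmaMask=30,
)
```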

ross-folkard commented 3 years ago

Great, will do!

alejoe91 commented 3 years ago

BTW, it's normal that your computer gets slower/"buggy" if you use as many jobs as there are cores. You're not supposed to be using it while it's processing, as all resources are allocated to the job.

ross-folkard commented 3 years ago

Ah that's great. I don't mind not using it, but I was struggling to tell whether it was stuck somewhere (it didn't seem to be close to maxing out the RAM or CPU, but everything was still a lot slower). I'll run it again and be a bit more patient :)

ross-folkard commented 3 years ago

@alejoe91 Would I then load the concatenated bin file using the BinDatRecordingExtractor?

alejoe91 commented 3 years ago

Yep!
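For what it's worth, here is a minimal sketch of loading an externally written .bin file with the old spikeextractors API; the path, sampling rate, channel count, and dtype below are placeholders that must match how the file was actually written.

```python
import spikeextractors as se

# Minimal sketch: wrap a raw .bin file as a recording.
# All values are placeholders and must match the file's real layout.
recording = se.BinDatRecordingExtractor(
    file_path="concatenated_recording.bin",  # hypothetical path
    sampling_frequency=30000.0,              # Hz (placeholder)
    numchan=32,                              # channel count (placeholder)
    dtype="int16",                           # dtype used when writing the file
)
```

Depending on the pipeline, the probe geometry may also need to be attached (e.g. from a .prb file) before sorting.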