MouseLand / Kilosort

Fast spike sorting with drift correction
https://kilosort.readthedocs.io/en/latest/
GNU General Public License v3.0

Kilosort data storage and GPU usage advice #389

Closed: saumilpatel closed this issue 8 months ago

saumilpatel commented 3 years ago

Dear Marius,

We, in the Andreas Tolias Lab at Baylor College of Medicine in Houston, Texas, are planning to do close to 2 hours' worth of recordings using Neuropixels in mouse. Based on some calculations, this would amount to close to 170 GB of data. We plan to use Kilosort 2 for spike sorting and are wondering what your recommendations would be regarding the storage of this data. Should we store it in one file or multiple files? How does Kilosort deal with large recordings; does it analyze them in smaller time bins? What GPU would you recommend? And can we take advantage of multiple GPUs? We have plenty of GPU servers, so if the sorting can be parallelized, we have the hardware for it. Also, what kind of compute machines would you recommend, especially regarding memory?
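For reference, the ~170 GB figure is consistent with a back-of-envelope estimate under the usual Neuropixels 1.0 assumptions (384 channels sampled at 30 kHz as int16); these parameters are not stated in the thread, so adjust them to your probe and acquisition settings:

```python
# Rough size estimate for a 2-hour Neuropixels recording.
# Assumed parameters (not from the thread): 384 channels, 30 kHz, int16 samples.
n_channels = 384
sample_rate_hz = 30_000
bytes_per_sample = 2            # int16
duration_s = 2 * 60 * 60        # 2 hours

size_bytes = n_channels * sample_rate_hz * bytes_per_sample * duration_s
print(f"{size_bytes / 1e9:.1f} GB")  # -> 165.9 GB, i.e. close to 170 GB
```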

Is there anything else you would recommend ?

We plan to record using our LabVIEW software and store the data in binary files, so that as soon as they are closed, Kilosort can start working on them.

We appreciate any comments/feedback that you would have.

Thanks, Best, Saumil

marius10p commented 3 years ago

Hi there, probably the most important thing would be to use Kilosort 2.5 instead of 2.0, especially for recordings like yours that are longer than one hour. See the recent Neuropixels 2.0 paper for a lot of detail about how 2.5 works.

The wiki has a hardware guide. Kilosort requires a single concatenated binary file, which it then processes into another temporary, high-pass filtered and whitened binary file. Ideally both the raw data and this temporary file are on an SSD or a fast network connection. The processing is done in batches, so the recording length itself isn't a problem, but some quantities accumulate over batches before being written to the results, and in Kilosort2 this sometimes leads to memory problems on machines with less RAM. I think you should be fine with 32 or 64 GB, but I'm not sure. There is no parallelization at the level of individual recordings, because the optimization process is sequential and there is barely enough work to keep one GPU busy.
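Since Kilosort expects one concatenated binary file, multiple per-run files (e.g. from LabVIEW) need to be joined in recording order before sorting. A minimal sketch of that step, using hypothetical file names and tiny stand-in payloads in place of real int16 channel data:

```python
# Sketch: concatenate per-run .bin files into the single binary file Kilosort expects.
# File names and contents here are stand-ins; keep real files in recording order.
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()

# Create two tiny stand-in "runs" (real files would hold interleaved int16 samples).
runs = []
for i, payload in enumerate([b"\x01\x02", b"\x03\x04"]):
    path = os.path.join(tmpdir, f"run{i}.bin")
    with open(path, "wb") as f:
        f.write(payload)
    runs.append(path)

# Append each run's bytes to one output file, in order.
concat_path = os.path.join(tmpdir, "concatenated.bin")
with open(concat_path, "wb") as out:
    for path in runs:
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)

with open(concat_path, "rb") as f:
    data = f.read()
print(data)  # -> b'\x01\x02\x03\x04'
```

Note that simple byte-level concatenation is only valid if every run was recorded with the same channel count and sample format.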