What exactly is failing? VRAM? System RAM? Storage capacity? I've been running 17-hour recordings without issue.
Thank you. What about a multi-week recording? Does it run in a way that can handle any length?
Haven't tried that. Why not chunk it into separate days? What is the error that you get?
Thank you so much. We are trying to track the same neurons, whenever possible, over days. If we chunk it, I assume we lose that ability. Am I right there? Does the algo somehow compensate for multiple separate sortings?
Hi Brendon,
We (@TomBugnon) have successfully run KS2.5+ on 24h recordings. IIRC, it takes ~24h on a Tesla V100 32GB, but Tom may correct me on that. It is worth noting that this GPU, while very expensive (~$9k), is far from optimal as far as Kilosort is concerned. I expect that a (slightly) less expensive (~$6k) RTX would be optimal for sorting very long recordings in one go. Unfortunately, NVIDIA prevents you from putting these "consumer grade" GPUs into "commercial grade" rackmount servers, so you've got to build a roaring desktop PC and use that.
I do not know of an accepted method or tool for reconciling the outputs of two or more separate sortings, which would allow you to sort shorter recordings independently and combine the results post-hoc. I have always assumed that the longer you can sort in one shot the better, but I imagine that the marginal benefits of doing this (vs. post-hoc reconciliation of sorting outputs) decreases with recording length.
Currently, Kilosort's method of drift correction seems to perform poorly when applied to very long recordings, and @TomBugnon is working on fixing that. The recently published approach here seems very promising, but I do not think that their code is ready for use.
Hope that is helpful!
Hi @brendonw1, As far as I know there is no built-in way in Kilosort to combine results post-hoc. However, the Allen Institute's spike sorting package contains a module to do just that, here, or here for Jennifer Colonell's SpikeGLX-friendly fork.
It looks like these are not maintained, and I haven't tried either, but I would be curious to hear how they perform.
Also, I would guess that some kind of correction for day-to-day drift might be required. I reckon in the Neuropixels 2.0 paper they concatenated the raw data and then used this piece of code to register both days to each other.
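Conceptually it boils down to estimating a vertical offset between the days, something like this heavily simplified sketch (hypothetical variable names, not the actual code from the paper):

```matlab
% Sketch: estimate a single vertical offset between two days by
% cross-correlating their spike-count-by-depth activity profiles.
edges = 0:5:3840;                         % depth bins in um (placeholder)
p1 = histcounts(spikeDepthsDay1, edges);  % activity profile, day 1
p2 = histcounts(spikeDepthsDay2, edges);  % activity profile, day 2
n1 = (p1 - mean(p1)) / std(p1);           % normalize before correlating
n2 = (p2 - mean(p2)) / std(p2);
[c, lags] = xcorr(n1, n2, 100);           % search +/-100 bins = +/-500 um
[~, imax] = max(c);
shiftUm = lags(imax) * 5;                 % best offset between the days
```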
Some integrated package to handle this would be super useful I think...
Awesome guys! Thank you @TomBugnon and @grahamfindlay and @WeissShahaf for so much info. I will try to dig into the methods out there, with or without KS. One idea is that we just do multiple 12 or 24 hour sessions, run our various metrics on each one separately, and see how that goes. We are lucky that we are mostly looking for global population stats rather than truly single cells in our initial analyses. So we'll start with just that.
Thanks so much!
Sure! For the record, if anyone actually ends up working on tracking units across days (which will happen somewhere sometime, I guess) I'd be interested in getting in touch. Cheers
Me too!
Me too! I'd love to be kept in the loop on any updates or progress people make on this issue. @TomBugnon, have you tried using that flag in the KS code? I saw it a while ago but it wasn't obvious to me how to use it. This might be a silly question, but do you know what units the midpoint flag is in? The registration split is:
```matlab
% register the first block
[imin1, F1] = align_block(F(:, :, 1:ops.midpoint));
% register the second block as usual
[imin2, F2] = align_block(F(:, :, ops.midpoint+1:end));
```
F is defined as:
F = zeros(dmax, 20, Nbatches);
where Nbatches is a KS default (I believe it's calculated as `Nbatch = ceil(bytes / (2 * NchanTOT * NT));`)
So in this case, would I keep track of how many batches are in each file and set the midpoint as the number of batches in the first file when I concatenate the files? Curious if you've tried this... I concatenate the files in Matlab, so I could definitely set that flag after reading the first file...
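For concreteness, here is the kind of thing I mean (a sketch only; the file name, channel count, and NT value are placeholders for my own config):

```matlab
% Sketch: set ops.midpoint to the number of batches in the first file
% when two files are concatenated back to back (placeholder values).
NchanTOT = 64;                  % total channels in the .dat files
NT       = 64*1024 + 64;        % batch size (ops.NT in the config)

d1 = dir('day1.dat');           % the first file, before concatenation
Nbatch1 = ceil(d1.bytes / (2 * NchanTOT * NT));  % int16 = 2 bytes/sample

ops.midpoint = Nbatch1;         % batch index where the second block starts
```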
edit: I tried this, and it didn't qualitatively change my (poor) aligned results.
Yeah. Some other groups not using Kilosort seem to have some ideas, but I've not heard anything within Kilosort yet. It'd be nice if it were possible, but we may need to migrate in the end.
Hello there, Just wanted to update you on this. I have 26 days (2 or 3 hours a day) of recording over 2 months with a 32-channel electrode. I've merged them into one big binary file (270 GB). Then I fed it into Kilosort2 and it worked, with a little bit of playing around with the batch size. :) I'm using an NVIDIA Quadro P4000, which has 8 GB of memory.
PS: I've commented out the splitting part of the code!
Oh thanks, that's amazing! I admit it's been a while since I have thought deeply about KS. Are you saying you analyzed each chunk/batch separately? Or were all 26 days done in a way that the same spikes could be classified across all of them (if the same spikes were recorded)? If the latter, how did you get them to be considered "one recording" by KS2?
Thanks so much for this update
Hi Brendon, I think a convenient way to stitch recordings is using spikeinterface, with something like this (hasty copy-paste for inspiration):
```python
import spikeinterface.extractors as se
import spikeinterface.sorters as ss
from pathlib import Path

binpaths = ...                  # list of .bin paths, in chronological order
output_dir = Path(...)
sorter = 'kilosort2_5'
params = {}                     # any sorter-specific parameters
clean_dat_file = True           # remove spikeinterface's copy of the data

rec_extractors = [
    se.SpikeGLXRecordingExtractor(binpath)
    for binpath in binpaths
]
if len(rec_extractors) == 1:
    recording = rec_extractors[0]
else:
    recording = se.MultiRecordingTimeExtractor(rec_extractors)

# Must add channel locations
recording.set_channel_locations(...)

ss.run_sorter(
    sorter,
    recording,
    output_folder=output_dir,
    verbose=True,
    **params,
)

# Delete useless `recording.dat` file copied by spikeinterface to KS output dir
rec_path = output_dir / 'recording.dat'
if clean_dat_file and rec_path.exists():
    rec_path.unlink()
```
How to distinguish units that were "lost" by Kilosort because of drift (and should be merged) from those that were lost for physiological/mechanical reasons doesn't seem like an easy-to-settle question, and I'd be curious to hear thoughts on this.
Thanks so much! I guess my question with this method is whether the template matching is happening in a common computational space? Or is each chunk separately getting "clustered" (template finding), and so ends up with different spikes?
With this approach, Kilosort treats the concatenated data as a single dataset (and is ignorant of where the concatenation happens). This is likely not ideal (notably for the drift correction).
OK, so if I read this correctly:
Right?
No no, sorry, I was unclear: the data is concatenated before being passed to Kilosort. Kilosort runs on the concatenated data as if it were a continuous dataset.
Oh! Great
When we have done something similar, we ran into RAM issues, but probably because we use single linear 64-site shanks that can't be broken into sub-groups of channels the way tetrodes can. Or has KS found a way to remove the memory limitations?
Thanks again so much
Hi @brendonw1
In the way that I am doing it, I have one single concatenated data file, so for Kilosort it is a continuous dataset, as @TomBugnon just said. I also have a single-shank electrode and didn't break the data into sub-groups! But the batch size (ops.NT) I mentioned in my first comment is a parameter in the Kilosort config file for handling RAM problems; you can increase or decrease it to overcome them. With data concatenated from different days, you can track the same neurons over those days (if the drift is not too much it is easier for Kilosort to keep track, but even if the drift is moderate you can still merge some clusters in the manual sorting stage using Phy).
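For example, in the config file (the values here are just the KS2 defaults, halved as an illustration; adjust for your own setup):

```matlab
% ops.NT sets how many samples are processed per batch; lowering it
% reduces the GPU/RAM load for big files.
ops.ntbuff = 64;                    % samples of symmetrical buffer
ops.NT     = 32*1024 + ops.ntbuff;  % half the usual 64*1024 + ntbuff;
                                    % must be a multiple of 32 + ntbuff
```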
best, Alireza
Thank you guys very much!
Hi, I want to know the criteria for making sure that a recorded single unit belongs to the same neuron throughout a recording (>4 hours), since I believe there will be some drift during recording. It would be great if there is a reference paper to follow. Thanks very much!
Hey @alirezasaeedi1988, could you please elaborate a little bit on this? I have 3 recordings from my animals without moving the probe at all, each about 35 min, so after I combine the 3 sessions I still have a .dat file under 200 GB. KS2 should be able to handle this, but it breaks after the "main optimization" stage of the spike sorting. (Briefly, I am losing it after main optimization, during re-ordering of the batches, with a variety of error messages, e.g.: "Error running kilosort! EIG did not converge at index=something", or "Error running kilosort! Unrecognized field name 'momentum'".) I would like to try commenting out the splitting part or playing around with the batch size; maybe that would work for me as well, but I could not figure out how to do this. What did you do exactly to work with these concatenated files? Btw, I used Python's np.memmap() to merge my files, and they actually look really nice in the KS2 GUI, so the file should not be corrupted. Thanks a lot!
Thanks very much. We are working on alternative ideas too, like sub-sampling. Hope to get something done soon. I appreciate the follow-up.
Hi @StringXD
While Kilosort identifies and clusters waveforms across different days, that doesn't necessarily confirm that the tracking always reflects reality. (I got a similar comment on my paper.)
Several prior studies have made similar assertions, including Tolias et al. in JNeurophysiol 2007, McMahon et al. in PNAS 2014, Okun et al. in PLoS ONE 2016, Schoonover et al. in Nature 2021, and Steinmetz et al. in Science 2021. These studies have made considerable effort to confirm that the same neurons are being tracked using various methods.
In summary, after spike sorting you can extract spikes from different time windows (which can vary depending on the experiment) and compare the waveform properties of each cluster with all the clusters (including itself) in the next time window by defining some similarity indices. If a neuron has been tracked correctly, it should best match itself in successive time windows.
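As a rough illustration of what I mean (a sketch with made-up variable names, not code from any of the papers above):

```matlab
% Sketch: compare the mean waveform of every cluster in one time window
% with every cluster in the next window using a simple similarity index.
% wfA, wfB: [nClusters x nChannels x nSamples] mean waveforms per window
% (how you extract these depends on your pipeline).
nA = size(wfA, 1);
nB = size(wfB, 1);
S = zeros(nA, nB);
for i = 1:nA
    for j = 1:nB
        a = double(reshape(wfA(i, :, :), [], 1));
        b = double(reshape(wfB(j, :, :), [], 1));
        c = corrcoef(a, b);
        S(i, j) = c(1, 2);   % waveform correlation; amplitude, peak
                             % channel, etc. can serve as other indices
    end
end
% A correctly tracked unit should be its own best match across windows:
[~, bestMatch] = max(S, [], 2);
```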
Hi @papannon
I am not familiar with np.memmap(), but if the concatenation is done properly, the rest should not be very challenging (considering the size of your .dat file). Have you tried changing ops.NT in the config file? What GPU do you have?
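For reference, byte-wise concatenation in MATLAB is simple enough to rule out merging problems (a sketch; file names are placeholders, and all files must share the same channel count and int16 dtype):

```matlab
% Sketch: concatenate int16 .dat files back to back into one big file.
inFiles = {'sess1.dat', 'sess2.dat', 'sess3.dat'};  % placeholder names
fidOut = fopen('combined.dat', 'w');
for k = 1:numel(inFiles)
    fidIn = fopen(inFiles{k}, 'r');
    while ~feof(fidIn)
        chunk = fread(fidIn, 2^20, '*int16');  % read ~1M values at a time
        fwrite(fidOut, chunk, 'int16');
    end
    fclose(fidIn);
end
fclose(fidOut);
```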
We would like to do multi-day recordings, but these recordings mean that we exceed available memory. I am wondering if someone can explain how we might run KS on such long recordings and still extract spikes. In 2019, in issue 135 (https://github.com/MouseLand/Kilosort/issues/135), we raised this. However, I now see in run_Templates.m that there is a new approach:
% update: these recommendations no longer apply for the datashift version!
Can someone explain whether and how we can run spike detection on long-duration recordings (previously the idea was to detect on a subset of the data and then find those spikes in the full data)?
Thanks so much Brendon Watson