What exactly is failing? VRAM? System RAM? Storage capacity? I've been running 17-hour recordings without issue.
Thank you. What about a multi-week recording? Does it run in a way that can handle any length?
Haven't tried that. Why not chunk it into separate days? What is the error that you get?
Thank you so much. We are trying to track the same neurons, whenever possible, over days. If we chunk it, I assume we lose that ability. Am I right there? Does the algo somehow compensate for multiple separate sortings?
Hi Brendon,
We (@TomBugnon) have successfully run KS2.5+ on 24h recordings. IIRC, it takes ~24h on a Tesla V100 32GB, but Tom may correct me on that. It is worth noting that this GPU, while very expensive (~$9k), is far from optimal as far as Kilosort is concerned. I expect that a (slightly) less expensive (~$6k) RTX would be optimal for sorting very long recordings in one go. Unfortunately, NVIDIA prevents you from putting these "consumer grade" GPUs into "commercial grade" rackmount servers, so you've got to build a roaring desktop PC and use that.
I do not know of an accepted method or tool for reconciling the outputs of two or more separate sortings, which would allow you to sort shorter recordings independently and combine the results post-hoc. I have always assumed that the longer you can sort in one shot the better, but I imagine that the marginal benefits of doing this (vs. post-hoc reconciliation of sorting outputs) decreases with recording length.
Currently, Kilosort's method of drift correction seems to perform poorly when applied to very long recordings, and @TomBugnon is working on fixing that. The recently published approach here seems very promising, but I do not think that their code is ready for use.
Hope that is helpful!
Hi @brendonw1, As far as I know there is no built-in way in Kilosort to combine results post-hoc. However, the Allen Institute's spike sorting package contains a module to do just that, here, or here for Jennifer Colonell's SpikeGLX-friendly fork.
It looks like these are not maintained, and I haven't tried either, but I would be curious to hear how they perform.
Also, I would guess that some kind of correction for day-to-day drift might be required. I reckon in the Neuropixels 2.0 paper they concatenated the raw data and then used this piece of code to register both days to each other.
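Conceptually it boils down to estimating a vertical offset between the days, something like this heavily simplified sketch (hypothetical variable names, not the actual code from the paper):

```matlab
% Sketch: estimate a single vertical offset between two days by
% cross-correlating their spike-count-by-depth activity profiles.
edges = 0:5:3840;                         % depth bins in um (placeholder)
p1 = histcounts(spikeDepthsDay1, edges);  % activity profile, day 1
p2 = histcounts(spikeDepthsDay2, edges);  % activity profile, day 2
n1 = (p1 - mean(p1)) / std(p1);           % normalize before correlating
n2 = (p2 - mean(p2)) / std(p2);
[c, lags] = xcorr(n1, n2, 100);           % search +/-100 bins = +/-500 um
[~, imax] = max(c);
shiftUm = lags(imax) * 5;                 % best offset between the days
```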
Some integrated package to handle this would be super useful I think...
Awesome guys! Thank you @TomBugnon and @grahamfindlay and @WeissShahaf for so much info. I will try to dig into the methods out there, with or without KS. One idea is that we just do multiple 12 or 24 hour sessions, run our various metrics on each one separately, and see how that goes. We are lucky that we are mostly looking for global population stats rather than truly single cells in our initial analyses. So we'll start with just that.
Thanks so much!
Sure! For the record, if anyone actually ends up working on tracking units across days (which will happen somewhere sometime, I guess) I'd be interested in getting in touch. Cheers
Me too!
Me too! I'd love to be kept in the loop on any updates or progress people make on this issue. @TomBugnon, have you tried using that flag in the KS code? I saw it a while ago but it wasn't obvious to me how to use it. This might be a silly question, but do you know what units the midpoint flag is in? The registration split is:
```matlab
% register the first block
[imin1, F1] = align_block(F(:, :, 1:ops.midpoint));
% register the second block as usual
[imin2, F2] = align_block(F(:, :, ops.midpoint+1:end));
```
F is defined as:
F = zeros(dmax, 20, Nbatches);
where Nbatches is a KS default (I believe it's calculated as `Nbatch = ceil(bytes / (2 * NchanTOT * NT));`)
So in this case, would I keep track of how many batches are in each file and set the midpoint as the number of batches in the first file when I concatenate the files? Curious if you've tried this... I concatenate the files in Matlab, so I could definitely set that flag after reading the first file...
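For concreteness, here is the kind of thing I mean (a sketch only; the file name, channel count, and NT value are placeholders for my own config):

```matlab
% Sketch: set ops.midpoint to the number of batches in the first file
% when two files are concatenated back to back (placeholder values).
NchanTOT = 64;                  % total channels in the .dat files
NT       = 64*1024 + 64;        % batch size (ops.NT in the config)

d1 = dir('day1.dat');           % the first file, before concatenation
Nbatch1 = ceil(d1.bytes / (2 * NchanTOT * NT));  % int16 = 2 bytes/sample

ops.midpoint = Nbatch1;         % batch index where the second block starts
```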
edit: I tried this, and it didn't qualitatively change my (poor) aligned results.
Yeah. Some other groups not using Kilosort seem to have some ideas, but I've not heard anything within Kilosort yet. It'd be nice if it were possible, but we may need to migrate in the end.
Hello there, Just wanted to update you on this. I have 26 days (2 or 3 hours a day) of recording over 2 months with a 32-channel electrode. I've merged them into one big binary file (270 GB). Then I fed it into Kilosort2 and it worked, with a little bit of playing around with the batch size. :) I'm using an NVIDIA Quadro P4000, which has 8 GB of memory.
PS: I've commented out the splitting part of the code!
Oh thanks, that's amazing! I admit it's been a while since I have thought deeply about KS. Are you saying you analyzed each chunk/batch separately? Or were all 26 days done in a way that the same spikes could be classified across all of them (if the same spikes were recorded)? If the latter, how did you get them to be considered "one recording" by KS2?
Thanks so much for this update
Hi Brendon, I think a convenient way to stitch recordings is using spikeinterface, with something like this (hasty copy-paste for inspiration):
```python
import spikeinterface.extractors as se
import spikeinterface.sorters as ss
from pathlib import Path

binpaths = ...                  # list of .bin paths, in chronological order
output_dir = Path(...)
sorter = 'kilosort2_5'
params = {}                     # any sorter-specific parameters
clean_dat_file = True           # remove spikeinterface's copy of the data

rec_extractors = [
    se.SpikeGLXRecordingExtractor(binpath)
    for binpath in binpaths
]
if len(rec_extractors) == 1:
    recording = rec_extractors[0]
else:
    recording = se.MultiRecordingTimeExtractor(rec_extractors)

# Must add channel locations
recording.set_channel_locations(...)

ss.run_sorter(
    sorter,
    recording,
    output_folder=output_dir,
    verbose=True,
    **params,
)

# Delete useless `recording.dat` file copied by spikeinterface to KS output dir
rec_path = output_dir / 'recording.dat'
if clean_dat_file and rec_path.exists():
    rec_path.unlink()
```
How to distinguish units that were "lost" by Kilosort because of drift (and should be merged) from those that were lost for physiological/mechanical reasons doesn't seem like an easy-to-settle question, and I'd be curious to hear thoughts on this.
Thanks so much! I guess my question with this method is whether the template matching is happening in a common computational space? Or is each chunk separately getting "clustered" (template finding), and so ends up with different spikes?
With this approach, Kilosort treats the concatenated data as a single dataset (and is ignorant of where the concatenation happens). This is likely not ideal (notably for the drift correction).
OK, so if I read this correctly:
Right?
No no, sorry, I was unclear: the data is concatenated before being passed to Kilosort. Kilosort runs on the concatenated data as if it were a continuous dataset.
Oh! Great
When we have done something similar, we ran into RAM issues, but probably because we use single linear 64-site shanks that can't be broken into sub-groups of channels the way tetrodes can. Or has KS found a way to remove the memory limitations?
Thanks again so much
Hi @brendonw1
In the way that I am doing it, I have one single concatenated data file, so for Kilosort it is a continuous dataset, as @TomBugnon just said. I also have a single-shank electrode and didn't break the data into sub-groups! But the batch size (ops.NT) I mentioned in my first comment is a parameter in the Kilosort config file for handling RAM problems; you can increase or decrease it to overcome them. With data concatenated from different days, you can track the same neurons over those days (if the drift is not too much it is easier for Kilosort to keep track, but even if the drift is moderate you can still merge some clusters in the manual sorting stage using Phy).
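For example, in the config file (the values here are just the KS2 defaults, halved as an illustration; adjust for your own setup):

```matlab
% ops.NT sets how many samples are processed per batch; lowering it
% reduces the GPU/RAM load for big files.
ops.ntbuff = 64;                    % samples of symmetrical buffer
ops.NT     = 32*1024 + ops.ntbuff;  % half the usual 64*1024 + ntbuff;
                                    % must be a multiple of 32 + ntbuff
```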
best, Alireza
Thank you guys very much!
Hi, I want to know the criteria for making sure that a recorded single unit belongs to the same neuron throughout a recording (>4 hours), since I believe there will be some drift during recording. It would be great if there is a reference paper to follow. Thanks very much!
Hey @alirezasaeedi1988, could you please elaborate a little bit on this? I have 3 recordings from my animals without moving the probe at all, each about 35 min, so after I combine the 3 sessions I still have a .dat file under 200 GB. KS2 should be able to handle this, but it breaks after the "main optimization" stage of the spike sorting. (Briefly, I am losing it after main optimization, during re-ordering of the batches, with a variety of error messages, e.g.: "Error running kilosort! EIG did not converge at index=something", or "Error running kilosort! Unrecognized field name 'momentum'".) I would like to try commenting out the splitting part or playing around with the batch size; maybe that would work for me as well, but I could not figure out how to do this. What did you do exactly to work with these concatenated files? Btw, I used Python's np.memmap() to merge my files, and they actually look really nice in the KS2 GUI, so the file should not be corrupted. Thanks a lot!
Thanks very much. We are working on alternative ideas too, like sub-sampling. Hope to get something done soon. I appreciate the follow-up.
Hi @StringXD
While Kilosort identifies and clusters waveforms across different days, that doesn't necessarily confirm that the tracking always reflects reality. (I got a similar comment on my paper.)
Several prior studies have made similar assertions, including Tolias et al. in JNeurophysiol 2007, McMahon et al. in PNAS 2014, Okun et al. in PLoS ONE 2016, Schoonover et al. in Nature 2021, and Steinmetz et al. in Science 2021. These studies have made considerable effort to confirm that the same neurons are being tracked using various methods.
In summary, after spike sorting you can extract spikes from different time windows (which can vary depending on the experiment) and compare the waveform properties of each cluster with all the clusters (including itself) in the next time window by defining some similarity indices. If a neuron has been tracked correctly, it should best match itself in successive time windows.
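As a rough illustration of what I mean (a sketch with made-up variable names, not code from any of the papers above):

```matlab
% Sketch: compare the mean waveform of every cluster in one time window
% with every cluster in the next window using a simple similarity index.
% wfA, wfB: [nClusters x nChannels x nSamples] mean waveforms per window
% (how you extract these depends on your pipeline).
nA = size(wfA, 1);
nB = size(wfB, 1);
S = zeros(nA, nB);
for i = 1:nA
    for j = 1:nB
        a = double(reshape(wfA(i, :, :), [], 1));
        b = double(reshape(wfB(j, :, :), [], 1));
        c = corrcoef(a, b);
        S(i, j) = c(1, 2);   % waveform correlation; amplitude, peak
                             % channel, etc. can serve as other indices
    end
end
% A correctly tracked unit should be its own best match across windows:
[~, bestMatch] = max(S, [], 2);
```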
Hi @papannon
I am not familiar with np.memmap(), but if the concatenation is done properly, the rest should not be very challenging (considering the size of your .dat file). Have you tried changing ops.NT in the config file? What GPU do you have?
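For reference, byte-wise concatenation in MATLAB is simple enough to rule out merging problems (a sketch; file names are placeholders, and all files must share the same channel count and int16 dtype):

```matlab
% Sketch: concatenate int16 .dat files back to back into one big file.
inFiles = {'sess1.dat', 'sess2.dat', 'sess3.dat'};  % placeholder names
fidOut = fopen('combined.dat', 'w');
for k = 1:numel(inFiles)
    fidIn = fopen(inFiles{k}, 'r');
    while ~feof(fidIn)
        chunk = fread(fidIn, 2^20, '*int16');  % read ~1M values at a time
        fwrite(fidOut, chunk, 'int16');
    end
    fclose(fidIn);
end
fclose(fidOut);
```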
We would like to do multi-day recordings, but these recordings mean that we exceed available memory. I am wondering if someone can explain how we might run KS on such long recordings and still extract spikes. In 2019, in issue 135 (https://github.com/MouseLand/Kilosort/issues/135), we raised this. However, I now see in run_Templates.m that there is a new approach:
% update: these recommendations no longer apply for the datashift version!
Can someone explain whether and how we can run spike detection on long-duration recordings (previously the idea was to detect on a subset of the data and then find those spikes in the full data)?
Thanks so much Brendon Watson