cortex-lab / phy

phy: interactive visualization and manual spike sorting of large-scale ephys data
BSD 3-Clause "New" or "Revised" License
304 stars 155 forks source link

Loading multiple sessions to spike-sort together #1266

Closed elduvelle closed 1 month ago

elduvelle commented 3 months ago

Hi, Say I have sorted 2 (or more) sessions together with Mountainsort / Kilosort and then saved them back in individual files. Is it possible to load and manually curate these sessions together with phy template gui? If this is possible, how is the output saved? Is there any documentation on this?

Thanks for any help!

(if there is a better place to ask this don't hesitate to let me know!)

zm711 commented 2 months ago

Phy doesn't work with mountainsort. Are you using spikeinterface and exporting? Just trying to understand your question a bit better :)

elduvelle commented 2 months ago

Hi @zm711, yes I am using spikeinterface to run Mountainsort and then export the result to Phy-formatted files. I then run Phy on those files. It works fine with single sessions but I'm not sure what's the best way to proceed with combined sessions, hopefully you'll have some insight about my question!

zm711 commented 2 months ago

How are you combining them? I probably need to see the script you're running for spikeinterface to know the set of files you're hoping to analyze. Combined like using the comparison module or combined in another way?

elduvelle commented 2 months ago

I'm combining the 'recordings' prior to running mountainsort with this instruction: multirecording = si.concatenate_recordings(recordings_list) (explained here, Case 2).

After running mountainsort, I export to Phy using extract_waveforms (from spikeinterface) and then export_to_phy (from spikeinterface.exporters). For now since I don't know how to load multiple files in Phy I'm just loading the concatenated file, but then I'm not sure how to separate it back into the original sessions.

zm711 commented 2 months ago

Just a couple more clarifying questions!

So I see what you're doing now. Something like

recording_1 = se.read_xx()
recording_2 = se.read_xx()
multirecording = si.concatenate_recordings([recording_1, recording_2])
rec = multirecording.set_probe(probe)
sorting = ss.run_sorter('mountainsort5', rec, output_folder = '.')
rec_f = spre.bandpass_filter(rec)
rec_cmr = spre.common_reference(rec)
we = si.extract_waveforms(recording=rec_cmr, sorting=sorting)
sexp.export_to_phy(we, copy_binary=True)

I'm assuming you're copying one gobal binary? Or are you not making the binary file. Because file would need the concatenated binary file to work correctly for something like this. So that is question one: do you have a recording.dat that you are using.

Question 2 is why do you want to do multiple files in Phy vs just analyze the overall dataset? I can help more if I know the specific goal of doing it this way.

elduvelle commented 2 months ago

Hmm I'm not sure what the "binary" is? Is there any documentation on this? I'm using a slightly older version of spike interface (0.93) so maybe some things will be different.

This is what my last 3 instructions look like (they can be run on a single recording or on a concatenated one in the same way): (I don't know how to do the nice formatting like you did)

# imports
import spikeinterface as si
import spikeinterface.sorters as ss
from spikeinterface.exporters import export_to_phy
...

this_output_folder = os.path.join(directory_output, 'results_MS')
sorting_MS = ss.run_mountainsort4(recording,
                                      output_folder = this_output_folder,
                                      verbose = True, **ms_params,)

this_output_folder = os.path.join(directory_output, 'wf_MS')
we_all = si.extract_waveforms(recording, sorting_MS, folder = this_output_folder, 
                                      max_spikes_per_unit = None, progress_bar = True)

this_output_folder = os.path.join(directory_output,'phy_MS')
export_to_phy(we_all, output_folder = this_output_folder,
                      progress_bar = True, total_memory = '500M')

To answer your questions: Question 1: I am not sure about the recordings.dat. There is one in my "Phy_MS" output folder but I don't know if it's used by Phy. Is that the binary? How do I know if Phy is using it? Ideally, I would like to not create it, or be able to get rid of it after creation as it's very big (but that's another question).

Question 2: I want to spike-sort all the sessions from a sequence of sessions together, but the question is whether I should do 1) group > mountainsort > phy > then somehow ungroup (that's what I'm currently doing, except the last step) or if should do 2) group > mountainsort > ungroup > phy. I don't have a preference except that I don't know how to ungroup the individual sessions once they've been run through phy, so it would be more convenient if I could run phy on the ungrouped sessions (option 2).

Generally, do you know where I can find more information about the files that are necessary for Phy to run, and what output files does it generate? Thanks!

zm711 commented 2 months ago

So first a "binary" is just a machine readable file. If you tried to open it, it would just look like a bunch of non-sense. It is where the raw data is stored. Phy requires a raw data binary file (most commonly the ending would be .dat, .bin, .raw) for full functionality. So export_to_phy should supply one and it is our convention to call it recording.dat.

For your work flow it would probably make sense to do group>ms4>phy>ungroup. But I'm not sure why you want to ungroup though? Is there a concern about having the data all together? It definitely makes sense to sort the data together if your probe hasn't been moved and thus you likely have the same neurons. But if you want to split your data later it is totally possible in spikeinterface :)

you would do:

sorting = se.read_phy('path-to-phy-folder')
# these functions are in frames/samples so we need to convert from time to samples
sorting1 = sorting.frame_slice(0, end_of_first_recording * sorting.sample_frequency)
sorting2 = sorting.frame_slice(end_of_first_recording * sorting.sampling_frequency, sorting.get_num_samples())

Is there a reason why you don't want to update spikeinterface? We are at version 0.100.5 (plenty of bug fixes, speedups, new features).

elduvelle commented 2 months ago

Hi @zm711 , that is exactly what I needed! Thanks so much!

I would certainly like to update my spikeinterface, actually I tried it some time ago but apparently the code I am using relies on some older functions and crashed under the new version, and unfortunately I don't have time to try and fix that at the time. Maybe I'll give it another go. It's a great initiative!

I believe we can close this issue now - thanks again!