SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

A thread to discuss inter-session alignment. #2626

Open JoeZiminski opened 7 months ago

JoeZiminski commented 7 months ago

Following a discussion on the Slack channel about inter-session alignment, I thought I'd start an issue to summarise and discuss these things and come up with a plan for how this could be handled in SpikeInterface.

There are a couple of (not necessarily mutually exclusive) approaches:

1) Post-hoc unit-matching This approach matches units from independently sorted sessions. Current approaches include UnitMatch. The SI and UnitMatch teams are working together to ensure compatibility so that SI-sorted data can be easily read in UnitMatch. There is also the approach outlined in this paper, which looks interesting. Typically the motion correction in these methods relies on putative matches between units across sessions, which may fail in the presence of large drift. Therefore alignment of either the traces (pre-sorting) or the templates (post-sorting) across sessions would complement these methods.

2) Concatenating sessions and motion correcting Another approach is to concatenate two sessions together and perform motion correction as if they were a single session, then sort. A concern is how current intra-session methods will perform in the presence of significant drift between sessions (manifesting as a sudden, large 'jump' in the middle of the concatenated recording). In the NP2 paper the Kilosort team suggest this approach does not work so well with the iterative template method*. In the Dredge paper this approach appears to be used successfully. It would be interesting to see at what level of inter-session drift (linear and nonlinear) this approach fails for the available methods. I guess the machinery for such tests has already been largely developed here.

3) Alignment of raw data pre-sorting This approach is described in the NP2 paper. Roughly it goes: (a) within-session motion correction for the two sessions, (b) compute the drift between sessions and add it as a compensation to the within-session offsets, (c) interpolate. In theory it should also be quite possible to do this with the existing correct_motion machinery, just requiring some additional logic to incorporate inter-session registration offsets prior to interpolation.

4) Alignment of template waveforms post-sorting This approach is related to post-hoc unit matching (1) specifically. For these methods, instead of interpolating the raw data based on estimated inter-session drift prior to sorting, only the templates are shifted prior to unit-matching. (@alejoe91 I think this is the idea you had at the Unitmatch meeting, if I understood correctly). This should have the same effect as (3) for unit-matching but would be much faster.

In terms of action points, to my knowledge (1) is actively being worked on but (3) and (4), as well as the validation discussed in (2), are not. Therefore supporting these inter-session approaches in SpikeInterface might be a nice idea if others think they have utility.

* Kilosort team suggest this approach does not work well

Chronic recordings
For chronic recordings, we tried to concatenate the raw data files and run the Datashift algorithm as
described above. The results were not always satisfactory, especially when the recording sessions were
separated by weeks.

from here.

zm711 commented 7 months ago

I'm not doing this myself, but others in my group are so I really appreciate you starting this thread! I think these days there are plenty of people doing multiple recordings that would benefit from machinery in spikeinterface. I'll definitely be lurking on this thread :)

samuelgarcia commented 7 months ago

And this could be a good project for the next hackathon in Edinburgh.

florian6973 commented 4 months ago

Very interesting thread!

I think https://github.com/SpikeInterface/spikeinterface/issues/2911 could be related to 1) if we try to compute the best unit matching possible across all sessions based on some kind of similarity metrics.

JoeZiminski commented 4 months ago

Thanks everyone, I was thinking of getting started on this soon. Please see below for a plan; it would be great to hear your thoughts.

1) Estimate Alignment Across Sessions

The API here is inspired by the existing within-session si.correct_motion(). This function would take an ordered list of recordings as input. Almost always they would already be motion corrected within-session. The function will estimate the vertical displacement across sessions using existing methods. In the first instance this can be fairly straightforward - use some random chunks to estimate the activity histogram for each session, and optimise to shift them into a middle-position. The focus would be to shift with minimum loss of probe-edge data and keep similar interpolation distances for all sessions. In future this could incorporate LFP band also.
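The histogram-based estimate described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names (`estimate_displacement`, `session_shifts`) are hypothetical and not part of SpikeInterface, each session is reduced to a 1D activity histogram over depth bins, displacement is restricted to whole bins, and the "middle position" is taken as the midpoint between the smallest and largest displacement so that the largest interpolation distance is minimised.

```python
def estimate_displacement(reference, histogram, max_shift):
    """Displacement (in bins) of `histogram` relative to `reference`.

    Tries every integer shift in [-max_shift, max_shift] and keeps the one
    with the highest cross-correlation score.
    """
    n = len(reference)
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                score += reference[i] * histogram[j]
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def session_shifts(histograms, max_shift):
    """Corrective shift (in bins) per session, centred on a middle position.

    Displacements are measured against the first session, then re-expressed
    relative to the midpoint of their range.
    """
    displacements = [
        estimate_displacement(histograms[0], h, max_shift) for h in histograms
    ]
    middle = (max(displacements) + min(displacements)) / 2
    return [middle - d for d in displacements]


# Toy activity histograms (spike counts per depth bin) for three sessions.
session_0 = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
session_1 = [0, 0, 0, 0, 0, 1, 5, 1, 0, 0]  # probe displaced by +3 bins
session_2 = [0, 1, 5, 1, 0, 0, 0, 0, 0, 0]  # probe displaced by -1 bin

shifts = session_shifts([session_0, session_1, session_2], max_shift=4)
```

Centring on the midpoint rather than on the first session keeps the interpolation distances similar across sessions, which is the stated aim; a real implementation would likely work with sub-bin shifts and possibly non-rigid (per-depth) estimates.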

This function would output a list of motion-corrected recordings, using the existing InterpolateMotionRecording. What's super nice about the existing SpikeInterface API is that the motion correction is lazy. So if motion correction has already been performed, all that needs to be done is to add the estimated inter-session displacement correction to the existing drift correction in the motion-correction object.
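As a rough sketch of that last step, assume the within-session estimate is available as a plain table of displacements in µm, indexed by temporal bin and then spatial bin. This is a stand-in for illustration only, not the actual SpikeInterface motion object, and `add_session_offset` is a hypothetical helper. The inter-session term is then a single rigid offset added to every entry:

```python
def add_session_offset(displacement_um, session_offset_um):
    """Add a rigid inter-session offset to a within-session displacement table.

    displacement_um[t][s] is the within-session displacement (µm) at temporal
    bin t and spatial bin s. The offset must use the same sign convention.
    """
    return [
        [value + session_offset_um for value in row] for row in displacement_um
    ]


# Two temporal bins x two spatial bins of within-session displacement.
within_session = [[0.0, 1.0], [2.0, 3.0]]
combined = add_session_offset(within_session, -20.0)
```

Because the combined table has the same shape as the original, the interpolation step would not need to know whether an inter-session term was included.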

The inter-session displacement correction function can also output a motion_info object similar to si.correct_motion(). This can then be used to correct just the peaks, or the templates directly (next step). @samuelgarcia @cwindolf it would be great to get your feedback on this.

2) Motion correct templates before unit matching

It may be that for some cases interpolating the raw data is too heavy. Instead Alessio suggested interpolating only the templates before unit matching. Using the output motion_info as above, this could be passed to a Sorting (?) object function that shifts the peak and template locations only based on the inter-session motion. This should help a lot when unit-matching based on templates when there is a lot of inter-session displacement.
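A minimal sketch of the location-shifting idea, assuming each unit is summarised by a single depth estimate in µm and each session has one rigid displacement value. The function name and data layout are hypothetical; no existing Sorting method is implied.

```python
def shift_depths(depths_um, session_displacement_um):
    """Remove a session's rigid displacement from unit depth estimates.

    depths_um maps unit id -> estimated depth (µm) along the probe.
    """
    return {
        unit_id: depth - session_displacement_um
        for unit_id, depth in depths_um.items()
    }


# Depths estimated in a session whose probe sits 30 µm away from the reference.
session_depths = {"unit_3": 100.0, "unit_7": 250.0}
aligned_depths = shift_depths(session_depths, 30.0)
```

Only a handful of numbers per unit are touched, which is why this route should be much cheaper than interpolating the raw traces.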

3) Unit matching

There is already some template-based unit matching in SpikeInterface and we have been working with UnitMatch who have kindly implemented a SpikeInterface data loader. So we could possibly merge these into some kind of InterSessionMatching object to perform the matching while exposing different backends.
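To make the matching step concrete, here is one possible backend sketched in plain Python: greedy one-to-one matching on the cosine similarity of flattened templates. This is an assumption made for illustration, not the method used by UnitMatch or by the existing SpikeInterface code, and the names and the 0.8 threshold are arbitrary.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened templates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_units(templates_a, templates_b, min_similarity=0.8):
    """Greedily pair units across two sessions, most similar pairs first.

    Each unit is used at most once; pairs below min_similarity are left
    unmatched.
    """
    candidates = sorted(
        (
            (cosine_similarity(ta, tb), id_a, id_b)
            for id_a, ta in templates_a.items()
            for id_b, tb in templates_b.items()
        ),
        key=lambda candidate: candidate[0],
        reverse=True,
    )
    matches, used_b = {}, set()
    for similarity, id_a, id_b in candidates:
        if similarity < min_similarity:
            break
        if id_a in matches or id_b in used_b:
            continue
        matches[id_a] = id_b
        used_b.add(id_b)
    return matches


session_a = {"a0": [1.0, 0.0, 0.0], "a1": [0.0, 1.0, 0.0]}
session_b = {"b0": [0.0, 0.9, 0.1], "b1": [1.0, 0.1, 0.0], "b2": [0.0, 0.0, 1.0]}
matches = match_units(session_a, session_b)
```

An InterSessionMatching-style object could wrap a function like this behind a common interface, so that other backends return the same kind of unit-to-unit mapping.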

Would be great to get feedback on this!

Some notes on vocabulary (@zm711, related to our previous discussion): here 'drift' and 'motion' do not really feel like appropriate terms for inter-session changes in the probe position. I was thinking to always stick to 'displacement' in the context of inter-session changes and reserve 'drift' and 'motion' specifically for within-session alignment problems.

florian6973 commented 3 months ago

Sounds like a great plan! I am less familiar with motion correction but definitely glad to help for the spikeinterface template-based unit matching.

I would strongly suggest that alignment be an option so that we can 'skip' it when it is not needed. Ideally I would also like to be able to convert the Matching object to a Sorting one to postprocess it as usual with an (extended?) sorting (matching?) analyzer. And we should definitely implement some quality metrics on how good the unit matching is; I do not know if you already have some in mind.

samuelgarcia commented 3 months ago

Hi @JoeZiminski. This is a great project. One main blocking issue: we do not support multi-segment estimate_motion() yet. This is a long-term plan for Charlie and me, but a lot of work is needed for each method (decentralized, iterative_template, dredge_ap and dredge_lfp). The good news is that PR #3062 already enables multi-segment support in the Motion object, but unfortunately not yet in the estimation...

JoeZiminski commented 3 months ago

Thanks @samuelgarcia, that's okay, cool to hear about the future plans for including segments! For now each recording could just be handled as a separate session, and in future this could be extended to the multi-segment case along with the motion estimation?

JoeZiminski commented 2 months ago

Another approach of interest here

JoeZiminski commented 2 weeks ago

relevant: here