SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
522 stars 186 forks source link

unit locations - extreme outliers #3322

Open magland opened 2 months ago

magland commented 2 months ago

Obtaining unit locations via sorting analyzer like this:

analyzer = si.create_sorting_analyzer(
    sorting, recording=recording_binary
)

# ...

# estimated unit locations
unit_locations = analyzer.get_extension("unit_locations").get_data()

Running into an issue where some of the unit locations are appearing way outside the probe. See screenshot. Any idea why this is happening and how it could be remedied?

image

The average waveform for the top outlier does not appear to be especially weird:

image

alejoe91 commented 2 months ago

IMO it is especially weird and for sure noisy! The problem is that the method uses a monopolar triangulation, which is expected to smoothly decay in space. For the unit you show this is clearly not the case, probably causing to method to not converge

magland commented 2 months ago

IMO it is especially weird and for sure noisy!

Really? I stretched the amplitude quite a bit. It is unit 43 in this.

The problem is that the method uses a monopolar triangulation, which is expected to smoothly decay in space. For the unit you show this is clearly not the case, probably causing to method to not converge.

Could this be due to the sparsity mask used when computing the average waveforms by the sorting analyzer? It seems that only about 9 channels are non-zero.

I wonder if the monopolar triangulation could be improved to not have this problem. Otherwise I think a center of mass approach would be more robust.

alejoe91 commented 2 months ago

Sure COM would be bound within the probe boundary but it's less accurate.

To estimate location, we just use the ptp on a subset of channels. The distribution of such peaks in space will have a sharp non linearity hard to model.

What about handling outliers at visualization time?

magland commented 2 months ago

I'd rather not do this at visualization time.

How can I switch to COM method in the above code? I found some examples in the SI source, but not with analyzer.get_extension("unit_locations").get_data()

alejoe91 commented 2 months ago

When it comes to analyzer.compute(..., extension_params=dict(unit_locations=dict(method="center_of_mass")))

I strongly believe that monopolar is better and one can deal with outliers by zooming in and out

magland commented 2 months ago

I strongly believe that monopolar is better and one can deal with outliers by zooming in and out

Okay, let me try to isolate the issue here and maybe the monopolar algorithm can be improved. I don't see any reason for there to be such severe outliers.

samuelgarcia commented 2 months ago

Salut Jeremy. Did you try method=grid_convolution on this ?

magland commented 2 months ago

@samuelgarcia I went ahead and tried grid_convolution. It solved the outliers problem, but a bunch of units got set to location (0, 0, 0)

zm711 commented 2 months ago

At @magland based on the slack comment recently I'm wondering if you could try plotting your spikes as well so we can see if some of the spikes are showing up as outliers.

magland commented 2 months ago

At @magland based on the slack comment recently I'm wondering if you could try plotting your spikes as well so we can see if some of the spikes are showing up as outliers.

Do you mean waveforms of individual events? That's a bit difficult for me to get at...

zm711 commented 2 months ago

It was just plotting a scatter of spike_locations using the monopolar triangulation. You don't have access to the recording any more?

magland commented 2 months ago

It was just plotting a scatter of spike_locations using the monopolar triangulation. You don't have access to the recording any more?

The recording is on DANDI, but I don't have the software pipeline set up to retrieve this easily.

samuelgarcia commented 2 months ago

my guess is that outliers are spikes with a strong collision with others. maybe you can make simple rules to detect which spikes are outliers with threhold and re-extract there waveforms using a very small sorting and a recording in stream mode. So you could check what is happening.

magland commented 2 months ago

These outliers are not individual spikes -- they are average waveforms for units.

zm711 commented 2 months ago

But for me, I'm wondering if some outlier spikes overall are feeding into the generation of your average waveforms which is then bleeding into the unit locations. So I was hoping to see where the outliers are appearing (at spikes or at the average waveform step).

samuelgarcia commented 2 months ago

ok sorry

cwindolf commented 1 week ago

The outlying locations have bothered me too -- a lot! I'll talk about a method for dealing with it in individual spike waveforms, possibly at the next meeting, but it involves a neural net and may not be practical for this use case.

My interpretation of the issue is that some waveforms have monopole cost functions which are hard to optimize for whatever reason -- maybe they have flat or otherwise non-point-sourcey amplitude profiles, which totally happens for templates which are not "collided". I had thought about adding regularization to the cost function (for instance, coef * dist(source, detection channel)^2 or coef * dist(source, CoM)^2). For some value of the regularization coefficient, this would hopefully resolve some of the outliers while not changing the well-identified positions much?

zm711 commented 1 week ago

Thanks @cwindolf for bringing this back up. I just had this happen yesterday with crazy outlier points. Our group currently has the most success sorting with Mountainsort5 (thanks @magland--though we haven't extensively tested KS4 yet) and I had maybe 5/6 clusters of artifacts (completely sinusoidal shape or sawtooth in the context of opto stimulation so not unexpected to have artifacts). These were localized as way off the probe with the monopolar triangulation which may support your idea of the cost function not behaving well for certain waveforms. In my case this was actually beneficial because I just eliminated them based on their "bad" locations (always the risk of a bug becoming a feature :P ). Super interested in following this discussion though.

samuelgarcia commented 1 week ago

for the moment we could could clip to the border + margin. this would be easy no ? @cwindolf : do you want to add this ?