SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Contamination ratio #1973

Open llobetv opened 9 months ago

llobetv commented 9 months ago

Just a comment about compute_isi_violations from Hill (qualitymetrics/misc_metrics.py l. 240)

It is written that: "You can interpret an ISI violations ratio value of 0.5 as meaning that contaminating spikes are occurring at roughly half the rate of "true" spikes for that unit." However, the Hill et al. estimate (isi_contamination) converges to a value close to 0.5 when the true contamination converges to 1: [image]

This equation works well for contamination estimation when contamination is low.
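
One way to see where the 0.5 ceiling could come from is a back-of-the-envelope derivation. It is only a sketch, and it rests on assumptions that are not stated in the thread: the true neuron never violates its own refractory period, the contaminating spikes are Poisson and independent of it, the firing rate is low enough that counting short ISIs and counting close spike pairs give nearly the same number, and the ratio is computed as $n_v T / (2 N^2 t_r)$. Here $N$ is the total spike count, $T$ the recording duration, $t_r$ the refractory period, $n_v$ the number of violations and $C$ the true contamination.

```latex
% Expected violations: (true, contaminant) pairs plus (contaminant, contaminant) pairs
\mathbb{E}[n_v] \approx \frac{2\,(1-C)N \cdot CN \, t_r}{T} + \frac{(CN)^2 \, t_r}{T}
               = \frac{N^2 t_r}{T}\,\bigl(2C - C^2\bigr)

% Ratio, in the form assumed above
\text{ratio} = \frac{n_v \, T}{2 N^2 t_r} \approx C - \frac{C^2}{2}
\quad\Longrightarrow\quad
\text{ratio} \approx C \ \text{ for } C \ll 1,
\qquad
\text{ratio} \to \tfrac{1}{2} \ \text{ as } C \to 1
```

Under these assumptions the ratio tracks the contamination only while $C^2/2$ is negligible, which is consistent with the estimate working well at low contamination.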

code to check:

import numpy as np
import spikeinterface.full as si

duration = 2200  # in s
neuron = np.arange(0, duration, 100e-3)  # neuron that spikes once every 100 ms for 2200 s => 22 000 spikes
noise = np.random.uniform(0, duration, 220000)  # create 220 000 spikes with random timing

final = np.sort(np.concatenate([noise, neuron]))  # create noisy neuron
calc_isi_violation = si.misc_metrics.isi_violations([final], duration)[0]
true_contamination = len(noise) / len(final)

print("Calculated isi_violation is : {}".format(calc_isi_violation))
print("True contamination is : {}".format(true_contamination))
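
For readers without SpikeInterface installed, the same check can be sketched with NumPy alone across several contamination levels. This is a minimal sketch, not the library code: `hill_style_ratio` and `simulate` are hypothetical helpers, the ratio is assumed to be `n_violations * T / (2 * N**2 * t_r)`, and the 1.5 ms threshold, 20 Hz total rate and 2000 s duration are illustrative choices.

```python
import numpy as np


def hill_style_ratio(spike_times, duration_s, isi_threshold_s=0.0015):
    """Assumed form of the ISI-violation ratio: n_v * T / (2 * N^2 * t_r)."""
    n_spikes = len(spike_times)
    n_violations = np.sum(np.diff(spike_times) < isi_threshold_s)
    return n_violations * duration_s / (2 * n_spikes**2 * isi_threshold_s)


def simulate(contamination, total_rate_hz=20.0, duration_s=2000.0, seed=0):
    """Regular 'true' neuron plus uniformly distributed contaminating spikes."""
    rng = np.random.default_rng(seed)
    n_total = int(total_rate_hz * duration_s)
    n_noise = int(round(contamination * n_total))
    n_true = n_total - n_noise
    # Evenly spaced spikes: the true neuron never violates its refractory period.
    true_spikes = np.arange(n_true) * (duration_s / max(n_true, 1))
    noise = rng.uniform(0, duration_s, n_noise)
    return np.sort(np.concatenate([true_spikes, noise]))


results = {}
for c in (0.05, 0.25, 0.5, 0.75, 1.0):
    results[c] = hill_style_ratio(simulate(c), 2000.0)
    print("true contamination {:.2f} -> ratio {:.3f}".format(c, results[c]))
```

With these settings the ratio should stay close to the true contamination at the low end and flatten out near 0.5 at the high end.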

@DradeAW

zm711 commented 9 months ago

Nick Steinmetz actually put the correction based on @llobetv/@DradeAW's paper into sortingQuality 3 days ago. sortingQuality is the basis for the SI implementation of si.isi_violations(), so sortingQuality now runs the equivalent of si.compute_refrac_period_violations() under the hood rather than a modification of the Hill et al. method. Not sure what the best solution is (i.e. merge the functions vs. change the documentation to be more explicit).
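
To illustrate what a refractory-period-based correction can look like, here is a minimal NumPy sketch. It is not taken from sortingQuality or SpikeInterface, and whether it matches the published correction term for term should be checked against the paper. It assumes Poisson contamination that ignores the refractory period, so that the expected number of violating pairs is `N**2 * t_r / T * (2*C - C**2)`, and solves that for `C`. The helper names and the 1.5 ms refractory period are hypothetical.

```python
import numpy as np


def count_pair_violations(spike_times, t_r):
    """Count all spike pairs closer than t_r (not only consecutive spikes)."""
    upper = np.searchsorted(spike_times, spike_times + t_r, side="left")
    return int(np.sum(upper - np.arange(len(spike_times)) - 1))


def contamination_estimate(spike_times, duration_s, t_r=0.0015):
    """Solve n_v = N^2 * t_r / T * (2C - C^2) for C (assumed contamination model)."""
    n_spikes = len(spike_times)
    n_v = count_pair_violations(spike_times, t_r)
    d = 1.0 - n_v * duration_s / (n_spikes**2 * t_r)
    # A negative d means more violations than the model allows: saturate at 1.
    return float(1.0 - np.sqrt(d)) if d >= 0 else 1.0


rng = np.random.default_rng(0)
duration_s = 2000.0
true_spikes = np.arange(0, duration_s, 0.1)  # regular neuron firing at 10 Hz
noise = rng.uniform(0, duration_s, len(true_spikes))  # as many contaminating spikes
train = np.sort(np.concatenate([true_spikes, noise]))

estimate = contamination_estimate(train, duration_s)
true_contamination = len(noise) / len(train)
print("estimated: {:.3f}, true: {:.3f}".format(estimate, true_contamination))
```

Unlike a ratio that is linear in the violation count, this inversion can reach 1 when the train is pure noise, at the cost of depending on the contamination model.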

https://github.com/cortex-lab/sortingQuality/blob/70c8659adc60484434be828d617e16eb83e94cca/core/ISIViolations.m#L21-L46

DradeAW commented 9 months ago

One (little) detail also: you are looking at refractory period violations in the ISI, which leads to a (very small) difference from the actual number.

If there are 3 spikes very close to one another, looking at the ISI will produce 2 violations, whereas there are in fact 3 (1 with 2, 2 with 3 and 1 with 3). This is only a problem for very high contamination and the difference is very small, so it may be a good idea to compute on the ISI to save time in the computation. Nevertheless I wanted to point it out to make it clear :)
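
The triplet case can be reproduced in a few lines. This is a small sketch with hypothetical helper names and an assumed 1.5 ms refractory period; it compares counting short ISIs with counting every pair of spikes closer than the refractory period.

```python
import numpy as np


def count_isi_violations(spike_times, t_r):
    """Violations seen in the ISI: consecutive spikes closer than t_r."""
    return int(np.sum(np.diff(spike_times) < t_r))


def count_pair_violations(spike_times, t_r):
    """All pairs of spikes closer than t_r, consecutive or not."""
    spike_times = np.asarray(spike_times)
    # For each spike, index of the first later spike that is at least t_r away.
    upper = np.searchsorted(spike_times, spike_times + t_r, side="left")
    return int(np.sum(upper - np.arange(len(spike_times)) - 1))


t_r = 0.0015  # 1.5 ms, assumed refractory period
triplet = np.array([1.0000, 1.0004, 1.0008])  # 3 spikes within 0.8 ms

print(count_isi_violations(triplet, t_r))   # 2
print(count_pair_violations(triplet, t_r))  # 3
```

The sorted-array search keeps the pairwise count close to the cost of the ISI version, so the choice between the two is mostly about which definition the metric should follow.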

zm711 commented 9 months ago

Good point :)

h-mayorquin commented 9 months ago

Is there an actionable to-do so we could close this issue? Should the documentation be modified somewhere to make this point explicit?

zm711 commented 9 months ago

@h-mayorquin

I think the actionable item would be 1 of 2 things:

1) Change the documentation (rtd and docstring) to state that the function is still based on the original sortingQuality modification of the Hill et al. conception of violations

OR

2) Change to the current sortingQuality implementation, which is a modification of @llobetv and @DradeAW's method using ISIs (as @DradeAW pointed out), so that we keep things in sync, and modify the docs to mention that, although originally based on Hill, the function has been modified to stay in sync with its source repo.

But I'm not sure which one would be preferred here @alejoe91?