SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

zscore fails when using white noise as an input #1972

Open h-mayorquin opened 12 months ago

h-mayorquin commented 12 months ago

Related to #1971:

Bug:

import numpy as np

from spikeinterface.core import generate_recording
from spikeinterface.preprocessing import scale, zscore

seed = 1
rec = generate_recording(seed=seed, mode="lazy")
rec_int = scale(rec, dtype="int16", gain=100)

# z-score, then rescale into an int16 range of 256
zscore_recording = zscore(rec_int, dtype="int16", int_scale=256, mode="mean+std", seed=seed)
traces = zscore_recording.get_traces(segment_index=0)

# the z-scored traces should have mean ~0 and std ~int_scale
trace_mean = np.mean(traces, axis=0)
trace_std = np.std(traces, axis=0)
assert np.all(np.abs(trace_mean) < 1)
assert np.all(np.abs(trace_std - 256) < 1)

For most seeds with mode="lazy", the two assertions above fail.

@samuelgarcia you are the one who implemented this int_scale range for z-scores, any idea why this might be?

h-mayorquin commented 12 months ago

This functionality was implemented here: https://github.com/SpikeInterface/spikeinterface/pull/1437

alejoe91 commented 12 months ago

Interesting! I'll also take a look! Thanks for tracking this down @h-mayorquin

h-mayorquin commented 12 months ago

I thought about this some more this morning. It is just a statistical problem with the test; this is not a bug. If you scale your data up so that its standard deviation is large (here int_scale=256), you increase the standard error of the mean accordingly:

$SE = \frac{\sigma}{\sqrt{n}}$

Then you need a larger n to compensate. If you calculate the standard error of the mean for the chunks available, you find that it is not small enough for the test above to be sound.
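For concreteness, plugging in stand-in numbers (these match the experiment below, not necessarily the actual chunk sizes used by zscore): with $\sigma \approx 256$ after scaling and $n = 50000$ samples,

$SE = \frac{256}{\sqrt{50000}} \approx 1.14$

which is already larger than the tolerance of 1 used in the assertion.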

Plus, casting to int is not a linear function (it truncates toward zero, behaving like floor for positive values and like ceil for negative values), which also distorts the samples and can make the error of the estimates larger in practice.
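A quick illustration of the non-linearity, in plain numpy (independent of SpikeInterface):

import numpy as np

x = np.array([1.7, 0.4, -0.4, -1.7])
print(x.astype("int16"))  # [ 1  0  0 -1]: truncation toward zero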

Here, I estimate the mean 1,000 times, each time from 50,000 samples. You can see that a large fraction of the estimates are farther than 1 from the true mean (which is zero) and would therefore fail the test above.

import numpy as np
import matplotlib.pyplot as plt

# 1,000 mean estimates, each from 50,000 int16-cast samples with std ~256
estimates = (256 * np.random.randn(int(5e4), 1000)).astype("int16").mean(axis=0)
plt.hist(estimates, bins=100)
plt.xlabel("Estimated mean from the process")
plt.ylabel("Frequency")

(Figure: histogram of the estimated means from the snippet above; a large fraction fall outside ±1.)
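As a quick follow-up, the fraction of estimates that would fail the tolerance of 1 can be computed directly:

print(np.mean(np.abs(estimates) > 1))  # fraction of failing estimates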

h-mayorquin commented 12 months ago

I would like for us to shut down this test and slowly deprecate the int range functionality in ZScoreRecording so we can confine this complexity to Kilosort only. Sam probably tested this a lot when he was implementing the Kilosort pipeline, and as this is an ad-hoc processing step for them, we don't need to test it in general.

Within the Kilosort pipeline, or anywhere else the need for an operation like this arises, it can be performed by composing the casting and scaling operations already available in the preprocessing module, as sketched below. No need for flags.
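A rough sketch of the composition, reusing rec_int from the first snippet and only the functions already shown above (the parameter values are illustrative, not a recommendation):

from spikeinterface.preprocessing import scale, zscore

# z-score in float, then scale and cast in a separate, explicit step
rec_z = zscore(rec_int, mode="mean+std", dtype="float32")
rec_scaled = scale(rec_z, gain=256.0, dtype="int16")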

Similar to previous discussions: https://github.com/SpikeInterface/spikeinterface/issues/1815#issuecomment-1708523970 and https://github.com/SpikeInterface/spikeinterface/issues/914