NeuralEnsemble / elephant

Elephant is the Electrophysiology Analysis Toolkit
http://www.python-elephant.org
BSD 3-Clause "New" or "Revised" License
201 stars 92 forks source link

[Bug] optimal_kernel_bandwidth yields extremely large kernel widths #461

Open essink opened 2 years ago

essink commented 2 years ago

Describe the bug The estimation of the instantaneous_rate with kernel="auto" of a very long spiketrain (~650 s) yielded a very smooth and flat rate. It was tracked down to the estimation of the optimal_kernel_bandwidth (the width of the Gaussian kernel, fixed for the whole duration), which yielded a bandwidth of ~22 s for spiketrain with a rather low firing rate.

After some investigation, it was noticed that the optimal bandwidth strongly depends on the duration of the spiketrain. One reason for this might be the hard-coded number of bins (1000) here:

https://github.com/NeuralEnsemble/elephant/blob/67dd3f3ef86161c0e70faf6e50690bdfd591978c/elephant/statistics.py#L1576-L1597

Although, the option to give an array of times on which the optimization of the kernel is performed exists, it is not very accessible. (See original publication Shimazaki, H. & Shinomoto, S. Kernel bandwidth optimization in spike rate estimation. J Comput Neurosci 29, 171–182 (2010). for details)

Further notice: The implemented optimal_kernel_bandwidth estimates one fixed kernel bandwidth for the given spiketrain and is not adaptive over time! The originating code https://github.com/cooperlab/AdaptiveKDE also implements the adaptive case and could be included into elephant.

To Reproduce

  1. Get spiketimes
  2. Call elephant.statistics.optimal_kernel_bandwidth(spiketimes)

A minimal example is provided in the attached files: issue_optimal_kernel_bandwidth.zip

Expected behavior The order of magnitude should be reasonable and if this is exceeded a warning should be raise.

Environment

Moritz-Alexander-Kern commented 2 years ago

Thank you for reporting this and your effort to contribute to elephant.

I have extended your minimal example to illustrate the issue in connection with elephant.statistics.instantaneous_rate.

import pickle
import quantities as pq
import elephant
import matplotlib.pyplot as plt

# Load example spiketrains
high_firing_rate_st, low_firing_rate_st = pickle.load(
    open('spiketrain_examples.pkl', 'rb'))
# firing rates
print(high_firing_rate_st.annotations)
print(low_firing_rate_st.annotations)
{'mean_firing_rate': array(22.00288951) * 1/s}
{'mean_firing_rate': array(3.85978253) * 1/s}
# Look at kernel bandwidth estimation

# spiketrain = high_firing_rate_st
spiketrain = low_firing_rate_st
spiketimes = spiketrain.times.rescale(pq.s).magnitude

# Choose example spiketrain
kernel_bandwidth = elephant.statistics.optimal_kernel_bandwidth(spiketimes)
print(f'The optimal kernel bandwidth for the full length spiketrain was '
      f'estimated to be {kernel_bandwidth["optw"]} s')
The optimal kernel bandwidth for the full length spiketrain was estimated to be 22.401845783799754 s
# Time slice the spiketrain to a 5 s duration
time_sliced_st = spiketrain.time_slice(t_start=1*pq.s, t_stop=6*pq.s)
spiketimes_sliced = time_sliced_st.times.rescale(pq.s).magnitude

kernel_bandwidth_sliced = elephant.statistics.optimal_kernel_bandwidth(
    spiketimes_sliced)
print(f'The optimal kernel bandwidth for the full length spiketrain was '
      f'estimated to be {kernel_bandwidth_sliced["optw"]} s')
The optimal kernel bandwidth for the full length spiketrain was estimated to be 0.5003558368563431 s
# Look at instantaneous rate

def plot_rate(inst_rate):
    plt.plot(inst_rate.times.simplified.magnitude,
             inst_rate.magnitude, color='orange',
             label='instantaneous rate')
    plt.legend()
    plt.title(f"{inst_rate.annotations['kernel']['type']}: "
              f"sigma={inst_rate.annotations['kernel']['sigma']}, "
              f"sampling_period={int(sampling_period.item())} ms")
    fig = plt.gcf()
    fig.set_size_inches(18.5, 10.5)
    plt.show()

sampling_period = 10 * pq.ms
# use kernel "auto"
rate_auto = elephant.statistics.instantaneous_rate(
    spiketrain,
    sampling_period=sampling_period,
    kernel="auto")

# plot result
plot_rate(rate_auto)

Figure 1: Figure_1

# calculate rate using kernel bandwidth based on time slice
kernel = elephant.kernels.GaussianKernel(
    sigma=kernel_bandwidth_sliced["optw"] * pq.s)

rate_slice = elephant.statistics.instantaneous_rate(
    spiketrain,
    sampling_period=sampling_period,
    kernel=kernel)

# plot result
plot_rate(rate_slice)

Figure 2: Figure_2

The two figures above show the difference for large and small kernel bandwidths. If I understand correctly Figure 2 is the desired result for this case, Figure 1 shows a rate estimate, which is too smooth. (bandwidth too large)

possible solutions

I guess further steps should be discussed in connection to PR #462, thanks again for contributing this.

essink commented 2 years ago

Hi Moritz,

thanks for having a look at it so quickly! Just to clarify a few points:

  1. What an expected or reasonable bandwidth should be, is a very valid question and I don't have a proper answer to that.
  2. I would consider a 500 ms kernel bandwidth still as quite large.
  3. The PR #462 does not solve the issue yet! It was just created while trying to solve it. (The hard-coded number of bins are definitely not ideal, but seem not to cause the main problem here)
  4. This method of estimating the instantaneous rate was, traditionally, applied to the spiketrains of sets of trials, which were short in duration. It might be a limitation of the method itself.

Having said that, I agree that we can move the discussion to PR #462 .