NeuralEnsemble / elephant

Elephant is the Electrophysiology Analysis Toolkit
http://www.python-elephant.org
BSD 3-Clause "New" or "Revised" License

[Feature] Add test for automatic kernel selection to test against reference implementation #247

Open mdenker opened 5 years ago

mdenker commented 5 years ago

Feature To prevent bugs such as #245, there should be a unit test of the instantaneous rate estimate against reference data. One option is to compare against data obtained from the online tool.

mdenker commented 3 years ago

This is the website to compute a reference implementation: https://www.neuralengine.org/res/kernel.html

Moritz-Alexander-Kern commented 2 years ago

Implementations

Three implementations of this method are available at https://www.neuralengine.org/res/kernel.html.

Tests

Compare the results obtained with Elephant against the other implementations.

Installation

The comparison uses the adaptivekde package from PyPI (pip install adaptivekde).

Code

The following snippet was used to produce the results:

import neo
import elephant
from elephant.datasets import download_datasets
import adaptivekde

# Download example data from the elephant-data repository
repo_path = 'tutorials/tutorial_unitary_event_analysis/data/dataset-1.nix'
filepath = download_datasets(repo_path)

# Load the data (read-only)
io = neo.io.NixIO(filepath, 'ro')
block = io.read_block()

for segment_no in range(5):
    train_no = 0
    spiketimes = block.segments[segment_no].spiketrains[train_no].magnitude
    print(f"Segment number: {segment_no}")
    print(f"Spiketimes{spiketimes}")

    # Elephant's implementation
    bandwidth = elephant.statistics.optimal_kernel_bandwidth(spiketimes)
    print("Results:")
    print(f"Optimal kernel bandwidth calculated with Elephant    {bandwidth['optw']}")

    # Reference implementation from the adaptivekde package
    # (sskernel returns a tuple; the optimal bandwidth is the third element)
    bandwidth2 = adaptivekde.sskernel(spiketimes)
    print(f"Optimal kernel bandwidth calculated with adaptivekde {bandwidth2[2]}")

Results

(results from the HTML5-based online tool added manually)

Segment number: 0
Spiketimes[  26.   48.   78.  144.  178.  216.  258.  278.  315.  347.  407.  421.
  447.  467.  490.  539.  559.  589.  643.  672.  702.  738.  771.  797.
  820.  841.  857.  877.  913.  940.  976. 1004. 1026. 1045. 1063. 1086.
 1090. 1120. 1147. 1164. 1190. 1213. 1233. 1250. 1274. 1298. 1337. 1407.
 1424. 1430. 1449. 1477. 1503. 1535. 1558. 1588. 1629. 1630. 1654. 1671.
 1697. 1712. 1747. 1858. 1873.]
Results:
Optimal kernel bandwidth calculated with Elephant    421.4054380886631
Optimal kernel bandwidth calculated with adaptivekde 421.4054380886631
Optimal kernel bandwidth calculated with HTML5 tool (Ver. 0.4) 307.833
Segment number: 1
Spiketimes[   4.   45.   85.  123.  156.  185.  226.  274.  296.  323.  348.  371.
  404.  471.  499.  530.  560.  581.  601.  626.  654.  700.  727.  762.
  792.  805.  827.  828.  847.  869.  899.  920.  933.  972. 1001. 1023.
 1031. 1051. 1086. 1152. 1172. 1176. 1199. 1223. 1245. 1273. 1304. 1379.
 1400. 1401. 1417. 1424. 1444. 1449. 1483. 1546. 1564. 1583. 1588. 1615.
 1629. 1660. 1683. 1706. 1732. 1774. 1816. 1834. 1849. 1884. 1905. 2030.]
Results:
Optimal kernel bandwidth calculated with Elephant    433.8207975780849
Optimal kernel bandwidth calculated with adaptivekde 433.8207975780849
Optimal kernel bandwidth calculated with HTML5 tool (Ver. 0.4) 337.667
Segment number: 2
Spiketimes[  39.  170.  256.  299.  362.  487.  533.  535.  564.  625.  656.  686.
  742.  785.  791.  835.  894. 1007. 1034. 1050. 1079. 1103. 1138. 1173.
 1210. 1292. 1312. 1341. 1363. 1418. 1446. 1483. 1526. 1555. 1585. 1658.
 1707. 1739. 1756. 1812. 1849. 1875. 1920. 2040. 2081.]
Results:
Optimal kernel bandwidth calculated with Elephant    495.3696970934899
Optimal kernel bandwidth calculated with adaptivekde 495.3696970934899
Optimal kernel bandwidth calculated with HTML5 tool (Ver. 0.4) 408.400
Segment number: 3
Spiketimes[  25.   68.  170.  201.  247.  286.  325.  346.  405.  448.  491.  511.
  542.  588.  704.  728.  754.  782.  812.  845.  867.  913.  942. 1011.
 1034. 1063. 1092. 1119. 1159. 1310. 1354. 1376. 1530. 1579. 1590. 1612.
 1641. 1749. 1797. 1819. 1831. 1872. 2060. 2072. 2078.]
Results:
Optimal kernel bandwidth calculated with Elephant    513.3814318298448
Optimal kernel bandwidth calculated with adaptivekde 513.3745574621862
Optimal kernel bandwidth calculated with HTML5 tool (Ver. 0.4) 410.600
Segment number: 4
Spiketimes[  19.   57.   84.  116.  157.  192.  214.  245.  280.  415.  428.  450.
  488.  516.  567.  592.  612.  628.  646.  648.  692.  720.  746.  761.
  791.  828.  865.  883.  905.  915.  939.  969. 1023. 1046. 1085. 1135.
 1175. 1187. 1206. 1224. 1246. 1267. 1292. 1331. 1380. 1445. 1468. 1511.
 1536. 1646. 1674. 1704. 1724. 1839. 2047.]
Results:
Optimal kernel bandwidth calculated with Elephant    432.43451518531515
Optimal kernel bandwidth calculated with adaptivekde 432.43451518531515
Optimal kernel bandwidth calculated with HTML5 tool (Ver. 0.4) 338.000

The results obtained with the Python implementation adaptivekde agree very closely with Elephant's implementation.

However, the results obtained with the online tool differ significantly. This might be due to an optimization used by the online tool, which is "accelerating computation by excluding >5 std samples." (see: https://www.neuralengine.org/res/kernel.html)

How to proceed?

I don't have a strong opinion here. Any suggestions?

mdenker commented 2 years ago

Thanks for working on this. For Python and Matlab, I suggest the same procedure as for similar validations before, i.e., include a script in elephant-data that calculates the results using these tools, save the results in elephant-data, and in the Elephant unit tests load the results from there and compare. I don't think we should grow our dependency tree by installing the adaptivekde package directly.

For the web interface version, I suggest first investigating the differences further. If it turns out that the implementation itself differs too much, there is likely no need for a third validation. In that case, the docstring should point out the incompatibility.
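The load-and-compare step described above could look roughly like the following minimal sketch. The file name, tolerance, and the `check_against_reference` helper are illustrative assumptions; in Elephant's test suite the computed values would come from `elephant.statistics.optimal_kernel_bandwidth(...)['optw']`, and the reference file would be fetched from elephant-data rather than written locally:

```python
import os
import tempfile
import numpy as np

def check_against_reference(computed, reference_file, rtol=1e-10):
    """Compare computed bandwidths against values stored in a reference file."""
    reference = np.load(reference_file)
    return np.allclose(computed, reference, rtol=rtol)

# Toy demonstration: the values below are the Elephant results from this
# thread; in the real test they would be produced by the external tools
# and shipped with elephant-data.
computed = np.array([421.4054380886631, 433.8207975780849])
with tempfile.TemporaryDirectory() as tmp:
    ref_path = os.path.join(tmp, 'bandwidth_reference.npy')
    np.save(ref_path, computed)
    assert check_against_reference(computed, ref_path)
```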

mdenker commented 2 years ago

For the test, one could also think about creating stochastic data (e.g., using Poisson with varying rate) -- this would make the test independent of the Unitary Events dataset.
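Such a stochastic fixture could be sketched with NumPy alone; the thinning-based generator and the rate function below are illustrative assumptions (Elephant's own `elephant.spike_train_generation` module also provides Poisson process generators). The resulting spike times could then be passed to `elephant.statistics.optimal_kernel_bandwidth` in place of the Unitary Events dataset:

```python
import numpy as np

def homogeneous_poisson(rate, t_stop, rng):
    """Spike times of a homogeneous Poisson process via exponential ISIs."""
    n_max = int(rate * t_stop * 3) + 20  # generous upper bound on spike count
    isis = rng.exponential(1.0 / rate, size=n_max)
    times = np.cumsum(isis)
    return times[times < t_stop]

def inhomogeneous_poisson(rate_fn, rate_max, t_stop, rng):
    """Thinning: generate at rate_max, keep spikes with prob rate_fn(t)/rate_max."""
    candidates = homogeneous_poisson(rate_max, t_stop, rng)
    keep = rng.uniform(size=candidates.size) < rate_fn(candidates) / rate_max
    return candidates[keep]

rng = np.random.default_rng(seed=4711)
# slowly varying rate in spikes per ms, on the scale of the data above
rate = lambda t: 0.02 + 0.015 * np.sin(2 * np.pi * t / 1000.0)
spiketimes = inhomogeneous_poisson(rate, rate_max=0.035, t_stop=2000.0, rng=rng)
print(spiketimes)
```

With a fixed seed the generated train is reproducible, so expected bandwidths can still be pinned down in the test.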