NeuralEnsemble / elephant

Elephant is the Electrophysiology Analysis Toolkit
http://www.python-elephant.org
BSD 3-Clause "New" or "Revised" License

[Error] List index out of range error during CAD (cell assembly detection) during agglomeration of neuron(s) #627

Closed Idavr closed 2 months ago

Idavr commented 7 months ago

Describe the bug

During some analysis runs using the cell assembly detection (CAD) method on recording segments with 758 neurons, I get the following error:

Initializing the dictionaries...
actual significance_level 3.4854985831448264e-08
Testing on pairs...

Testing on higher order assemblies...

Traceback (most recent call last):
  File "/Users/idavalik/workspace/projects/elephant/elephant/assembly_detection.py", line 140, in <module>
    patterns = cell_assembly_detection(binned_spikes, max_lag=2, min_occurrences=3, subgroup_pruning=True, verbose=True)
  File "/opt/anaconda3/envs/elephant/lib/python3.8/site-packages/elephant/cell_assembly_detection.py", line 363, in cell_assembly_detection
    ensemble=assembly[w1],
IndexError: list index out of range

This is unrelated to the length of the analysed segment: it has happened with segments ranging from a couple of seconds up to (most recently) 118 seconds. Because the failure occurs late in the analysis, it is especially unfortunate for longer segments, as I end up losing hours of computation. I do not know what the error indicates, apart from the fact that the w1 + 1 increment in the code sometimes seems to count past the end of the assembly list during agglomeration. I am not sure how to rectify this.
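
For now, to avoid losing hours of work when a run crashes, I wrap the call and dump the binned matrix on failure; a rough stopgap sketch (the output file name is just a placeholder):

import pickle

try:
    patterns = cell_assembly_detection(binned_spikes, max_lag=2,
                                       min_occurrences=3,
                                       subgroup_pruning=True, verbose=True)
except IndexError:
    # save the binned matrix so the failing segment can be re-examined
    # without redoing the preprocessing
    with open("failed_segment.pkl", "wb") as f:
        pickle.dump(binned_spikes.to_array(), f)
    raise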

To Reproduce

Running Python code from within Visual Studio Code 1.87.2.

import quantities as pq
from elephant.conversion import BinnedSpikeTrain
from elephant.cell_assembly_detection import cell_assembly_detection

# spiketrains and epoch1_start are defined earlier in my own loading code
bins = 10

binned_spikes = BinnedSpikeTrain(spiketrains, t_start=epoch1_start * pq.s,
                                 bin_size=bins * pq.ms)
binned_spikes.rescale('ms')

patterns = cell_assembly_detection(binned_spikes, max_lag=2, min_occurrences=3,
                                   subgroup_pruning=True, verbose=True)

Environment

Any input is appreciated.

Moritz-Alexander-Kern commented 7 months ago

Hey @Idavr, thanks for reporting this.

Unfortunately, I fail to reproduce this. I tried:

from elephant.conversion import BinnedSpikeTrain
from elephant.cell_assembly_detection import cell_assembly_detection
from elephant.spike_train_generation import StationaryPoissonProcess
import numpy as np
import quantities as pq

np.random.seed(1)

SPP = StationaryPoissonProcess(50 * pq.Hz, t_start=0 * pq.ms, t_stop=10000 * pq.ms)
spiketrains = SPP.generate_n_spiketrains(10)

bins = 10

binned_spikes = BinnedSpikeTrain(spiketrains, bin_size=bins * pq.ms)
binned_spikes.rescale("ms")

patterns = cell_assembly_detection(
    binned_spikes, max_lag=2, min_occurrences=3, subgroup_pruning=True, verbose=True
)

The error message suggests that assembly[w1] is out of range in line 363:

line 363, in cell_assembly_detection
    ensemble=assembly[w1],
IndexError: list index out of range

Looking into the code in line 330: https://github.com/NeuralEnsemble/elephant/blob/fe5053a52a3aaa5c86d9a9c61f72de14b364df1a/elephant/cell_assembly_detection.py#L330

Here assembly[w1]['neurons'] does not seem to raise an error, i.e. there is no index error at line 330.

Confusingly, by the time the code reaches line 363, the same index w1 has gone out of range for the assembly list.

This could be due to the list's length changing between lines 330 and 363, but I struggle to see where that could happen.
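
One way to narrow this down would be to instrument a local copy of elephant; a debugging sketch (not a fix, and the exact placement may need adjusting):

# debugging sketch: add just above the failing access in your local
# elephant/cell_assembly_detection.py (around line 363)
assert w1 < len(assembly), (
    "w1=%d but len(assembly)=%d; the list shrank after line 330"
    % (w1, len(assembly))
)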

Without being able to reproduce this, the error will be very difficult to track down. It seems to occur only under certain conditions related to the data you are analyzing.

Is there any way this can be reproduced without access to the recording data, i.e. with a code example similar to the one I provided?

Idavr commented 7 months ago

Hi @Moritz-Alexander-Kern

Yes, how and why the length of the list changes is also what I had trouble understanding. I have not yet found a systematic reason for why this occurs, and I am uncertain whether I can share the data I am working with. However, maybe more information will help in reproducing the error.

I see you tried reproducing it with 10 units firing at the same frequency (50 Hz), but I work with 700+ units in the same recording, with a total length of 4500 seconds. Because the probe spans different regions, there is a mix of unit types with different firing frequencies and patterns: some fire regularly with low variation, while others are bursty and active only sporadically throughout the recording. Firing frequencies vary greatly among the units.
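
Something closer to this could perhaps be mimicked with heterogeneous synthetic rates; a rough stand-in for my data (not the real thing, and my bursty units are of course not Poisson):

import numpy as np
import quantities as pq
from elephant.spike_train_generation import StationaryPoissonProcess

np.random.seed(0)

# 700 units with widely spread firing rates as a crude stand-in
rates = np.random.lognormal(mean=1.0, sigma=1.0, size=700)
spiketrains = [
    StationaryPoissonProcess(r * pq.Hz, t_start=0 * pq.s,
                             t_stop=120 * pq.s).generate_spiketrain()
    for r in rates
]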

Apart from the code you provided, I do not do anything else: I load a matrix of units with their spike times, isolate the spikes within the time period of interest, and then bin and analyse them as in your example.
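
Concretely, the isolation step looks roughly like this (spike_times_per_unit is a placeholder for my own loading code):

import quantities as pq
from neo import SpikeTrain

# spike_times_per_unit: one 1-D array of spike times (in seconds) per unit
t0, t1 = 10.0, 128.0  # example window of interest, in seconds
spiketrains = [
    SpikeTrain(times[(times >= t0) & (times <= t1)], units="s",
               t_start=t0 * pq.s, t_stop=t1 * pq.s)
    for times in spike_times_per_unit
]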

Are there any other pieces of information that might help you?

Idavr commented 5 months ago

Hi again @Moritz-Alexander-Kern!

I have come upon the same issue with a different recording. However, this recording is publicly available, so maybe you can give troubleshooting another go with the same data?

The data file is found here: https://dandiarchive.org/dandiset/000473/0.230417.1502/files?location=sub-216301&page=1

This time I tried to run a longer section (which in many other recordings was no problem), with a start time anywhere from 0 to 10 seconds and a stop time of 2575 seconds.

Hope to hear from you soon.

Moritz-Alexander-Kern commented 4 months ago

Hey @Idavr,

thanks for providing this example dataset.

I have used the following code example to test different scenarios:

from neo import Segment, SpikeTrain
import quantities as pq
from pynwb import NWBHDF5IO
from elephant.conversion import BinnedSpikeTrain
from elephant.cell_assembly_detection import cell_assembly_detection

# open the DANDI NWB file read-only
nwb_file_path = "sub-216301_ses-216301-20200521-probe0_ecephys+ogen.nwb"
io = NWBHDF5IO(nwb_file_path, mode="r")
nwbfile = io.read()

units = nwbfile.units

# select the analysis window and build one neo SpikeTrain per unit
t_start = 0
t_stop = 10.0  # 2575
segment = Segment()
for unit in units["spike_times"]:
    segment.spiketrains.append(
        SpikeTrain(
            unit[(t_start < unit) & (unit < t_stop)],  # keep spikes in window
            units="s",
            t_start=t_start * pq.s,
            t_stop=t_stop * pq.s,
        )
    )
spiketrains = segment.spiketrains[:]  # [0:2]
io.close()

# bin at 10 ms resolution, as in the original report
bins = 10

binned_spikes = BinnedSpikeTrain(spiketrains, bin_size=bins * pq.ms)
binned_spikes.rescale("ms")

patterns = cell_assembly_detection(
    binned_spikes, max_lag=2, min_occurrences=3, subgroup_pruning=True, verbose=True
)

If I try to run this with all 542 neurons and the full length of the recording (2575 seconds), I simply run out of memory. I've tried other parameters, such as a shorter time window (t_start to t_stop spanning 10 to 500 seconds) or fewer neurons, but I still can't reproduce the index error.
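
To keep probing without exhausting memory, one option is to test random subsets of units, in case the IndexError depends on which neurons enter the analysis; a sketch (the subset size is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
# draw a random subset of units; repeat with different seeds and sizes
subset = rng.choice(len(spiketrains), size=100, replace=False)
spiketrains_subset = [spiketrains[i] for i in subset]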

Could you clarify if you are running the full duration with all neurons? If so, how are you managing to avoid running out of memory?

Moritz-Alexander-Kern commented 2 months ago

We have not heard from you recently, and we could not reproduce the error you reported. Closing this now; feel free to reopen it at any time.