ValueError for si.extract_waveforms() for many spikeforest datasets

CoderJacques commented 1 year ago

@magland I am trying to download the ground-truth waveforms of spikeforest datasets using the following code:

import spikeforest as sf
import spikeinterface as si
import spikeinterface.preprocessing as spre

all_recordings = sf.load_spikeforest_recordings()

for R in all_recordings:

    print(R.study_set_name, R.study_name, R.recording_name)

    recording = R.get_recording_extractor()
    recording_f = spre.bandpass_filter(recording, freq_min=300, freq_max=6000)

    sorting_true = R.get_sorting_true_extractor()

    unit_ids = sorting_true.get_unit_ids()

    spikes_per_neuron = []

    for i in range(R.num_true_units):
        spike_train = sorting_true.get_unit_spike_train(unit_id=unit_ids[i])
        spikes_per_neuron.append(len(spike_train))

    we_TDC = si.extract_waveforms(recording_f,
                                  sorting_true,
                                  f'{R.study_set_name}_{R.study_name}_{R.recording_name}',
                                  overwrite=True,
                                  max_spikes_per_unit=max(spikes_per_neuron)+1)

For many data sets (e.g. PAIRED_BOYDEN paired_boyden32c 419_1_8) I get a ValueError. For instance, the 419_1_8 dataset yields

ValueError: cannot reshape array of size 93363 into shape (32,28568)

zm711 commented 1 year ago

@CoderJacques

It would likely be more helpful if you could post the whole error trace for at least one of the failing datasets to know where exactly the ValueError is occurring. Could you do that please :)

CoderJacques commented 1 year ago

@zm711 Of course! see below:

PAIRED_BOYDEN paired_boyden32c 419_1_8
Setting 'return_scaled' to False
extract waveforms memmap:  99%|#########9| 487/491 [00:10<00:00, 48.64it/s]
Traceback (most recent call last):
  File "PATH_SF/spikeforest/playground.py", line 26, in <module>
    we_TDC = si.extract_waveforms(recording_f,
  File "PATH_SI/spikeinterface/core/waveform_extractor.py", line 1583, in extract_waveforms
    we.run_extract_waveforms(seed=seed, **job_kwargs)
  File "PATH_SI/spikeinterface/core/waveform_extractor.py", line 1353, in run_extract_waveforms
    wfs_arrays = extract_waveforms_to_buffers(
  File "PATH/spikeinterface/core/waveform_tools.py", line 96, in extract_waveforms_to_buffers
    distribute_waveforms_to_buffers(
  File "PATH_SI/spikeinterface/core/waveform_tools.py", line 278, in distribute_waveforms_to_buffers
    processor.run()
  File "PATH_SI/spikeinterface/core/job_tools.py", line 376, in run
    res = self.func(segment_index, frame_start, frame_stop, worker_ctx)
  File "PATH_SI/spikeinterface/core/waveform_tools.py", line 372, in _waveform_extractor_chunk
    traces = recording.get_traces(
  File "PATH_SI/spikeinterface/core/baserecording.py", line 278, in get_traces
    traces = rs.get_traces(start_frame=start_frame, end_frame=end_frame, channel_indices=channel_indices)
  File "PATH_SI/spikeinterface/preprocessing/filter.py", line 130, in get_traces
    traces_chunk, left_margin, right_margin = get_chunk_with_margin(
  File "PATH_SI/spikeinterface/core/recording_tools.py", line 229, in get_chunk_with_margin
    traces_chunk = rec_segment.get_traces(
  File "PATH_SI/spikeforest/load_extractors/MdaRecordingExtractorV2/MdaRecordingExtractorV2.py", line 66, in get_traces
    recordings = self._diskreadmda.readChunk(i1=0, i2=start_frame, N1=self._diskreadmda.N1(),
  File "PATH_SF/spikeforest/load_extractors/MdaRecordingExtractorV2/MdaRecordingExtractorV2.py", line 178, in readChunk
    return np.reshape(X, (N1, N2), order='F')
  File "<__array_function__ internals>", line 180, in reshape
  File "PATH_NP/numpy/core/fromnumeric.py", line 298, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "PATH_NP/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 93363 into shape (32,28568)

zm711 commented 1 year ago

@CoderJacques,

Looks like the specific error is occurring in spikeforest itself. Probably worth opening an issue directly on the spikeforest repo as well to make sure it can be addressed at that level too.

alejoe91 commented 1 year ago

Thanks @zm711

@magland I wonder if you could use the spikeinterface MdaRecordingExtraxtor instead the V2 one. What is the difference with V2? Can you look into the error?

alejoe91 commented 1 year ago

@CoderJacques I wasn't able to reproduce the issue with PAIRED_BOYDEN paired_boyden32c 419_1_8

Here's my code an my output:

import spikeforest as sf
import spikeinterface as si
import spikeinterface.preprocessing as spre
import os

all_recordings = sf.load_spikeforest_recordings()

recordings = [r for r in all_recordings if r.study_name == "paired_boyden32c" and \
    r.study_set_name == "PAIRED_BOYDEN" and r.recording_name == "419_1_8"]

for R in recordings:

    print(R.study_set_name, R.study_name, R.recording_name)

    recording = R.get_recording_extractor()
    recording_f = spre.bandpass_filter(recording, freq_min=300, freq_max=6000)

    sorting_true = R.get_sorting_true_extractor()

    unit_ids = sorting_true.get_unit_ids()

    spikes_per_neuron = []

    for i in range(R.num_true_units):
        spike_train = sorting_true.get_unit_spike_train(unit_id=unit_ids[i])
        spikes_per_neuron.append(len(spike_train))

    we_TDC = si.extract_waveforms(recording_f,
                                  sorting_true,
                                  f'{R.study_set_name}_{R.study_name}_{R.recording_name}',
                                  overwrite=True,
                                  max_spikes_per_unit=max(spikes_per_neuron)+1)

Output:

PAIRED_BOYDEN paired_boyden32c 419_1_8
Setting 'return_scaled' to False
extract waveforms memmap: 100%|###################################################################################################################################| 491/491 [00:08<00:00, 57.69it/s]

I'm using:

kachery-cloud                 0.4.3
spikeinterface                0.98.2
spikeforest                   0.12.7

magland commented 1 year ago

I also wasn't able to reproduce the problem. It worked for me. @CoderJacques could you please report your versions of kachery-cloud, spikeinterface, and spikeforest on your system?

CoderJacques commented 1 year ago

I am using the same versions for kachery-cloud (0.4.3), spikeinterface (0.98.2) & spikeforest (0.12.7) on a MacBook Pro 2023. However, if I run the code externally on Google Colab using the same package versions, extracting the waveforms is not a problem. Thus, I'll just run it externally.

Thanks @alejoe91 @zm711 & @magland

zm711 commented 1 year ago

@magland and @alejoe91

Just for completeness I just tested on a Windows computer and on a 2021 MacBook Pro and was able to get this to work using @alejoe91 script.

SpikeInterface / spikeinterface

ValueError for si.extract_waveforms() for many spikeforest datasets #2094