NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
325 stars 248 forks source link

read_spikegadgets broken for neuropixel recordings #1517

Closed RobertoDF closed 1 month ago

RobertoDF commented 3 months ago

Hi, The read_spikegadgets from spikeinterface was working with neuropixel probes with this committ 83a84b2 but now it is broken.

 File "C:\Users\SciencePerson\miniforge3\envs\spikesorting\lib\site-packages\IPython\core\interactiveshell.py", line 3550, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-3052bc77e239>", line 1, in <module>
    raw_rec = read_spikegadgets(r"X:\13\ephys\20231213_155419.rec\20231213_155419.rec" )
  File "C:\Users\SciencePerson\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\spikegadgets.py", line 45, in __init__
    NeoBaseRecordingExtractor.__init__(
  File "C:\Users\SciencePerson\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py", line 188, in __init__
    _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
  File "C:\Users\SciencePerson\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py", line 27, in __init__
    self.neo_reader = self.get_neo_io_reader(self.NeoRawIOClass, **neo_kwargs)
  File "C:\Users\SciencePerson\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py", line 66, in get_neo_io_reader
    neo_reader.parse_header()
  File "C:\Users\SciencePerson\miniforge3\envs\spikesorting\lib\site-packages\neo\rawio\baserawio.py", line 189, in parse_header
    self._parse_header()
  File "C:\Users\SciencePerson\miniforge3\envs\spikesorting\lib\site-packages\neo\rawio\spikegadgetsrawio.py", line 227, in _parse_header
    signal_channels = signal_channels[keep]
IndexError: list index out of range

Environment:

zm711 commented 3 months ago

Hey @RobertoDF, thanks for the report. I don't use spikegadgets but one quick question:

why is your path xx.rec\xx.rec. The folder has the exact same name as the file? That seems a bit annoying.

We recently made the change because the channel IDs were not accurate (according to a user). So I may need your dataset in order to troubleshoot this (since I don't know the format by itself).

spikegadgets passed our tests so if you could make a small file (<10MB) with your machine and send it to us then we can add it to our tests so we never break your workflow again :)

zm711 commented 3 months ago

Hey @RobertoDF,

based on all our test data it looks like the file shouldn't be nested like that xx.rec/xx.rec. If you named your folder that way then maybe it is messing with our parsing. could you clarify that for me?

RobertoDF commented 3 months ago

Just tested it and i get the same error:

IndexError                                Traceback (most recent call last)
Cell In[3], line 2
      1 path_recording_folder = Path(r"X:\13\ephys\20231213_155419")
----> 2 raw_rec = load_rec_file_with_correct_times(path_recording_folder)

File ~\Documents\GitHub\spikesorting\Utils\Utils.py:489, in load_rec_file_with_correct_times(path_recording_folder)
    486 path_recording, rec_file_name = check_single_rec_file(path_recording_folder)
    487 timestamps = get_timestamps_from_rec(path_recording_folder, path_recording)
--> 489 raw_rec = read_spikegadgets(path_recording)
    491 fs = raw_rec.get_sampling_frequency()
    493 correct_times = timestamps / fs

File ~\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\spikegadgets.py:45, in SpikeGadgetsRecordingExtractor.__init__(self, file_path, stream_id, stream_name, all_annotations, use_names_as_ids)
     36 def __init__(
     37     self,
     38     file_path,
   (...)
     42     use_names_as_ids: bool = False,
     43 ):
     44     neo_kwargs = self.map_to_neo_kwargs(file_path)
---> 45     NeoBaseRecordingExtractor.__init__(
     46         self,
     47         stream_id=stream_id,
     48         stream_name=stream_name,
     49         all_annotations=all_annotations,
     50         use_names_as_ids=use_names_as_ids,
     51         **neo_kwargs,
     52     )
     53     self._kwargs.update(dict(file_path=str(Path(file_path).absolute()), stream_id=stream_id))
     55     probegroup = None  # TODO remove once probeinterface is updated to 0.2.22 in the pyproject.toml

File ~\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py:188, in NeoBaseRecordingExtractor.__init__(self, stream_id, stream_name, block_index, all_annotations, use_names_as_ids, **neo_kwargs)
    158 def __init__(
    159     self,
    160     stream_id: Optional[str] = None,
   (...)
    165     **neo_kwargs: Dict[str, Any],
    166 ) -> None:
    167     """
    168     Initialize a NeoBaseRecordingExtractor instance.
    169 
   (...)
    185 
    186     """
--> 188     _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
    190     kwargs = dict(all_annotations=all_annotations)
    191     if block_index is not None:

File ~\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py:27, in _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
     23 def __init__(self, block_index, **neo_kwargs):
     24 
     25     # Avoids double initiation of the neo reader if it was already done in the __init__ of the child class
     26     if not hasattr(self, "neo_reader"):
---> 27         self.neo_reader = self.get_neo_io_reader(self.NeoRawIOClass, **neo_kwargs)
     29     if self.neo_reader.block_count() > 1 and block_index is None:
     30         raise Exception(
     31             "This dataset is multi-block. Spikeinterface can load one block at a time. "
     32             "Use 'block_index' to select the block to be loaded."
     33         )

File ~\miniforge3\envs\spikesorting\lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py:66, in _NeoBaseExtractor.get_neo_io_reader(cls, raw_class, **neo_kwargs)
     64 neoIOclass = getattr(rawio_module, raw_class)
     65 neo_reader = neoIOclass(**neo_kwargs)
---> 66 neo_reader.parse_header()
     68 return neo_reader

File ~\miniforge3\envs\spikesorting\lib\site-packages\neo\rawio\baserawio.py:189, in BaseRawIO.parse_header(self)
    176 """
    177 Parses the header of the file(s) to allow for faster computations
    178 for all other functions
    179 
    180 """
    181 # this must create
    182 # self.header['nb_block']
    183 # self.header['nb_segment']
   (...)
    186 # self.header['spike_channels']
    187 # self.header['event_channels']
--> 189 self._parse_header()
    190 self._check_stream_signal_channel_characteristics()
    191 self.is_header_parsed = True

File ~\miniforge3\envs\spikesorting\lib\site-packages\neo\rawio\spikegadgetsrawio.py:227, in SpikeGadgetsRawIO._parse_header(self)
    224     units = ""
    226 for schan in trode:
--> 227     chan_id = str(channel_ids[chan_ind])
    228     name = "hwChan" + chan_id
    230     offset = 0.0

IndexError: list index out of range

I ll try to send a smaller .rec file soon. Thanks for quick response!

RobertoDF commented 3 months ago

I cant produce a small file for now (technical problem). Do you know if the testing file is with one or two probes? I am using two probes. Can you maybe share the testing file? Thanks!!!

Btw before being totally broken, the function (read_spikegadgets) was returning a probe with gain scaling always set to 1. I did not pinpoint which commit was the cause of this though.

This is all relative to .rec files with 2 neuropixels: https://github.com/NeuralEnsemble/python-neo/commit/83a84b2601b3aa10685d4ada41a06a49c421540c : read_spikegadget works unknown committ: read_spikegadgets returns wrong gain scaling latest committ: read_spikegadgets broken

zm711 commented 3 months ago

I don't know if it the test files are multi-probe for sure. I didn't help with the gain, but the recent PR referred to 4 shanks I believe. So by two probe you mean 8 shanks or 2 shanks total?

Our test files are all located here:

https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/spikegadgets

You can also share the full sized file here so that I can work on a fix. But we won't merge the fix until we have the test file so that might allow me to at least create a branch you can install from.

RobertoDF commented 3 months ago

all files i have access to right now are huge. smaller I found was 8 GB. The file I can´t load is with two NPX 1.0 probes. one shank each. Ill check the test file now.

RobertoDF commented 3 months ago

oh! I can´t open any of the test files with read_spikegadgets, it just hangs forever. I tried also to update neo to latest version.

zm711 commented 3 months ago

8GB I can work with. (Not ideal), but if this is urgent for your analysis I can see if it is a quick fix.

The gain change was here: 7d0e575.

The idea being that if the header doesn't contain explicit gain information we must set it to 1. Do you know of another way to get the gain information?

oh! I can´t open any of the test files with read_spikegadgets, it just hangs forever. I tried also to update neo to latest version.

g-node can be a bit tricky. Do you want to make sure you downloaded the actual file and not a stub of the file :)

RobertoDF commented 3 months ago

ahah! thanks, i try again

RobertoDF commented 3 months ago

The gain information was correctly identified here https://github.com/NeuralEnsemble/python-neo/commit/83a84b2601b3aa10685d4ada41a06a49c421540c . The test recording does include 2 probes and is loaded smoothly so I am really confused now. I get the same error with this smaller file (https://www.dropbox.com/scl/fi/grt4dn52zhaolnfdu8bvu/20231210_160309.rec?rlkey=nc71uycmp09f803223hlo3num&dl=0)

zm711 commented 3 months ago

Weird. We haven't changed the gain since then (that I know of).... What is your gain saying?

You said you have neuropixels. Since I don't know spikegadgets does that come with a headstage or is part of the neuropixel system? The recent change was to account for the way intan-headstages would need to be counted. So we basically fixed the number of channels based on the standard intan headstages. Maybe because neuropixels can record from different channel numbers this means we need to have flexibility instead. Since we didn't have the fixed count before it might have read yours, but now it can't. I'll look at your file soon (I'm already running some analyses on my computer and I'm resource limited on this station :( )

RobertoDF commented 3 months ago

Sure, thanks for looking into this!! My gain with the pinned neo version is scaled at 0.018311. With a scaling of 1, all channels are detected as noise by spikeinterface (detect_bad_channels func). That´s how I realized something has changed. Than I updated every spikeinterface related package and the func was broken.

Yes, there is a special neuropixel headstage

zm711 commented 3 months ago

Then if that is the case I think that our test data must just use the intan headstage. So if (when technically difficulties allow) you can send us the small file that will definitely help us make sure this doesn't break in the future. My guess is that the neuropixel headstage provides a different gain header that I will need find, and hopefully spikegadgets will provide some header info that will allow me to distinguish the different headstages so that we can check for that and not use fixed channel numbers.

RobertoDF commented 3 months ago

The test file "SpikeGadgets_test_data_2xNpix1.0_20240318_173658.rec" comes from the same recording system, a colleague of mine here sent it. That´s why I am really confused that it can be loaded smoothly and some other files instead don´t work. I checked just before that it includes 2 NPX probes.

zm711 commented 3 months ago

Then my guess would be:

1) dumb luck. Since someone with intan headstages edited the last one. Your colleague's test file had exactly the same number of channels as an intan-based system would have

2) there's been a software update since your colleague recorded that test (or vice versa, they made the test with new and you're dealing with old) and so we need to account for version issues