catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Bug]: `NeuralynxRecordingInterface` fails on Neuralynx data with multiple streams #927

Open KristijanArmeni opened 1 week ago

KristijanArmeni commented 1 week ago

What happened?

When trying to convert a Neuralynx dataset following the snippet from the conversion gallery, the initialization of NeuralynxRecordingInterface() fails if the dataset contains channels (.ncs files) from multiple streams, that is, groups of recording channels that have different sampling rates.

To my understanding, NeuralynxRecordingInterface relies on _NeoBaseExtractor.get_neo_io_reader(**neo_kwargs) from the spikeinterface package, which propagates **neo_kwargs to the dedicated neo reader. The neo reader accepts an exclude_filename argument (see NeuralynxRawIO) that lets the user exclude offending recording channels. However, NeuralynxRecordingInterface() does not accept an exclude_filename parameter.

Currently, NeuralynxRecordingInterface allows the stream_name parameter, but that will fail too since it goes via the above .get_neo_io_reader() method of the spikeinterface package.
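The call chain described above can be sketched with stand-in classes (hypothetical stubs, not the real neuroconv/spikeinterface/neo implementations) to show why a keyword accepted at the bottom layer is unreachable unless every layer above forwards it:

```python
# Hypothetical stubs illustrating the kwarg-propagation chain:
# NeuralynxRecordingInterface -> spikeinterface extractor -> neo reader.
class FakeNeuralynxRawIO:
    """Stand-in for neo.rawio.NeuralynxRawIO, which accepts exclude_filename."""
    def __init__(self, dirname, exclude_filename=None):
        self.dirname = dirname
        self.exclude_filename = exclude_filename or []

class FakeExtractor:
    """Stand-in for the spikeinterface extractor: forwards **neo_kwargs."""
    def __init__(self, folder_path, **neo_kwargs):
        self.neo_reader = FakeNeuralynxRawIO(dirname=folder_path, **neo_kwargs)

class FakeInterface:
    """Stand-in for NeuralynxRecordingInterface as it behaves today:
    it only accepts the arguments it explicitly knows about."""
    def __init__(self, folder_path):  # no exclude_filename here
        self.extractor = FakeExtractor(folder_path=folder_path)

# The extractor layer can pass exclude_filename through ...
ext = FakeExtractor("/data/session", exclude_filename=["LAHCu1.ncs"])
assert ext.neo_reader.exclude_filename == ["LAHCu1.ncs"]

# ... but the interface layer drops it, so users cannot reach it:
try:
    FakeInterface(folder_path="/data/session", exclude_filename=["LAHCu1.ncs"])
except TypeError as err:
    print("interface rejects it:", err)
```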

Possible fix

Allow NeuralynxRecordingInterface to accept exclude_filename keyword argument.

The current workaround is to simply do conversion separately for groups of channels with different sampling rates (i.e. put them in separate folders). If I understand correctly, this would lead to several *.nwb files (one for each stream). I don't know if that's desired behavior or if the NWB standard allows different sampling rates per dataset.
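For anyone needing the workaround, the per-rate split can be automated by reading the sampling frequency out of each .ncs header (Neuralynx .ncs files start with a 16 kB ASCII header containing a -SamplingFrequency line; the header layout is assumed here, so verify against your own data) and copying files into one folder per rate:

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

HEADER_SIZE = 16 * 1024  # .ncs files begin with a 16 kB ASCII header

def sampling_rate(ncs_path: Path) -> float:
    """Read -SamplingFrequency from the ASCII header of an .ncs file."""
    header = ncs_path.read_bytes()[:HEADER_SIZE].decode("latin-1", errors="replace")
    match = re.search(r"-SamplingFrequency\s+([\d.]+)", header)
    if match is None:
        raise ValueError(f"No -SamplingFrequency found in {ncs_path}")
    return float(match.group(1))

def split_by_rate(folder: Path) -> dict:
    """Copy .ncs files into one subfolder per sampling rate."""
    groups = defaultdict(list)
    for ncs in sorted(folder.glob("*.ncs")):
        groups[sampling_rate(ncs)].append(ncs)
    out = {}
    for rate, files in groups.items():
        target = folder / f"rate_{int(rate)}Hz"
        target.mkdir(exist_ok=True)
        for f in files:
            shutil.copy2(f, target / f.name)
        out[rate] = target
    return out
```

Each resulting subfolder can then be passed as folder_path to its own NeuralynxRecordingInterface.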

Steps to Reproduce

For example, in MNE-Python testing data we have a small (< 1MB) Neuralynx testing dataset with one channel sampled at 2kHz (LAHC1.ncs) and another one at 32kHz (LAHCu1.ncs).

If I place those two channels into the same folder and run this code I get the traceback pasted below:

from neuroconv.datainterfaces import NeuralynxRecordingInterface
folder_path = "/path/to/testing/dataset"

# this fails with the traceback below
interface = NeuralynxRecordingInterface(folder_path=folder_path, verbose=False)

Traceback

File "/home/kriarm/project/lm-ecog/src/seeg/convert_nwb.py", line 19, in <module>
    interface = NeuralynxRecordingInterface(folder_path=folder_path, verbose=False)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/neuroconv/datainterfaces/ecephys/neuralynx/neuralynxdatainterface.py", line 55, in __init__
    super().__init__(
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/neuroconv/datainterfaces/ecephys/baserecordingextractorinterface.py", line 38, in __init__
    self.recording_extractor = self.get_extractor()(**source_data)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neuralynx.py", line 38, in __init__
    NeoBaseRecordingExtractor.__init__(
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py", line 187, in __init__
    _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py", line 27, in __init__
    self.neo_reader = self.get_neo_io_reader(self.NeoRawIOClass, **neo_kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py", line 66, in get_neo_io_reader
    neo_reader.parse_header()
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/neo/rawio/baserawio.py", line 189, in parse_header
    self._parse_header()
  File "/home/kriarm/miniconda3/envs/core/lib/python3.11/site-packages/neo/rawio/neuralynxrawio/neuralynxrawio.py", line 392, in _parse_header
    raise ValueError(
ValueError: Incompatible section structures across streams: Stream id 0:['LAHC1'] and 1:['LAHCu1'].

Operating System

Linux

Python Executable

Python

Python Version

3.11

Package Versions

requirements.txt

Code of Conduct

CodyCBakerPhD commented 1 week ago

Hi there, and thank you for the detailed issue! (and for the example data too!)

We'd be happy to propagate more options to downstream readers, but I would like to understand a bit more why neo doesn't handle these as different streams - usually the conditions you describe here should count as the definition of a distinct stream, right @h-mayorquin?

The current workaround is to simply do conversion separately for groups of channels with different sampling rates (i.e. put them in separate folders).

Glad to hear there is at least some workaround available

If I understand correctly, this would lead to several *.nwb files (one for each stream).

It could, but it doesn't have to - the overwrite argument of .run_conversion on all data interfaces and converters is False by default, meaning it tries to append to the existing file at the nwbfile_path

However, to avoid collisions in object names being written to the same file, you might have to carefully control the metadata, such as the name, in each metadata["Ecephys"][es_key] (with es_key being specified when you initialize each recording interface)
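As a sketch of that collision-avoidance step, here are plain dicts mirroring the shape of metadata["Ecephys"][es_key] (the specific keys, es_key values, and names below are assumptions for illustration; check interface.get_metadata() in your version):

```python
# Hypothetical metadata dicts mirroring metadata["Ecephys"][es_key].
# When two interfaces write into the same NWB file, each ElectricalSeries
# must get a unique name or the objects collide.
metadata_lfp = {
    "Ecephys": {
        "ElectricalSeriesLFP": {  # es_key chosen at interface init
            "name": "ElectricalSeriesLFP2kHz",
            "description": "LAHC1 channel group sampled at 2 kHz",
        }
    }
}
metadata_raw = {
    "Ecephys": {
        "ElectricalSeriesRaw": {
            "name": "ElectricalSeriesRaw32kHz",
            "description": "LAHCu1 channel group sampled at 32 kHz",
        }
    }
}

names = {
    es["name"]
    for md in (metadata_lfp, metadata_raw)
    for es in md["Ecephys"].values()
}
assert len(names) == 2, "ElectricalSeries names must be unique per file"
```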

I don't know if that's a desired behavior or if the NWB standard allows different sampling rates per dataset.

NWB can easily support multiple neurodata objects with different sampling rates; these simply become different ElectricalSeries objects in this case, each with their own rate and the data corresponding to that group of channels

h-mayorquin commented 1 week ago

@CodyCBakerPhD I will take a closer look but it looks like it differentiates them. Thanks a lot for sharing the data.

KristijanArmeni commented 1 week ago

Thanks @CodyCBakerPhD for feedback!

but I would like to understand a bit more why neo doesn't detect these as different streams

I don't know Neo and their idea of streams well enough, but it indeed seems reading data only works on a single stream. At least that's what I got from this issue in the neo repo: https://github.com/NeuralEnsemble/python-neo/issues/1292

If I understand things correctly, that would also mean that the stream_name param in NeuralynxRecordingInterface (and any of the .get_streams* methods that use it) might not be useful if this information is ultimately obtained via neo_reader.parse_header() which always expects a single stream.
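That ordering can be made concrete with a small stand-in (hypothetical stubs, not the real neo/spikeinterface code): header parsing validates all streams up front, so any stream_name filtering that would happen afterwards is never reached:

```python
class FakeRawIO:
    """Stand-in for a neo RawIO whose header parsing validates ALL streams."""
    def __init__(self, dirname):
        self.dirname = dirname
        self.streams_compatible = False  # pretend the folder mixes section structures

    def parse_header(self):
        if not self.streams_compatible:
            raise ValueError("Incompatible section structures across streams")

def get_neo_io_reader(raw_io_class, **neo_kwargs):
    """Mimics spikeinterface: build the reader, then parse the header."""
    reader = raw_io_class(**neo_kwargs)
    reader.parse_header()  # raises here, before any stream selection
    return reader

def open_stream(dirname, stream_name):
    reader = get_neo_io_reader(FakeRawIO, dirname=dirname)
    # stream_name would only be applied here - but we never get this far
    return stream_name

try:
    open_stream("/data/session", stream_name="stream0")
except ValueError as err:
    print("failed before stream selection:", err)
```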

NWB can easily support multiple neurodata objects

Thanks! That's awesome and good to know.

CodyCBakerPhD commented 1 week ago

I don't know Neo and their idea of streams well enough, but it indeed seems reading data only works on a single stream. At least that's what I got from this issue in the neo repo: https://github.com/NeuralEnsemble/python-neo/issues/1292

If I understand things correctly, that would also mean that the stream_name param in NeuralynxRecordingInterface (and any of the .get_streams* methods that use it) might not be useful if this information is ultimately obtained via neo_reader.parse_header() which always expects a single stream.

That's all accurate and in line with our usage here - what I mean to say is that we would then associate a single instance of a DataInterface with each individual stream that is detected by neo (and which you can see via the classmethod NeuralynxRecordingInterface.get_stream_names)

So to include all streams of a multi-stream Neuralynx recording in your NWB conversion, just make several such interfaces, one for each stream, then bundle them in a converter object to create a single NWB file (+/- naming of things via metadata to avoid certain collisions)
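That one-interface-per-stream pattern, sketched with stand-ins (in real code the interface would be NeuralynxRecordingInterface and the bundling object a neuroconv converter; everything below is a hypothetical mock of the flow, not the actual API):

```python
# Mock of the one-interface-per-stream bundling pattern.
class MockStreamInterface:
    def __init__(self, folder_path, stream_name, es_key):
        self.folder_path = folder_path
        self.stream_name = stream_name
        self.es_key = es_key  # distinct key -> distinct ElectricalSeries name

class MockConverter:
    """Bundles several interfaces so they write into ONE NWB file."""
    def __init__(self, data_interfaces):
        keys = [i.es_key for i in data_interfaces.values()]
        if len(keys) != len(set(keys)):
            raise ValueError("es_key collision: series would overwrite each other")
        self.data_interfaces = data_interfaces

stream_names = ["stream_2kHz", "stream_32kHz"]  # e.g. from get_stream_names()
interfaces = {
    name: MockStreamInterface("/data/session", name, es_key=f"ElectricalSeries_{name}")
    for name in stream_names
}
converter = MockConverter(interfaces)
assert len(converter.data_interfaces) == 2
```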

The impression I was getting however is that neo wasn't properly detecting some of your channel groups as different streams and you needed the additional argument to handle that - is that the case or does neo identify all the streams you need?

KristijanArmeni commented 1 week ago

The impression I was getting however is that neo wasn't properly detecting some of your channel groups as different streams and you needed the additional argument to handle that - is that the case or does neo identify all the streams you need?

Yes, I think that's true. Well, from the error message (see below) it seems that neo does identify two streams, but complains if the two streams have different sampling rates. I am not sure if a stream always implies a specific sampling rate or if different streams can also have the same sampling rate (in which case this might have worked).

Say, if I try the same folder as above:

./nwb_bug
├── LAHC1.ncs #2kHz
└── LAHCu1.ncs #32kHz

Then doing this:

from neo.rawio import NeuralynxRawIO

reader = NeuralynxRawIO(dirname="./nwb_bug")  # reader.header attr is None until parsed
reader.parse_header()  # I think this has to be called to get stream info

Will give the same error message:

Cell In[9], line 1
----> 1 reader.parse_header()

File ~/miniconda3/envs/core/lib/python3.11/site-packages/neo/rawio/baserawio.py:189, in BaseRawIO.parse_header(self)
    176 """
    177 Parses the header of the file(s) to allow for faster computations
    178 for all other functions
    179 
    180 """
    181 # this must create
    182 # self.header['nb_block']
    183 # self.header['nb_segment']
   (...)
    186 # self.header['spike_channels']
    187 # self.header['event_channels']
--> 189 self._parse_header()
    190 self._check_stream_signal_channel_characteristics()
    191 self.is_header_parsed = True

File ~/miniconda3/envs/core/lib/python3.11/site-packages/neo/rawio/neuralynxrawio/neuralynxrawio.py:392, in NeuralynxRawIO._parse_header(self)
    389         ref_chan_ids = signal_channels[signal_channels["stream_id"] == ref_stream_id]["name"]
    390         chan_ids = signal_channels[signal_channels["stream_id"] == stream_id]["name"]
--> 392         raise ValueError(
    393             "Incompatible section structures across streams: "
    394             f"Stream id {ref_stream_id}:{ref_chan_ids} and "
    395             f"{stream_id}:{chan_ids}."
    396         )
    398 if ref_sec_structure is not None:
    399     self._nb_segment = len(ref_sec_structure.sects)

ValueError: Incompatible section structures across streams: Stream id 0:['LAHC1'] and 1:['LAHCu1'].

CodyCBakerPhD commented 1 week ago

Hmm that sounds like a neo bug to me then, have you raised an issue over there?

Ideally their tools would work directly on source data as it came off the acquisition system, unmodified by any subsequent operations

KristijanArmeni commented 1 week ago

I haven't raised an issue myself, but I saw this: https://github.com/NeuralEnsemble/python-neo/issues/1292

Rereading it, it seems it might not have to do with different sampling rates, but with differences in data segments across the two streams (for whatever reason). So the exclude_filename arg is there to deal with this.

CodyCBakerPhD commented 1 week ago

I'd go ahead and let them know about it since you have example data so readily available; it might be a segment issue or it might not? Either way sounds like they need reasons to adjust the header parsing to allow stream detection

We try not to use backdoors too often unless the alternative is very costly

zm711 commented 1 week ago

Hey, just as a warning: we are working on regularizing our Neuralynx parameters. See here. I've given it a review, but I'm awaiting one other developer to also read through. So you may want to update with that context in mind.