gin data for CED has non-electrical series data

h-mayorquin commented 2 years ago

Hi, I am looking at the gin data for CED and I found the following:

# This generates an error
from pathlib import Path
from spikeinterface.extractors import CedRecordingExtractor

DATA_PATH = Path("/home/heberto/ephy_testing_data/")
file_path = DATA_PATH / "spike2" / "m365_1sec.smrx"
recorder = CedRecordingExtractor(file_path=file_path, stream_id=None)

recorder.neo_reader.header["signal_channels"][-10:]

The output is:

array([('RhdD-59', '62', 30030.03003003, 'int16', 'mV', 1.95312500e-04, 0.  , '0'),
       ('RhdD-60', '63', 30030.03003003, 'int16', 'mV', 1.95312500e-04, 0.  , '0'),
       ('RhdD-61', '64', 30030.03003003, 'int16', 'mV', 1.95312500e-04, 0.  , '0'),
       ('RhdD-62', '65', 30030.03003003, 'int16', 'mV', 1.95312500e-04, 0.  , '0'),
       ('RhdD-63', '66', 30030.03003003, 'int16', 'mV', 1.95312500e-04, 0.  , '0'),
       ('CED_Mech', '67', 30030.03003003, 'int16', 'g', 2.04467773e-03, 0.  , '0'),
       ('LFP', '68', 30030.03003003, 'int16', 'V', 5.03540039e-05, 1.65, '0'),
       ('MechTTL', '70', 30030.03003003, 'int16', 'V', 5.03540039e-05, 1.65, '0'),
       ('MechStim', '71', 30030.03003003, 'int16', 'V', 5.03540039e-05, 1.65, '0'),
       ('Laser', '72', 30030.03003003, 'int16', 'V', 5.03540039e-05, 1.65, '0')],
      dtype=[('name', '<U64'), ('id', '<U64'), ('sampling_rate', '<f8'), ('dtype', '<U16'), ('units', '<U64'), ('gain', '<f8'), ('offset', '<f8'), ('stream_id', '<U64')])

So, some of the channels are not electrical series. We have events, laser and mechanical stimulation. Plus, channel 68 is LFP. It appears to me that these channels should not be written as an electrical series but I don't know how standard the channel naming is and how to deal with it. Maybe someone who knows the format more could advice.

CodyCBakerPhD commented 2 years ago

The project that used this data can be found here for reference on how to map this to NWB: https://github.com/catalystneuro/mease-lab-to-nwb/blob/main/mease_lab_to_nwb/convert_ced/cedstimulusinterface.py

Now, if you're saying that the new neo/SI version loads ALL those channels, then yeah we definitely need to restore the smrx_channel_indices argument to specify (mainly through peeling off using a SubRecording), through we ought to at least support a default of writing all the RhD channels which are clearly neurophysiology recordings of the classic sense.

Now, all of these are technically 'ElectricalSeries' since they are in voltages, but they wouldn't have linked pynwb.Electrodes (only for ecephys channels) and as such are added as pynwb.TimeSeries with 'V' units.

h-mayorquin commented 2 years ago

The project that used this data can be found here for reference on how to map this to NWB: https://github.com/catalystneuro/mease-lab-to-nwb/blob/main/mease_lab_to_nwb/convert_ced/cedstimulusinterface.py

Now, if you're saying that the new neo/SI version loads ALL those channels, then yeah we definitely need to restore the smrx_channel_indices argument to specify (mainly through peeling off using a SubRecording), through we ought to at least support a default of writing all the RhD channels which are clearly neurophysiology recordings of the classic sense.

Yes, neo version loads ALL those channels, but so did the old version without the smrx_channel_ids argument. The problem, from my perspective, is to know which channels belong to which class. If the code that is written here is general (in the sense that Cambridge Electronic Design consistently uses those names):

https://github.com/catalystneuro/mease-lab-to-nwb/blob/37a39fabce7baa50551c91258a4b028a1980d024/mease_lab_to_nwb/convert_ced/cednwbconverter.py#L55-L59

Then we could just implement a binary variable that either loads the electrical readings or the stimuli.

Now, all of these are technically 'ElectricalSeries' since they are in voltages, but they wouldn't have linked pynwb.Electrodes (only for ecephys channels) and as such are added as pynwb.TimeSeries with 'V' units.

I don't think this is correct. One of them (channel 67) has units of grams.

CodyCBakerPhD commented 2 years ago

(in the sense that Cambridge Electronic Design consistently uses those names):

I believe the user controls those channel names, so not entirely reliable. But anything with an Rhd text in the name is definitely a neural electrode (intan board/probe).

Yes, neo version loads ALL those channels, but so did the old version without the smrx_channel_ids argument.

OK so we just have to split them at the data interface level, then. My point being I don't think the design of the class method for getting channels ids and then passing them into the __init__ as smrx_channel_ids was the best way to go. Perhaps a post-__init__ function for deciding, after the extractor is initialized, how to subset the channels for neural data? And this could be done without user intervention when the neural channels are of a form we can identify (like Rhd). The Stim channels would always rely on a secondary interface regardless so our primary CED interface doesn't need to deal with them.

I don't think this is correct. One of them (channel 67) has units of grams.

Good eye - CED_mech is definitely not electrical!

catalystneuro / neuroconv

gin data for CED has non-electrical series data #8