TheChymera / neuralynx_nwb

Neuralynx to NWB conversion scripts (ideally to be upstreamed)

Use spikeinterfaces #2

Open TheChymera opened 1 year ago

TheChymera commented 1 year ago

Version-bump the package for spikeinterface, then use this object:

from spikeinterface.extractors import NeuralynxRecordingExtractor

obj = NeuralynxRecordingExtractor("dir_path")
CodyCBakerPhD commented 1 year ago

This also enables you to use NeuroConv tools to reduce the amount of code you need to maintain for this.

All the NeuroConv tools also come equipped with iterative write and automated compression (default is just "gzip" level 4), both of which are highly recommended for working with data pipelines that could ever exceed ~10 GB or so.
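The idea behind iterative write with gzip compression can be sketched with the standard library alone. This is only an illustration of the concept, not NeuroConv's API (NeuroConv handles the chunking and HDF5 details internally); the function name and chunk size here are made up:

```python
import zlib

def compress_in_chunks(data: bytes, chunk_size: int = 1 << 16, level: int = 4) -> bytes:
    """Feed data to the compressor chunk by chunk, so the whole array never has
    to sit uncompressed in memory at once (level 4 mirrors NeuroConv's default)."""
    comp = zlib.compressobj(level)
    pieces = [comp.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    pieces.append(comp.flush())
    return b"".join(pieces)
```

For multi-gigabyte recordings this streaming pattern keeps memory usage bounded by the chunk size rather than the dataset size.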

If you encounter any questions or issues with NeuroConv, please feel free to let us know.

The highest-level NeuroConv approach would be to use the NeuralynxRecordingInterface

To create an example NWB file showing the default structure and automatically extracted metadata when converting from Neuralynx, I'd just modify that showcase to:

from datetime import datetime
from dateutil import tz
from pathlib import Path
from neuroconv.datainterfaces import NeuralynxRecordingInterface

folder_path = ".../neuralynx/Cheetah_v5.7.4/original_data"
interface = NeuralynxRecordingInterface(folder_path=folder_path, verbose=False)

# Extract what metadata we can from the source files
metadata = interface.get_metadata()
# session_start_time is required for conversion. If it cannot be inferred
# automatically from the source files you must supply one.
session_start_time = datetime(2020, 1, 1, 12, 30, 0, tzinfo=tz.gettz("US/Pacific"))
metadata["NWBFile"].update(session_start_time=session_start_time)  # other key/value pairs of this dict are passed directly to pynwb.NWBFile(...)

interface.run_conversion(nwbfile_path="save/to/path.nwb", metadata=metadata, stub_test=True)  # stub_test for fast testing

The interface uses SpikeInterface under the hood, but the point is to make the user's life as easy as possible: it just takes the path as input and produces the NWB file as output.

For greater customization or control, see below.

At a lower level of direct interaction with SpikeInterface, use our SpikeInterface conversion tools

In particular, you ought to be able to just call

from neuroconv.tools.spikeinterface import write_recording
from spikeinterface.extractors import NeuralynxRecordingExtractor

recording = NeuralynxRecordingExtractor(folder_path="your/folder/path")

write_recording(recording=recording, nwbfile_path="/save/to/path.nwb")

Or if you want to pass in an existing nwbfile in-memory, calling

from datetime import datetime, timezone
from pynwb import NWBFile
from neuroconv.tools.spikeinterface import add_electrical_series
from spikeinterface.extractors import NeuralynxRecordingExtractor

recording = NeuralynxRecordingExtractor(folder_path="your/folder/path")

# any existing in-memory NWBFile works; a minimal one for illustration:
nwbfile = NWBFile(session_description="...", identifier="...",
                  session_start_time=datetime.now(timezone.utc))

add_electrical_series(recording=recording, nwbfile=nwbfile)

will modify it in-place.

CodyCBakerPhD commented 1 year ago

After looking into this a bit more on our side, I'd strongly recommend trying the DataInterface approach.

We recently refactored the metadata for that interface so that we utilize as much as possible from the source files: https://github.com/catalystneuro/neuroconv/blob/main/tests/test_on_data/test_metadata/test_neuralynx_metadata.py

TheChymera commented 1 year ago

@CodyCBakerPhD is the object which I would be passing to write_recording() the interface itself?

Also, is there a way to handle the streams automatically?

[deco]~/src/neuralynx_nwb ❱ python -c "from neuralynx_nwb import newconvert; newconvert.reposit_data()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/chymera/src/neuralynx_nwb/neuralynx_nwb/newconvert.py", line 23, in reposit_data
    session_dir = os.path.join(data_dir, data_selection)
  File "/usr/lib/python3.10/site-packages/neuroconv/datainterfaces/ecephys/neuralynx/neuralynxdatainterface.py", line 17, in __init__
    super().__init__(folder_path=folder_path, verbose=verbose, all_annotations=True)
  File "/usr/lib/python3.10/site-packages/neuroconv/datainterfaces/ecephys/baserecordingextractorinterface.py", line 28, in __init__
    self.recording_extractor = self.Extractor(**source_data)
  File "/usr/lib/python3.10/site-packages/spikeinterface/extractors/neoextractors/neuralynx.py", line 29, in __init__
    NeoBaseRecordingExtractor.__init__(self, stream_id=stream_id,
  File "/usr/lib/python3.10/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py", line 59, in __init__
    raise ValueError(f"This reader have several streams: \nNames: {stream_names}\nIDs: {stream_ids}. "
ValueError: This reader have several streams:
Names: ['Stream (rate,#packet,t0): (32000.0, 253845, 1626434026469696)', 'Stream (rate,#packet,t0): (32000.0, 253845, 1626434026469680)', 'Stream (rate,#packet,t0): (2000.0, 15866, 1626434026470148)']
IDs: ['0', '1', '2']. Specify it with the 'stram_name' or 'stream_id' arguments
[deco]~/src/neuralynx_nwb ❱ cat neuralynx_nwb/newconvert.py
import os
from neuroconv.datainterfaces import NeuralynxRecordingInterface
from neuroconv.tools.spikeinterface import write_recording

def reposit_data(
    data_dir='~/.local/share/datalad/',
    data_selection='vStr_phase_stim/M235/M235-2021-07-16/',
    lab_name='MVDMLab',
    institution='Dartmouth College',
    keywords=[
        'DANDI Pilot',
        ],
    experimenter='Manish Mohapatra',
    experiment_description='...',
    debug=True,
    session_description='Extracellular ephys recording in the ventral Striatum',
    keep_original_times=True,
    output_filename='neuralynx_nwb_testfile',
    ):

    data_dir = os.path.abspath(os.path.expanduser(data_dir))
    session_dir = os.path.join(data_dir, data_selection)
    interface = NeuralynxRecordingInterface(folder_path=session_dir, verbose=False)
CodyCBakerPhD commented 1 year ago

@TheChymera Sorry for the delayed response, I've been away at a conference.

object which I would be passing to write_recording() the interface itself?

I updated my code above to make the usage more obvious

Also, is there a way to handle the streams automatically?

Drat, it doesn't look like we expose that at the interface level yet. Past testing data may have only had a single stream.

The NeuralynxRecordingExtractor, however, does; I believe the argument is named stream_name (one of those long names it lists in the message) or stream_id (one of the 3 integers).
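A hypothetical helper (not part of spikeinterface) could pick the stream_id by sampling rate from the names/IDs that the multi-stream error message prints, which look like 'Stream (rate,#packet,t0): (32000.0, 253845, 1626434026469696)':

```python
def pick_stream_id(stream_names, stream_ids, rate):
    """Return the stream_id whose name reports the given sampling rate,
    matching on the '(rate,' prefix of the reported tuple."""
    for name, stream_id in zip(stream_names, stream_ids):
        if f"({rate}," in name:
            return stream_id
    raise ValueError(f"no stream with rate {rate}")
```

The chosen ID could then be passed along as NeuralynxRecordingExtractor(folder_path="your/folder/path", stream_id=...).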

I do wish those streams had better names; do you know what each stream represents? Especially that one with 2kHz rate?

P.S.: I don't have access to DANDI slack ATM but it notified me that you messaged a question - as I recall, "icephys" = "intracellular electrophysiology" (patch clamp, etc) and "ecephys" = "extracellular electrophysiology" (electrodes from probes/arrays outside the cell)

TheChymera commented 1 year ago

@CodyCBakerPhD one of the streams is a wheel on which the mouse moves. The other two are “raw” and thresholded data. The 2kHz one is the wheel one. @manimoh just to double check, my description here is correct, yes?

CodyCBakerPhD commented 1 year ago

That would make sense then

I'm not sure how to disambiguate the raw vs. thresholded streams since they have the same type of signature, but maybe you can find some property of the data that distinguishes them.
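One heuristic (an assumption on my part, not a NeuroConv feature): thresholded traces are mostly zeros between threshold crossings, while raw data is almost never exactly zero, so the fraction of zero samples may separate the two streams:

```python
def zero_fraction(trace):
    """Fraction of samples that are exactly zero; a high value suggests the
    thresholded stream (zeros between crossings) rather than the raw one."""
    trace = list(trace)
    return sum(1 for sample in trace if sample == 0) / len(trace)
```

In practice you would compare zero_fraction over the same time window of both candidate streams and label the higher one as thresholded.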

The thresholded data would be included in the 'processed' submodule by passing the extra key/value write_as="processed" to write_recording.

The wheel series will need to be written manually using a pynwb.TimeSeries. Do you have a way to map the aux channel voltage signal (assuming that's how it's stored) into turns/radians/other physical units?
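If the wheel voltage is a linear encoder over one turn, the mapping to radians could be as simple as the sketch below (the v_min/v_max calibration values are hypothetical; the real range depends on the hardware):

```python
import math

def volts_to_radians(volts, v_min=0.0, v_max=5.0):
    """Map a wheel-encoder voltage to an angle in radians, assuming the voltage
    sweeps linearly from v_min to v_max over one full turn (hypothetical calibration)."""
    fraction = (volts - v_min) / (v_max - v_min)
    return fraction * 2 * math.pi
```

The converted values could then be stored in a pynwb.TimeSeries with unit="radians" and added to the NWB file alongside the ephys data.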

TheChymera commented 1 year ago

@CodyCBakerPhD well, I'll have to think about stream auto-detection, but for now, should I try to use neuroconv or spikeinterface? I'm still not sure how to write the object using from neuroconv.datainterfaces import NeuralynxRecordingInterface. In your example above, it seems like the reader interface is only used for the metadata. Or am I missing something?

CodyCBakerPhD commented 1 year ago

should I try to use neuroconv or spikeinterface?

As I mentioned, the NeuralynxRecordingInterface doesn't expose stream_id or stream_name yet; I can make a fix, but it would take a little bit.

Interfaces do the same thing as described in the section entitled 'At a lower level of direct interaction with SpikeInterface...' from the above comment, plus a bit of extra metadata stuff.

But for an initial draft conversion, it's fine to just try the tools, see how it looks in the file, and adjust from there.