TheChymera opened 1 year ago
This also enables you to use NeuroConv tools to reduce the amount of code you need to maintain for this.
All the NeuroConv tools also come equipped with iterative write and automated compression (the default is just "gzip" level 4), both of which are highly recommended for any data pipeline that could exceed ~10 GB.
If you encounter any questions or issues with NeuroConv, please feel free to let us know.
To create an example NWB file showing the default structure and automatically extracted metadata when converting from Neuralynx, I'd just modify that showcase to:
```python
from datetime import datetime
from dateutil import tz

from neuroconv.datainterfaces import NeuralynxRecordingInterface

folder_path = ".../neuralynx/Cheetah_v5.7.4/original_data"
interface = NeuralynxRecordingInterface(folder_path=folder_path, verbose=False)

# Extract what metadata we can from the source files
metadata = interface.get_metadata()

# session_start_time is required for conversion. If it cannot be inferred
# automatically from the source files, you must supply one.
session_start_time = datetime(2020, 1, 1, 12, 30, 0, tzinfo=tz.gettz("US/Pacific"))
# Other key/value pairs of this dict are passed directly to pynwb.NWBFile(...)
metadata["NWBFile"].update(session_start_time=session_start_time)

interface.run_conversion(nwbfile_path="save/to/path.nwb", metadata=metadata, stub_test=True)  # stub_test for fast testing
```
The interface uses SpikeInterface under the hood, but the point is to make the user's life as easy as possible: a path goes in, an NWB file comes out. For greater customization or control, see below.

In particular, at a lower level of direct interaction with SpikeInterface, you ought to be able to just call
```python
from neuroconv.tools.spikeinterface import write_recording
from spikeinterface.extractors import NeuralynxRecordingExtractor

recording = NeuralynxRecordingExtractor(folder_path="your/folder/path")
write_recording(recording=recording, nwbfile_path="/save/to/path.nwb")
```
Or, if you want to pass in an existing in-memory nwbfile, calling
```python
from neuroconv.tools.spikeinterface import add_electrical_series
from spikeinterface.extractors import NeuralynxRecordingExtractor

recording = NeuralynxRecordingExtractor(folder_path="your/folder/path")
add_electrical_series(recording=recording, nwbfile=nwbfile)  # nwbfile: an existing in-memory NWBFile
```

will modify it in-place.
After looking into this a bit more on our side, I'd strongly recommend trying the DataInterface approach.
We recently refactored the metadata for that interface so that we utilize as much as possible from the source files: https://github.com/catalystneuro/neuroconv/blob/main/tests/test_on_data/test_metadata/test_neuralynx_metadata.py
@CodyCBakerPhD is the object which I would be passing to write_recording() the interface itself?
Also, is there a way to handle the streams automatically?
```
[deco]~/src/neuralynx_nwb ❱ python -c "from neuralynx_nwb import newconvert; newconvert.reposit_data()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/chymera/src/neuralynx_nwb/neuralynx_nwb/newconvert.py", line 23, in reposit_data
    session_dir = os.path.join(data_dir, data_selection)
  File "/usr/lib/python3.10/site-packages/neuroconv/datainterfaces/ecephys/neuralynx/neuralynxdatainterface.py", line 17, in __init__
    super().__init__(folder_path=folder_path, verbose=verbose, all_annotations=True)
  File "/usr/lib/python3.10/site-packages/neuroconv/datainterfaces/ecephys/baserecordingextractorinterface.py", line 28, in __init__
    self.recording_extractor = self.Extractor(**source_data)
  File "/usr/lib/python3.10/site-packages/spikeinterface/extractors/neoextractors/neuralynx.py", line 29, in __init__
    NeoBaseRecordingExtractor.__init__(self, stream_id=stream_id,
  File "/usr/lib/python3.10/site-packages/spikeinterface/extractors/neoextractors/neobaseextractor.py", line 59, in __init__
    raise ValueError(f"This reader have several streams: \nNames: {stream_names}\nIDs: {stream_ids}. "
ValueError: This reader have several streams:
Names: ['Stream (rate,#packet,t0): (32000.0, 253845, 1626434026469696)', 'Stream (rate,#packet,t0): (32000.0, 253845, 1626434026469680)', 'Stream (rate,#packet,t0): (2000.0, 15866, 1626434026470148)']
IDs: ['0', '1', '2']. Specify it with the 'stram_name' or 'stream_id' arguments
```
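The error message itself suggests the fix: pass stream_name or stream_id to the extractor. As a sketch, a small helper (hypothetical, not part of any library) could pick the stream whose advertised rate matches the one you want, based on the names spikeinterface printed:

```python
# Hypothetical helper: select a stream_id by the sampling rate embedded
# in the stream names that spikeinterface printed in the error above.
def pick_stream_id(stream_names, stream_ids, rate):
    for name, stream_id in zip(stream_names, stream_ids):
        if f"({rate}," in name:
            return stream_id
    raise ValueError(f"no stream advertising rate {rate}")

names = [
    "Stream (rate,#packet,t0): (32000.0, 253845, 1626434026469696)",
    "Stream (rate,#packet,t0): (32000.0, 253845, 1626434026469680)",
    "Stream (rate,#packet,t0): (2000.0, 15866, 1626434026470148)",
]
ids = ["0", "1", "2"]

wheel_stream = pick_stream_id(names, ids, 2000.0)  # selects "2"

# Then construct the extractor against that stream (path is a placeholder):
# recording = NeuralynxRecordingExtractor(folder_path="your/folder/path",
#                                         stream_id=wheel_stream)
```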
```
[deco]~/src/neuralynx_nwb ❱ cat neuralynx_nwb/newconvert.py
```

```python
import os

from neuroconv.datainterfaces import NeuralynxRecordingInterface
from neuroconv.tools.spikeinterface import write_recording


def reposit_data(
    data_dir='~/.local/share/datalad/',
    data_selection='vStr_phase_stim/M235/M235-2021-07-16/',
    lab_name='MVDMLab',
    institution='Dartmouth College',
    keywords=[
        'DANDI Pilot',
    ],
    experimenter='Manish Mohapatra',
    experiment_description='...',
    debug=True,
    session_description='Extracellular ephys recording in the ventral Striatum',
    keep_original_times=True,
    output_filename='neuralynx_nwb_testfile',
):
    data_dir = os.path.abspath(os.path.expanduser(data_dir))
    session_dir = os.path.join(data_dir, data_selection)
    interface = NeuralynxRecordingInterface(folder_path=session_dir, verbose=False)
```
@TheChymera Sorry for the delayed response, I've been away at a conference.
> is the object which I would be passing to write_recording() the interface itself?

I updated my code above to make the usage more obvious.
> Also, is there a way to handle the streams automatically?

Drat, doesn't look like we expose that at the interface level yet; past testing data may have only had a single stream. The NeuralynxRecordingExtractor, however, does. I believe the argument is named stream_name (one of those long names it lists in the message) or stream_id (one of the 3 integers).

I do wish those streams had better names; do you know what each stream represents? Especially the one with the 2 kHz rate?
P.S.: I don't have access to the DANDI Slack at the moment, but it notified me that you messaged a question. As I recall, "icephys" = "intracellular electrophysiology" (patch clamp, etc.) and "ecephys" = "extracellular electrophysiology" (electrodes from probes/arrays outside the cell).
@CodyCBakerPhD one of the streams is a wheel on which the mouse moves; the other two are “raw” and thresholded data. The 2 kHz one is the wheel. @manimoh just to double-check, my description here is correct, yes?
That would make sense then
I'm not sure how to disambiguate the raw vs. thresholded streams since they have the same type of signature, but maybe you can find some property of the data that distinguishes them.
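One heuristic (an assumption on my part, not an established rule): a thresholded stream is typically zero between detected events, so its fraction of nonzero samples should be much lower than in the raw stream. A self-contained sketch with synthetic stand-in data:

```python
import numpy as np

def fraction_nonzero(traces):
    """Fraction of samples that are nonzero; expected to be much lower
    for a thresholded stream than for a raw continuous one."""
    traces = np.asarray(traces)
    return np.count_nonzero(traces) / traces.size

# Synthetic stand-ins: a dense continuous signal vs. a thresholded copy.
raw_like = np.sin(np.linspace(0.0, 10.0, 1000))
thresholded_like = np.where(np.abs(raw_like) > 0.99, raw_like, 0.0)

print(fraction_nonzero(raw_like))          # close to 1.0
print(fraction_nonzero(thresholded_like))  # much smaller
```

Applied to the real data, you would compute this on a chunk of traces from each stream and pick whichever is sparser as the thresholded one.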
The thresholded data would be included in the 'processed' submodule using the extra key/value write_as="processed" in write_recording. The wheel series will need to be written manually using a pynwb.TimeSeries. Do you have a way to map the aux channel Volt signal (assuming that's how it's stored) into turns/radians/other physical units?
@CodyCBakerPhD well, I'll have to think about stream auto-detection, but for now, should I try to use neuroconv or spikeinterface? I'm still not sure how to write the object I get from `from neuroconv.datainterfaces import NeuralynxRecordingInterface`. In your example above it seems like the reader interface is only used for the metadata. Or am I missing something?
> should I try to use neuroconv or spikeinterface?
As I mentioned, the NeuralynxRecordingInterface doesn't expose stream_id or stream_name yet; I can make a fix, but it would take a little bit.
Interfaces do the same thing as described in the section entitled 'At a lower level of direct interaction with SpikeInterface...' from the above comment, plus a bit of extra metadata handling. But for an initial draft conversion, it's fine to just try the tools, see how it looks in the file, and adjust from there.
Version bump the package for spikeinterface, then use this object: