catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Feature]: Output to BIDS #750

Open TheChymera opened 7 months ago

TheChymera commented 7 months ago

What would you like to see added to NeuroConv?

Would you be interested in having neuroconv output BIDS-formatted NWB files? NWB is already supported as a format by BIDS.

Is your feature request related to a problem?

No response

Do you have any interest in helping implement the feature?

Yes.


CodyCBakerPhD commented 7 months ago

How do you mean? Like a 'dandi organize' helper function to name files in folders in a particular way?

Are there existing tools/libraries for that, so we don't have to be the ones to maintain it (again, like dandi organize)?

TheChymera commented 7 months ago

Yes, dandi organize was the original idea, but now that I see the first draft in action, it seems like a strange workflow.

Basically DANDI is just a user of NWB and BIDS, not a provider of NWB/BIDS tools (nwbinspector and bidsschematools are also not part of DANDI), so why then would BIDS formatting of NWB files be contingent on DANDI?

By analogy to e.g. MRI workflows, converters to NIfTI usually offer the capability either to create BIDS-layout NIfTI, or just to output NIfTI files with whatever names. In the case of NWB, the top-level conversion package is neuroconv, so it would make a lot of sense to offer the option of BIDS-formattable output in addition to just a user-specified string for the file path.
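For illustration, a BIDS-formattable output path for an NWB file might be constructed something like this (a minimal sketch; the `bids_nwb_path` helper, the entity order, and the `ephys` datatype directory are assumptions loosely based on BEP032, not a confirmed layout):

```python
from pathlib import Path


def bids_nwb_path(root: str, subject_id: str, session_id: str, datatype: str = "ephys") -> Path:
    """Build a hypothetical BEP032-style BIDS path for an NWB file.

    Assumption: entity order and the datatype directory name follow
    'sub-<id>/ses-<id>/<datatype>/'; the authoritative layout is whatever
    the BIDS specification / BEP032 defines.
    """
    filename = f"sub-{subject_id}_ses-{session_id}_{datatype}.nwb"
    return Path(root) / f"sub-{subject_id}" / f"ses-{session_id}" / datatype / filename


print(bids_nwb_path("bids_root", "M388", "20231120"))
# → bids_root/sub-M388/ses-20231120/ephys/sub-M388_ses-20231120_ephys.nwb
```

The point being that the converter only needs a small amount of path logic up front, rather than a post-hoc reorganization step.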

Since it's still early enough, I could move the feature PR from DANDI to here. The upside is that neuroconv could then advertise that it can comply with BIDS.

CodyCBakerPhD commented 7 months ago

I agree that separation of concerns is a good idea - but NeuroConv specifically may not be the best location for all that logic.

Input to NeuroConv is source data in proprietary formats (files/folders in the structure of those formats), and the output is, by default, NWB files at whatever paths and in whatever structure the user specifies. The basic recommendation (and what our tests cover) is to output a flat collection of uniquely named NWB files, to make it easy for dandi organize to be run on the collection.

I'd be happy to add convenience wrappers/options/instructions here to make it easy for someone to take their flat list of NWB files created through NeuroConv and organize them into DANDI or BIDS (likely as an extra option to the automatic_dandi_upload function, which is a light wrapper around the DANDI API), but I wouldn't want the thorough documentation, testing, etc. of that specific functionality to live here.

I would advocate for the creation of a specific standalone package for file-organization strategies that can be used as a modular step in the workflow, after creation of the NWB files and before DANDI upload. Its own dependency stack would also benefit tremendously from that isolation from NeuroConv (e.g., what about people making their own files without NeuroConv?).

TheChymera commented 7 months ago

the creation of a specific standalone package

There is already a package which specifically deals with the sort of data we're looking to reformat: https://github.com/INT-NIT/BEP032tools/ . We could indeed extend that. Though BIDS supports NWB even outside of BEP032, so in the end it might be best to just have another package. I'm not sure what long-term maintenance would look like for that.

I'd be happy to add convenience wrappers/options/instructions here to make it easy for someone to take their flat list of NWB files created through NeuroConv and organize them into DANDI or BIDS

This would be a good solution if you really don't want BIDS-reformatting functionality mixed in your code.

CodyCBakerPhD commented 7 months ago

This would be a good solution if you really don't want BIDS-reformatting functionality mixed in your code.

To be clear, I'd be happy to include the functionality; I just don't want to be the one responsible for maintaining the code and keeping it up to date with BIDS standard (unless this gets negotiated in the upcoming DANDI renewal scope, so we'll see). We have enough to maintain here as it is lol

Would appreciate a minimal example of how to call/use whatever function you're planning to extend whenever it's available

TheChymera commented 7 months ago

@CodyCBakerPhD ok, I have a draft package for this → https://github.com/con/nwb2bids

I'd like to set up a test for it, and I'd like to have a “blank” NWB file (with all the metadata, but length-0 data arrays) to test this on... but I can't seem to generate one from my data:

[deco]/tmp ❱ cat lala.py
import numpy as np
from pynwb import NWBHDF5IO

in_file = '/home/chymera/.local/share/mvmnda/nwbdata/ses-20231120114021_sub-M388_2024-02-28T04:33:14.442236.nwb'
out_file = '/tmp/mimio/lala.nwb'

with NWBHDF5IO(in_file, 'r') as io:
    nwbfile = io.read()

for acq_name, acq_data in nwbfile.acquisition.items():
    if 'data' in acq_data.fields:
        empty_acq_data = nwbfile.acquisition[acq_name].copy()
        empty_acq_data.set_data(np.empty(shape=(0,), dtype=acq_data.data.dtype))
        nwbfile.acquisition[acq_name] = empty_acq_data

with NWBHDF5IO(out_file, 'w') as io:
    io.write(nwbfile)
[deco]/tmp ❱ python lala.py
Traceback (most recent call last):
  File "/tmp/lala.py", line 12, in <module>
    empty_acq_data = nwbfile.acquisition[acq_name].copy()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ElectricalSeries' object has no attribute 'copy'

Do you maybe have some empty example data files somewhere?

CodyCBakerPhD commented 7 months ago

I'd like to set up a test for it, and I'd like to have a “blank” NWB file (with all the metadata, but length-0 data arrays) to test this on... but I can't seem to generate one from my data:

The specific error there comes from the ElectricalSeries object not having a copy method - maybe you meant to run it on the .data of the object instead?

Otherwise I got the following to work

import pynwb
import numpy
from pynwb.testing.mock.file import mock_NWBFile

nwbfile = mock_NWBFile()
time_series = pynwb.TimeSeries(name="Test", data=numpy.array([], dtype="uint8"), unit="n.a.", rate=1.0)
nwbfile.add_acquisition(time_series)

with pynwb.NWBHDF5IO(path="./test_empty.nwb", mode="w") as io:
    io.write(nwbfile)

TheChymera commented 7 months ago

@CodyCBakerPhD thanks for the example code. I guess in a pinch I could go with that, but since the nwb2bids conversion focuses on handling metadata, I'd like to take one of the files I have, and just replace the data arrays from it with empty arrays. Is there any way to take a file and remove the “data” from it?

CodyCBakerPhD commented 7 months ago

Is there any way to take a file and remove the “data” from it?

I would not recommend this approach

It would also be more explicit/readable to specify the metadata on the test files rather than inheriting from an implicit source

CodyCBakerPhD commented 7 months ago

@TheChymera

Here is a helper script to make some test ecephys NWB files using simulated data in a semi-realistic manner (SpikeGLX inspired):

If the synthetic recording/spiking is overkill for your purposes we can also just use the pynwb.testing.mock library to generate dummy file structures

import datetime
import tempfile

import neuroconv
import spikeinterface
from neuroconv.datainterfaces import PhySortingInterface
from neuroconv.tools.spikeinterface import add_recording
from spikeinterface.exporters import export_to_phy

# Define NeuroPixel-like values for sampling rates and conversion factors
duration_in_s = 0.5  # Shortened from ~3 for demo purposes. Can't be too short or the extraction step errors
number_of_units = 10
number_of_channels = 385  # Have to include the 'sync' channel to be proper SpikeGLX. TODO: add artificial sync pulses
ap_conversion_factor_to_uV = 2.34375
ap_sampling_frequency = 30_000.0
lf_sampling_frequency = 2_500.0
downsample_factor = int(ap_sampling_frequency / lf_sampling_frequency)

# Generate synthetic spiking and voltage traces with waveforms around them
artificial_ap_band, spiking = spikeinterface.generate_ground_truth_recording(
    durations=[duration_in_s],
    sampling_frequency=ap_sampling_frequency,
    num_channels=number_of_channels,
    dtype="float32",
    num_units=number_of_units,
    seed=0,  # Fixed seed for reproducibility
)
artificial_ap_band.set_channel_gains(gains=ap_conversion_factor_to_uV)
waveform_extractor = spikeinterface.extract_waveforms(recording=artificial_ap_band, sorting=spiking, mode="memory")
int16_artificial_ap_band = artificial_ap_band.astype(dtype="int16")

# Kind of messy that this intermediate step is needed, but I can't figure out a direct export to NWB
temporary_folder = tempfile.TemporaryDirectory()
export_to_phy(
    waveform_extractor=waveform_extractor, output_folder=temporary_folder.name, remove_if_exists=True, copy_binary=False
)
phy_interface = PhySortingInterface(folder_path=temporary_folder.name)

# Write to NWB
# This is the part where the BIDS or other organization comes into play?
# Or start from existing files and make symlinks to them?
metadata = dict(
    NWBFile=dict(session_id="testabc", session_start_time=datetime.datetime(1970, 1, 1).isoformat()),
    Subject=dict(subject_id="test123", species="Mus musculus", age="P30D", sex="O"),
)
nwbfile_path = "C:/Users/Raven/Downloads/test_data.nwb"
with neuroconv.tools.nwb_helpers.make_or_load_nwbfile(
    nwbfile_path=nwbfile_path, metadata=metadata, overwrite=True
) as nwbfile:
    add_recording(recording=int16_artificial_ap_band, nwbfile=nwbfile)
    phy_interface.add_to_nwbfile(nwbfile=nwbfile, metadata=phy_interface.get_metadata())
    # add_sorting(sorting=spiking, nwbfile=nwbfile)