NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io

[Bug]: Suite2p NWB files #1667

Open jeromelecoq opened 1 year ago

jeromelecoq commented 1 year ago

What happened?

Not sure if it belongs here, but suite2p seems to generate non-compliant NWB files.

Here is a copy of my validation errors. Some are due to me not adding the info, but the GUI does not allow this directly.

0 CRITICAL

0.0 ./: check_unique_identifiers - 'NWBFile' object at location '/' Message: The identifier '../data' is used across the .nwb files: ['Rorb-IRES2-Cre_590168381_590168385.nwb', 'Rorb-IRES2-Cre_590168381_590565013.nwb']. The identifier of any NWBFile should be a completely unique value - we recommend using uuid4 to achieve this.

0.1 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_image_series_external_file_valid - 'TwoPhotonSeries' object at location '/acquisition/TwoPhotonSeries' Message: The external file '../data/merged_movie.h5' does not exist. Please confirm the relative location to the NWBFile.

0.2 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_subject_exists - 'NWBFile' object at location '/' Message: Subject is missing.

0.3 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_roi_response_series_dims - 'RoiResponseSeries' object at location '/processing/ophys/Neuropil/Neuropil' Message: The second dimension of data does not match the length of rois, but instead the first does. Data is oriented incorrectly and should be transposed.

0.4 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_roi_response_series_dims - 'RoiResponseSeries' object at location '/processing/ophys/Fluorescence/Fluorescence' Message: The second dimension of data does not match the length of rois, but instead the first does. Data is oriented incorrectly and should be transposed.

0.5 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_roi_response_series_dims - 'RoiResponseSeries' object at location '/processing/ophys/Deconvolved/Deconvolved' Message: The second dimension of data does not match the length of rois, but instead the first does. Data is oriented incorrectly and should be transposed.

1 BEST_PRACTICE_VIOLATION

1.6 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_data_orientation - 'RoiResponseSeries' object at location '/processing/ophys/Neuropil/Neuropil' Message: Data may be in the wrong orientation. Time should be in the first dimension, and is usually the longest dimension. Here, another dimension is longer.

1.7 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_data_orientation - 'RoiResponseSeries' object at location '/processing/ophys/Fluorescence/Fluorescence' Message: Data may be in the wrong orientation. Time should be in the first dimension, and is usually the longest dimension. Here, another dimension is longer.

1.8 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_data_orientation - 'RoiResponseSeries' object at location '/processing/ophys/Deconvolved/Deconvolved' Message: Data may be in the wrong orientation. Time should be in the first dimension, and is usually the longest dimension. Here, another dimension is longer.

2 BEST_PRACTICE_SUGGESTION

2.9 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'RoiResponseSeries' object at location '/processing/ophys/Neuropil/Neuropil' Message: Description ('no description') is a placeholder.

2.10 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'RoiResponseSeries' object at location '/processing/ophys/Fluorescence/Fluorescence' Message: Description ('no description') is a placeholder.

2.11 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'RoiResponseSeries' object at location '/processing/ophys/Deconvolved/Deconvolved' Message: Description ('no description') is a placeholder.

2.12 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'Images' object with name 'Backgrounds_0' Message: Description ('no description') is a placeholder.

2.13 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'GrayscaleImage' object with name 'meanImg' Message: Description is missing.

2.14 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'GrayscaleImage' object with name 'max_proj' Message: Description is missing.

2.15 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'GrayscaleImage' object with name 'Vcorr' Message: Description is missing.

2.16 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_description - 'TwoPhotonSeries' object at location '/acquisition/TwoPhotonSeries' Message: Description ('no description') is a placeholder.

2.17 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_small_dataset_compression - 'RoiResponseSeries' object at location '/processing/ophys/Neuropil/Neuropil' Message: data is not compressed. Consider enabling compression when writing a dataset.

2.18 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_small_dataset_compression - 'RoiResponseSeries' object at location '/processing/ophys/Fluorescence/Fluorescence' Message: data is not compressed. Consider enabling compression when writing a dataset.

2.19 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_small_dataset_compression - 'RoiResponseSeries' object at location '/processing/ophys/Deconvolved/Deconvolved' Message: data is not compressed. Consider enabling compression when writing a dataset.

2.20 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_experimenter_exists - 'NWBFile' object at location '/' Message: Experimenter is missing.

2.21 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_experiment_description - 'NWBFile' object at location '/' Message: Experiment description is missing.

2.22 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_institution - 'NWBFile' object at location '/' Message: Metadata /general/institution is missing.

2.23 Rorb-IRES2-Cre_590168381_590168385.nwb and 1 other file: check_keywords - 'NWBFile' object at location '/' Message: Metadata /general/keywords is missing.
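
The unique-identifier error (0.0) is the easiest to avoid up front. A minimal sketch of the recommended fix, generating a fresh uuid4 per file (the NWBFile constructor call is left as a comment since the other arguments depend on the session):

```python
# Sketch of the recommended fix for check_unique_identifiers:
# give every NWBFile its own uuid4-based identifier.
from uuid import uuid4

identifier = str(uuid4())  # unique per call
# nwbfile = NWBFile(session_description=..., identifier=identifier, ...)
print(len(identifier))  # uuid4 strings are 36 characters
```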

Steps to Reproduce

Load a movie into suite2p, run segmentation and save the NWB file

Traceback

No response

Operating System

Linux

Python Executable

Python

Python Version

3.7

Package Versions

From my Dockerfile:

RUN pip install -U --no-cache-dir pynwb==2.2.0 suite2p==0.11.1


bendichter commented 1 year ago

@jeromelecoq thanks for this. Best practices have come a long way since we built the NWB export years ago. Our latest recommended way to write files from suite2p is to write the default format and then use the NeuroConv Suite2pSegmentationInterface. This method has the advantage of allowing you to easily combine suite2p data with other data, such as the raw TIFF stack and/or trial events.

jeromelecoq commented 1 year ago

Can you clarify a little more? Suite2p does output NWB files. What should be given to NeuroConv? The NWB from suite2p or the binary files from suite2p?

bendichter commented 1 year ago

the binary files from suite2p

oruebel commented 1 year ago

Best practices have come a long way since we built the NWB export years ago. Our latest recommended way to write files from suite2p is to write the default format and then use NeuroConv

@bendichter does this mean we should engage with suite2p to update the write there?

rly commented 1 year ago

I encountered this last week when comparing suite2p and caiman support for NWB. Does it make sense for suite2p (and similar processing/analysis tools) to accept a non-NWB file and create an NWB file? suite2p (and similar tools) does not handle information about the full experiment. For example, it does not have information about the subject, device, experimenter, session ID, etc., so when it makes an NWB file, that file does not conform to best practices in several ways and is not really publishable or usable on its own. suite2p could put placeholders for those, but NWB really does not make it easy to change those placeholders once set. suite2p could ask the user for those, but that's a burden on suite2p to develop and maintain, just to support NWB output.

If you already have an NWB file with your raw data and experimental metadata, suite2p does not append to it, but creates its own NWB file, and merging NWB files is unfortunately not straightforward.

It seems to me that using neuroconv to read a processing tool's native output and create an NWB file that merges that output with experimental metadata required by neuroconv is a better strategy. OR the tool should simply append data to an input NWB file that should have all the experimental metadata present already. Tools should not create NWB files with only their output; that leads to the confusion above where the output is technically a valid NWB file but is an incomplete and not very useful NWB file.

jeromelecoq commented 1 year ago

From a user perspective, I think it is reasonable to ask that the ecosystem of tools work well together. If suite2p generates an NWB file, it should be possible to push the data to DANDI eventually. In this particular case, I ran 40-ish processing jobs and saved the NWB files from suite2p in the hope of eventually pushing them to DANDI. Having to go back to the raw data feels like the tools are not working well together.

I feel like suite2p could create a bare-bones NWB file with fields that are to be filled in later, before submission, if that is at all possible. Currently it mostly fails validation because of transpose issues. I would not mind merging NWB files from different machines, for example.

bendichter commented 1 year ago

@jeromelecoq

If suite2p generates an NWB file, it should be possible to push the data to DANDI eventually.

Agreed. This integration was created long before the DANDI requirements were in place, so we'll need to fix them. And we definitely need to fix any transpose issues.

@rly you make some good points.

It's certainly easier on our side to maintain conversion tools in NeuroConv, so we can adjust them as requirements are made more stringent by DANDI and test them all for compliance with CI. All of the different format conversions share a lot of functions, so that is easier than maintaining exports across many libraries. We will continue to do that, but there remains the question of whether suite2p should provide an option to export to NWB.

It makes sense for suite2p to offer the functionality of appending to an existing NWB file, and it might be easier to build a DANDI-compliant file that way. I think it also makes sense to have suite2p output just the processed data, though I agree it would take substantial effort to get this to the point where it would be able to support non-filler metadata for all the required fields. Maybe that's more work than it's worth when you multiply it out by all the analysis tools in the ecosystem. NeuroConv might be able to help with this. We could have it as an optional requirement and dynamically import some of our internal functions when an NWB export is requested, but that's a bit outside the original scope of NeuroConv.

oruebel commented 1 year ago

It makes sense for suite2p to offer the functionality of appending to an existing NWB file, and it might be easier to build a DANDI-compliant file that way. I think it also makes sense to have suite2p output just the processed data, though I agree it would take substantial effort to get this to the point where it would be able to support non-filler metadata for all the required fields.

An alternative option is, instead of appending to the source file, to export the source file to a new file such that the new file would only contain links to the original file (rather than copying data) and then append to that file. In this way the original file would not be modified, while from a user perspective the new file would look like all the data is in one file. Similarly, one could also use three files: 1) the input file with the raw data, 2) a file with just the processing results, and 3) a third file that consists of just external links to tie the data from the two data files together.

This approach with external HDF5 links is already possible with current features; however, it may be worth discussing whether a simpler sub-filing approach is needed to make the integration of processing tools easier. E.g., I could imagine that one could place subfiles (which would just add new data, e.g., new TimeSeries) in a subfolder along with a main NWB file and then merge the files dynamically on read, or have some other way to break up and merge NWB files.

jeromelecoq commented 1 year ago

The following code fixes MOST of the critical errors.

import numpy as np
from pynwb import NWBHDF5IO
from pynwb.ophys import RoiResponseSeries
import h5py
from pynwb.file import Subject

# Open the NWB file for reading
with NWBHDF5IO('../old/Rorb-IRES2-Cre_590168381_590565013.nwb', 'r') as read_io:
    nwbfile = read_io.read()

    # Add the missing Subject metadata
    nwbfile.subject = Subject(
        subject_id="001",
        age="P100D/P130D",
        description="Rorb-IRES2-Cre",
        species="Mus musculus",
        sex="M",
    )

    # Drop the TwoPhotonSeries whose external file link is broken
    nwbfile.acquisition.pop(nwbfile.acquisition['TwoPhotonSeries'].name)

    nwb_object = {}
    roi_series = {}
    for key in ['Neuropil', 'Deconvolved']:  # 'Fluorescence' is excluded; see below
        nwb_object[key] = nwbfile.processing['ophys'][key]
        roi_series[key] = nwbfile.processing['ophys'][key][key]

        # Get the data array from the RoiResponseSeries object
        local_data = nwbfile.processing['ophys'][key][key].data[:]

        # Transpose the array so time is the first dimension
        transposed_local_data = np.transpose(local_data, (1, 0))

        # Detach the old series so a corrected copy can be added back
        nwb_object[key].roi_response_series.pop(roi_series[key].name)
        roi_series[key].reset_parent()
        for child in roi_series[key].children:
            child.reset_parent()

        new_response_roi = RoiResponseSeries(
            name=roi_series[key].name,
            data=transposed_local_data,
            unit=roi_series[key].unit,
            rois=roi_series[key].rois,
            timestamps=roi_series[key].timestamps,
            rate=roi_series[key].rate,
            description=key,
        )
        nwb_object[key].add_roi_response_series(new_response_roi)

    # Call the export method to write the modified NWBFile instance to a new
    # file path; the original file is not modified.
    with NWBHDF5IO('Rorb-IRES2-Cre_590168381_590565013_new.nwb', mode='w') as export_io:
        export_io.export(src_io=read_io, nwbfile=nwbfile)

# Patch in the missing institution directly with h5py
with h5py.File('Rorb-IRES2-Cre_590168381_590565013_new.nwb', "a") as f:
    f["general"]["institution"] = "Allen Institute"

I am left with one transpose on the fluorescence object that I can't correct. Whenever I create the file, it works, but it fails validation with the following error:


0  ERROR
========

0.0  Rorb-IRES2-Cre_590168381_590565013_new.nwb: During io.read() - <class 'TypeError'>: Key '2e2456e5-c167-412b-a2be-ca9f9ca74fe8' is already in this dict. Cannot reset items in a LabelledDict. - 'None' object with name 'None'
       Message: Traceback (most recent call last):
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 540, in inspect_nwb
    for inspector_message in run_checks(nwbfile=nwbfile, checks=checks):
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 562, in run_checks
    for nwbfile_object in nwbfile.objects.values():
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/pynwb/file.py", line 548, in objects
    self.all_children()
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/pynwb/file.py", line 537, in all_children
    self.__obj[n.object_id] = n
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/hdmf/utils.py", line 1025, in __setitem__
    raise TypeError("Key '%s' is already in this dict. Cannot reset items in a %s."
TypeError: Key '2e2456e5-c167-412b-a2be-ca9f9ca74fe8' is already in this dict. Cannot reset items in a LabelledDict.

@rly or @bendichter any clue?

oruebel commented 1 year ago

From the error, it looks like there are somehow two objects (e.g., TimeSeries) with the same object_id in the file. You could try adding a call to nwbfile.generate_new_id() before you call export to reset all object ids and see if that fixes the issue. However, I think this error likely indicates that there is some issue with the script itself, in that maybe some RoiResponseSeries is being duplicated in the file with the same object_id.
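
For intuition, the failure mode mirrors a write-once registry keyed by object_id. A toy plain-Python illustration (not the actual hdmf LabelledDict):

```python
# Toy stand-in for the registry behavior in the traceback above: once a key
# (an object_id) is present, setting it again raises, so two objects sharing
# one object_id break the tree walk.
class WriteOnceDict(dict):
    def __setitem__(self, key, value):
        if key in self:
            raise TypeError(f"Key '{key}' is already in this dict.")
        super().__setitem__(key, value)

registry = WriteOnceDict()
registry["2e2456e5"] = "RoiResponseSeries A"
try:
    registry["2e2456e5"] = "RoiResponseSeries B"  # duplicated object_id
except TypeError as err:
    print(err)  # Key '2e2456e5' is already in this dict.
```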

jeromelecoq commented 1 year ago

Thanks for the suggestion, but it does not fix the error.

jeromelecoq commented 1 year ago

I suspect this is related to the same rois table being referenced across Neuropil, Fluorescence, and Deconvolved.


Fields:
  comments: no comments
  conversion: 1.0
  data: <HDF5 dataset "data": shape (0, 0, 0), type "|u1">
  description: no description
  dimension: <HDF5 dataset "dimension": shape (2,), type "<i8">
  external_file: <StrDataset for HDF5 dataset "external_file": shape (1,), type "|O">
  format: external
  imaging_plane: ImagingPlane pynwb.ophys.ImagingPlane at 0x5798663872
Fields:
  conversion: 1.0
  description: standard
  device: Microscope pynwb.device.Device at 0x5798664256
Fields:
  description: My two-photon microscope
  manufacturer: The best microscope manufacturer

  excitation_lambda: 600.0
  imaging_rate: 30.0
  indicator: GCaMP
  location: V1
  optical_channel: (
    OpticalChannel <class 'pynwb.ophys.OpticalChannel'>
  )
  unit: meters

  offset: 0.0
  rate: 30.0
  resolution: -1.0
  starting_frame: [0]
  starting_time: 0.0
  starting_time_unit: seconds
  unit: unknown

Neuropil pynwb.ophys.Fluorescence at 0x5798666848
Fields:
  roi_response_series: {
    Neuropil <class 'pynwb.ophys.RoiResponseSeries'>
  }

Neuropil pynwb.ophys.RoiResponseSeries at 0x5798667952
Fields:
  comments: no comments
  conversion: 1.0
  data: <HDF5 dataset "data": shape (1612, 114045), type "<f4">
  description: no description
  offset: 0.0
  rate: 30.0
  resolution: -1.0
  rois: rois <class 'hdmf.common.table.DynamicTableRegion'>
  starting_time: 0.0
  starting_time_unit: seconds
  unit: lumens

rois hdmf.common.table.DynamicTableRegion at 0x5798666800
    Target table: PlaneSegmentation pynwb.ophys.PlaneSegmentation at 0x5798667088

Deconvolved pynwb.ophys.Fluorescence at 0x5798665264
Fields:
  roi_response_series: {
    Deconvolved <class 'pynwb.ophys.RoiResponseSeries'>
  }

Deconvolved pynwb.ophys.RoiResponseSeries at 0x5798666512
Fields:
  comments: no comments
  conversion: 1.0
  data: <HDF5 dataset "data": shape (1612, 114045), type "<f4">
  description: no description
  offset: 0.0
  rate: 30.0
  resolution: -1.0
  rois: rois <class 'hdmf.common.table.DynamicTableRegion'>
  starting_time: 0.0
  starting_time_unit: seconds
  unit: lumens

rois hdmf.common.table.DynamicTableRegion at 0x5798666800
    Target table: PlaneSegmentation pynwb.ophys.PlaneSegmentation at 0x5798667088

Fluorescence pynwb.ophys.Fluorescence at 0x5798665504
Fields:
  roi_response_series: {
    Fluorescence <class 'pynwb.ophys.RoiResponseSeries'>
  }

Fluorescence pynwb.ophys.RoiResponseSeries at 0x5798667184
Fields:
  comments: no comments
  conversion: 1.0
  data: <HDF5 dataset "data": shape (1612, 114045), type "<f4">
  description: no description
  offset: 0.0
  rate: 30.0
  resolution: -1.0
  rois: rois <class 'hdmf.common.table.DynamicTableRegion'>
  starting_time: 0.0
  starting_time_unit: seconds
  unit: lumens

rois hdmf.common.table.DynamicTableRegion at 0x5798666800
    Target table: PlaneSegmentation pynwb.ophys.PlaneSegmentation at 0x5798667088

jeromelecoq commented 1 year ago

I am not sure how to reconstruct it while keeping this sharing intact

jeromelecoq commented 1 year ago

I might just abandon this approach; this is too buggy. Perhaps I will just use this from @rly: https://github.com/rly/aibs-nwb1-to-nwb2/blob/main/append_suite2p.py

jeromelecoq commented 1 year ago

This code generates the same error upon validation:

0.0  Rorb-IRES2-Cre_590168381_590565013_merged.nwb: During io.read() - <class 'TypeError'>: Key '2e2456e5-c167-412b-a2be-ca9f9ca74fe8' is already in this dict. Cannot reset items in a LabelledDict. - 'None' object with name 'None'
       Message: Traceback (most recent call last):
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 540, in inspect_nwb
    for inspector_message in run_checks(nwbfile=nwbfile, checks=checks):
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 562, in run_checks
    for nwbfile_object in nwbfile.objects.values():
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/pynwb/file.py", line 548, in objects
    self.all_children()
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/pynwb/file.py", line 537, in all_children
    self.__obj[n.object_id] = n
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/hdmf/utils.py", line 1025, in __setitem__
    raise TypeError("Key '%s' is already in this dict. Cannot reset items in a %s."
TypeError: Key '2e2456e5-c167-412b-a2be-ca9f9ca74fe8' is already in this dict. Cannot reset items in a LabelledDict.

jeromelecoq commented 1 year ago

I think, after spending a day on this, I am concluding that suite2p NWB files are useless with regard to DANDI. Both Ryan and I tried to make valid NWB 2.0 files and they all fail validation.

Fixing this within the suite2p repo is probably the only way. I am abandoning that track.

CodyCBakerPhD commented 1 year ago

As a simpler solution, have you attempted to save the output of Suite2p in its native format and then use NeuroConv to write it (through append mode) to an existing NWB file? (or feel free to leverage the NWB file and subject metadata in NeuroConv to write a nice stand-alone NWB file)

https://neuroconv.readthedocs.io/en/main/conversion_examples_gallery/segmentation/suite2p.html

jeromelecoq commented 1 year ago

Luckily, I had kept these folders from suite2p in my processing pipeline of those 40 experiments, so I could try this out. I eventually tried creating those NWB files from the suite2p output using NeuroConv. That process was painless.

Here is the validation output:

0 CRITICAL

0.0 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_subject_exists - 'NWBFile' object at location '/' Message: Subject is missing.

1 BEST_PRACTICE_SUGGESTION

1.1 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_description - 'Device' object at location '/general/devices/Microscope' Message: Description is missing.

1.2 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_description - 'Images' object with name 'SegmentationImages' Message: Description ('no description') is a placeholder.

1.3 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_description - 'GrayscaleImage' object with name 'mean' Message: Description is missing.

1.4 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_description - 'GrayscaleImage' object with name 'correlation' Message: Description is missing.

1.5 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_experimenter_exists - 'NWBFile' object at location '/' Message: Experimenter is missing.

1.6 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_experiment_description - 'NWBFile' object at location '/' Message: Experiment description is missing.

1.7 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_institution - 'NWBFile' object at location '/' Message: Metadata /general/institution is missing.

1.8 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_keywords - 'NWBFile' object at location '/' Message: Metadata /general/keywords is missing.

1.9 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_column_binarycapability - 'PlaneSegmentation' object with name 'PlaneSegmentation' Message: Column 'Accepted' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is'; doing so will save 11.28KB.

1.10 Rorb-IRES2-Cre_590168381_590565013_neuroconv.nwb: check_column_binarycapability - 'PlaneSegmentation' object with name 'PlaneSegmentation' Message: Column 'Rejected' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is'; doing so will save 11.28KB.

So it looks like this should get onto DANDI.

This NWB file is different from the one I was building my analysis on, so I wanted to check what was different. Here it is:

This: nwb.processing['ophys']['ImageSegmentation'].plane_segmentations['PlaneSegmentation']['pixel_mask'][indiv_roi]

needed to be: nwb.processing['ophys']['ImageSegmentation'].plane_segmentations['PlaneSegmentation']['image_mask'][indiv_roi]

the ROI format was also different. Full images were stored instead of just the mask pixels.

This ['processing']['ophys']['Fluorescence']['Fluorescence']['data'] was transposed AND renamed to ['processing']['ophys']['Fluorescence']['RoiResponseSeries']['data'].

I think that was the core of it.
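
The transpose difference above can be sketched with a toy array (shapes shrunk from the real (1612, 114045) seen earlier):

```python
# Toy sketch of the orientation fix: the old export stored (n_rois, n_frames),
# while NWB best practice puts time first, i.e. (n_frames, n_rois).
import numpy as np

n_rois, n_frames = 16, 1000
traces = np.zeros((n_rois, n_frames), dtype=np.float32)  # old orientation
time_first = traces.T  # a view, no copy
print(time_first.shape)  # (1000, 16)
```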

CodyCBakerPhD commented 1 year ago

So it looks like this should get on dandi.

We're definitely much closer - many fewer violations!

The one thing that DANDI will require that nothing else does is subject metadata. You can add that information easily in the NeuroConv code by modifying the metadata dictionary.

after the line about

metadata["NWBFile"].update(session_start_time=session_start_time)

just add

metadata["Subject"] = dict(
    subject_id="name of subject",
    age="P6W",  # the age of the subject in ISO format - if exact is unknown can give a range of the form P4W/P8W
    species="Mus musculus",  # assuming house mouse, simply specify the latin binomial of the subject species
    sex="M",  # other options are F (female) or U (unknown)
)

the ROI format was also different.

The mask type should be controllable - it's not part of the snippet, but there is an extra argument called mask_type that can be passed to the .run_conversion(...) method; specifically, I want to say

interface.run_conversion(..., mask_type="pixel")

should restore the behavior of using pixel masks instead of the default, which is an image mask.

was transposed AND renamed to :

As I recall, the transposition part is necessary for NWB Best Practices, confirmed in part by how that check was not raised in the report you printed above.

For the object name... It is possible to change the name of the fluorescence object through NeuroConv, but is the name RoiResponseSeries OK for you to adapt around?

If not I can try to put the instructions together for how to customize it

For any further issues with NeuroConv specifically, please let us know over on the official repo

jeromelecoq commented 1 year ago

On RoiResponseSeries: that is the name of the object type. If you look at these files, I noticed there are "Deconvolved" and "Neuropil" objects, so it is awkward to have a "RoiResponseSeries" at the same level; it is rather unspecific. I am not entirely sure what you package there, but I am guessing these are the standard DFF traces created by suite2p?

CodyCBakerPhD commented 1 year ago

Thanks for pointing that out - I've raised an issue on NeuroConv to discuss how best to resolve that confusion: https://github.com/catalystneuro/neuroconv/issues/378

I am not entirely sure what you package there, but I am guessing these are the standard DFF traces created by suite2p?

The good news is that a properly packaged DfOverF trace would be within a DfOverF container - as such, I assume that the value written there as a 'RoiResponseSeries' should be the baseline fluorescence.

bendichter commented 1 year ago

@jeromelecoq We had a big discussion today about how to work with suite2p (and data analysis tools in general) that want to export NWB files. The first issue, which can be resolved fairly easily, is that some of the data is transposed, so we'll submit a PR to fix that right away.

There is a much more difficult issue, though, which is harder to fix. The problem is that in order to write RoiResponseSeries data, you need lots of required metadata about the imaging plane, optical channels, etc. Right now suite2p solves this by writing a bunch of default placeholder values, but this is not great because they are difficult to change after the fact. We have seen this same issue come up in other software such as IPFX. Writing data to NWB properly would require having an API where users specify all of the required metadata, which might be too high a bar for software that just wants a simple export option.

We were able to come up with a compromise. Tools like these should only be able to write directly to NWB files in append mode. This way, much of the required metadata is already in the file and does not need to be supplied again. The user will need to add some additional metadata, but it should be reasonable to add it as kwargs to the export_nwb method.

If you want to write a file that is just processed data, you will need to export suite2p native files and then convert them with NeuroConv, where there are tools for populating all of the necessary metadata properly.

This should account for the needs of:

What do you think of that?

jeromelecoq commented 1 year ago

Appending seems reasonable. I think that is the desired process with immutable files. The question that follows is: which software would be the first one to create the original NWB file? suite2p does add the "meat" of the data in this case.
For two-photon imaging, I see a typical pipeline as:

Create NWB with basic session info > Add 2p imaging acquisition parameters / potentially add raw data > Add segmentation > Add DFF and ROI post-processing > Add stimuli metadata > Add behavioral information/tracking > Add downstream analysis from the combination of the above.

Would NeuroConv create that first NWB?

jeromelecoq commented 1 year ago

Just a quick question: you can still edit some of these placeholder fields, right? Say, for example, you made a mistake in the data or sex of the mouse; do you have to start from scratch again?

bendichter commented 1 year ago

Would NeuroConv create that first NWB?

Yes, most likely, at least until we can get acquisition systems to output NWB files directly.

jeromelecoq commented 1 year ago

What metadata does 2p imaging segmentation need to have filled in? I wonder if it is all relevant.

bendichter commented 1 year ago

You can still edit some of these placeholder fields, right? Say, for example, you made a mistake in the data or sex of the mouse; do you have to start from scratch again?

Yes, they are placeholders. Whether you can change placeholders easily depends a lot on the particulars and is not very intuitive. Anything can be changed in h5py, but you risk creating an invalid NWB file, and it is generally recommended to avoid doing that. In PyNWB you can change the values of Datasets, but not the data type of those Datasets - only the values. Sometimes you can change the shape, but you would need to set the Dataset up with a flexible shape to begin with, which is not the default behavior. You cannot change Attributes in PyNWB. Most of the metadata is stored as Attributes and is therefore not possible to change without "doing surgery" with h5py. You can add Groups, Attributes, and Datasets without a problem in PyNWB.
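
As a toy illustration of that "surgery" caveat, a sketch on a plain HDF5 file (not a valid NWB file; the group and attribute names are made up):

```python
# Attributes cannot be changed through PyNWB, but h5py can overwrite them
# in place. Toy HDF5 file; a real NWB file has much more structure.
import os
import tempfile
import h5py

path = os.path.join(tempfile.mkdtemp(), "toy.h5")
with h5py.File(path, "w") as f:
    f.create_group("general")
    f["general"].attrs["institution"] = "placeholder"

# "Surgery": reopen in append mode and overwrite the attribute value.
with h5py.File(path, "a") as f:
    f["general"].attrs["institution"] = "Allen Institute"

with h5py.File(path, "r") as f:
    print(f["general"].attrs["institution"])  # Allen Institute
```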

jeromelecoq commented 1 year ago

You can still edit some of these placeholder fields, right? Say for example you did a mistake in the data or gender of the mouse, do you have to start from scratch again?

Yes, they are placeholders. Whether you can change placeholders easily depends a lot on particulars and is not very intuitive. Anything can be changed in h5py, but you risk creating an invalid NWB file, so it is generally recommended to avoid doing that. In PyNWB you can change the values of Datasets, but not their data type: only the values. Sometimes you can change the shape, but you would need to set the Dataset up with a flexible shape to begin with, which is not the default behavior. You cannot change Attributes in PyNWB. Most of the metadata is stored as Attributes and is therefore not possible to change without "doing surgery" with h5py. You can add Groups, Attributes, and Datasets without a problem in PyNWB.

Perhaps it might be worth allowing modification of string-typed or singleton-dimension fields. It is very common for a critical piece of metadata such as gender, mouse age, or experimental date to be entered incorrectly or only added later.

bendichter commented 1 year ago

What metadata does 2p imaging segmentation need filled in? I wonder if they are all relevant.

Names and descriptions of various datasets, emission and excitation wavelengths, brain areas, indicator, grid spacing of the acquisition image, and a few other fields. The metadata schema from NeuroConv is quite long, but most of it is optional:

interface = Suite2pSegmentationInterface(folder_path="neuroconv_testing_data/ophys_testing_data/segmentation_datasets/suite2p")
interface.get_metadata_schema()
{'required': ['Ophys'],
 'properties': {'Ophys': {'required': ['Device', 'ImageSegmentation'],
   'properties': {'Device': {'type': 'array',
     'minItems': 1,
     'items': {'required': ['name'],
      'properties': {'name': {'description': 'the name of this device',
        'type': 'string'},
       'description': {'description': 'Description of the device (e.g., model, firmware version, processing software version, etc.)',
        'type': 'string'},
       'manufacturer': {'description': 'the name of the manufacturer of this device',
        'type': 'string'}},
      'type': 'object',
      'additionalProperties': False,
      'tag': 'pynwb.device.Device'},
     'default': [{'name': 'Microscope'}]},
    'Fluorescence': {'required': [],
     'properties': {'roi_response_series': {'description': 'RoiResponseSeries to store in this interface',
       'type': 'array',
       'items': {'required': [],
        'properties': {'name': {'description': 'The name of this TimeSeries dataset',
          'type': 'string'},
         'unit': {'description': 'The base unit of measurement (should be SI unit)',
          'type': 'string'},
         'resolution': {'description': 'The smallest meaningful difference (in specified unit) between values in data',
          'type': 'number',
          'default': -1.0},
         'conversion': {'description': 'Scalar to multiply each element in data to convert it to the specified unit',
          'type': 'number',
          'default': 1.0},
         'starting_time': {'description': 'The timestamp of the first sample',
          'type': 'number'},
         'rate': {'description': 'Sampling rate in Hz', 'type': 'number'},
         'comments': {'description': 'Human-readable comments about this TimeSeries dataset',
          'type': 'string',
          'default': 'no comments'},
         'description': {'description': 'Description of this TimeSeries dataset',
          'type': 'string',
          'default': 'no description'},
         'control': {'description': 'Numerical labels that apply to each element in data',
          'type': 'array'},
         'control_description': {'description': 'Description of each control value',
          'type': 'array'},
         'offset': {'description': "Scalar to add to each element in the data scaled by 'conversion' to finish converting it to the specified unit.",
          'type': 'number',
          'default': 0.0}},
        'type': 'object',
        'additionalProperties': False,
        'tag': 'pynwb.ophys.RoiResponseSeries'},
       'minItems': 1,
       'default': [{'name': 'RoiResponseSeries',
         'description': 'Array of df/F traces.',
         'unit': 'n.a.'}]},
      'name': {'description': 'the name of this container',
       'type': 'string',
       'default': 'DfOverF'}},
     'type': 'object',
     'additionalProperties': False,
     'tag': 'pynwb.ophys.Fluorescence'},
    'ImageSegmentation': {'required': [],
     'properties': {'name': {'description': 'the name of this container',
       'type': 'string',
       'default': 'ImageSegmentation'}},
     'type': 'object',
     'additionalProperties': True,
     'tag': 'pynwb.ophys.ImageSegmentation'},
    'ImagingPlane': {'required': ['name',
      'optical_channel',
      'description',
      'device',
      'excitation_lambda',
      'indicator',
      'location'],
     'properties': {'name': {'description': 'the name of this container',
       'type': 'string'},
      'optical_channel': {'description': 'One of possibly many groups storing channel-specific data.',
       'type': 'array',
       'items': {'required': ['name', 'description', 'emission_lambda'],
        'properties': {'name': {'description': 'the name of this electrode',
          'type': 'string'},
         'description': {'description': 'Any notes or comments about the channel.',
          'type': 'string'},
         'emission_lambda': {'description': 'Emission wavelength for channel, in nm.',
          'type': 'number'}},
        'type': 'object',
        'additionalProperties': False,
        'tag': 'pynwb.ophys.OpticalChannel'},
       'minItems': 1,
       'maxItems': 1},
      'description': {'description': 'Description of this ImagingPlane.',
       'type': 'string'},
      'device': {'description': 'the device that was used to record',
       'type': 'string',
       'target': 'pynwb.device.Device'},
      'excitation_lambda': {'description': 'Excitation wavelength in nm.',
       'type': 'number'},
      'indicator': {'description': 'Calcium indicator', 'type': 'string'},
      'location': {'description': 'Location of image plane.',
       'type': 'string'},
      'imaging_rate': {'description': 'Rate images are acquired, in Hz. If the corresponding TimeSeries is present, the rate should be stored there instead.',
       'type': 'number'},
      'conversion': {'description': 'DEPRECATED: Multiplier to get from stored values to specified unit (e.g., 1e-3 for millimeters) Deprecated in favor of origin_coords and grid_spacing.',
       'type': 'number',
       'default': 1.0},
      'unit': {'description': 'DEPRECATED: Base unit that coordinates are stored in (e.g., Meters). Deprecated in favor of origin_coords_unit and grid_spacing_unit.',
       'type': 'string',
       'default': 'meters'},
      'reference_frame': {'description': 'Describes position and reference frame of manifold based on position of first element in manifold.',
       'type': 'string'},
      'origin_coords_unit': {'description': "Measurement units for origin_coords. The default value is 'meters'.",
       'type': 'string',
       'default': 'meters'},
      'grid_spacing_unit': {'description': "Measurement units for grid_spacing. The default value is 'meters'.",
       'type': 'string',
       'default': 'meters'}},
     'type': 'array',
     'additionalProperties': False,
     'tag': 'pynwb.ophys.ImagingPlane',
     'default': [{'name': 'ImagingPlane',
       'description': 'The plane or volume being imaged by the microscope.',
       'excitation_lambda': nan,
       'indicator': 'unknown',
       'location': 'unknown',
       'device': 'Microscope',
       'optical_channel': [{'name': 'OpticalChannel0',
         'emission_lambda': nan,
         'description': 'An optical channel of the microscope.'},
        {'name': 'OpticalChannel1',
         'emission_lambda': nan,
         'description': 'OpticalChannel1 description'}]}]},
    'TwoPhotonSeries': {'required': ['name', 'imaging_plane'],
     'properties': {'name': {'description': 'The name of this TimeSeries dataset',
       'type': 'string'},
      'imaging_plane': {'description': 'Imaging plane class/pointer.',
       'type': 'string',
       'target': 'pynwb.ophys.ImagingPlane'},
      'unit': {'description': 'The unit of measurement of the image data, e.g., values between 0 and 255. Required when data is specified. If unit (and data) are not specified, then unit will be set to "unknown".',
       'type': 'string'},
      'format': {'description': 'Format of image. Three types: 1) Image format; tiff, png, jpg, etc. 2) external 3) raw.',
       'type': 'string'},
      'field_of_view': {'description': 'Width, height and depth of image, or imaged area (meters).',
       'type': 'array'},
      'pmt_gain': {'description': 'Photomultiplier gain.', 'type': 'number'},
      'scan_line_rate': {'description': 'Lines imaged per second. This is also stored in /general/optophysiology but is kept here as it is useful information for analysis, and so good to be stored w/ the actual data.',
       'type': 'number'},
      'starting_frame': {'description': 'Each entry is a frame number that corresponds to the first frame of each file listed in external_file within the full ImageSeries.',
       'type': 'array'},
      'bits_per_pixel': {'description': 'DEPRECATED: Number of bits per image pixel',
       'type': 'number'},
      'dimension': {'description': 'Number of pixels on x, y, (and z) axes.',
       'type': 'array'},
      'resolution': {'description': 'The smallest meaningful difference (in specified unit) between values in data',
       'type': 'number',
       'default': -1.0},
      'conversion': {'description': 'Scalar to multiply each element in data to convert it to the specified unit',
       'type': 'number',
       'default': 1.0},
      'starting_time': {'description': 'The timestamp of the first sample',
       'type': 'number'},
      'rate': {'description': 'Sampling rate in Hz', 'type': 'number'},
      'comments': {'description': 'Human-readable comments about this TimeSeries dataset',
       'type': 'string',
       'default': 'no comments'},
      'description': {'description': 'Description of this TimeSeries dataset',
       'type': 'string',
       'default': 'no description'},
      'control': {'description': 'Numerical labels that apply to each element in data',
       'type': 'array'},
      'control_description': {'description': 'Description of each control value',
       'type': 'array'},
      'device': {'description': 'Device used to capture the images/video.',
       'type': 'string',
       'target': 'pynwb.device.Device'},
      'offset': {'description': "Scalar to add to each element in the data scaled by 'conversion' to finish converting it to the specified unit.",
       'type': 'number',
       'default': 0.0}},
     'type': 'array',
     'additionalProperties': False,
     'tag': 'pynwb.ophys.TwoPhotonSeries'},
    'DfOverF': {'required': [],
     'properties': {'roi_response_series': {'description': 'RoiResponseSeries to store in this interface',
       'type': 'array',
       'items': {'required': [],
        'properties': {'name': {'description': 'The name of this TimeSeries dataset',
          'type': 'string'},
         'unit': {'description': 'The base unit of measurement (should be SI unit)',
          'type': 'string'},
         'resolution': {'description': 'The smallest meaningful difference (in specified unit) between values in data',
          'type': 'number',
          'default': -1.0},
         'conversion': {'description': 'Scalar to multiply each element in data to convert it to the specified unit',
          'type': 'number',
          'default': 1.0},
         'starting_time': {'description': 'The timestamp of the first sample',
          'type': 'number'},
         'rate': {'description': 'Sampling rate in Hz', 'type': 'number'},
         'comments': {'description': 'Human-readable comments about this TimeSeries dataset',
          'type': 'string',
          'default': 'no comments'},
         'description': {'description': 'Description of this TimeSeries dataset',
          'type': 'string',
          'default': 'no description'},
         'control': {'description': 'Numerical labels that apply to each element in data',
          'type': 'array'},
         'control_description': {'description': 'Description of each control value',
          'type': 'array'},
         'offset': {'description': "Scalar to add to each element in the data scaled by 'conversion' to finish converting it to the specified unit.",
          'type': 'number',
          'default': 0.0}},
        'type': 'object',
        'additionalProperties': False,
        'tag': 'pynwb.ophys.RoiResponseSeries'},
       'minItems': 1,
       'default': [{'name': 'RoiResponseSeries',
         'description': 'Array of df/F traces.',
         'unit': 'n.a.'}]},
      'name': {'description': 'the name of this container',
       'type': 'string',
       'default': 'DfOverF'}},
     'type': 'object',
     'additionalProperties': False,
     'tag': 'pynwb.ophys.Fluorescence'}},
   'type': 'object',
   'additionalProperties': False}},
 'type': 'object',
 'additionalProperties': False,
 '$schema': 'http://json-schema.org/draft-07/schema#',
 '$id': 'metadata.schema.json',
 'title': 'Metadata',
 'description': 'Schema for the metadata',
 'version': '0.1.0'}
jeromelecoq commented 1 year ago

Very good example. In some cases, brain areas might be undefined at the time of the experiment and only worked out much later.

Suite2p has no way to know these from TIFF files or anything else; they are only known to the experimentalist. So either suite2p needs a built-in GUI to fill them in, or they must be editable after the fact. This has to be a common problem.