catalystneuro / nwb-conversion-tools

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://nwb-conversion-tools.readthedocs.io/en/main/
BSD 3-Clause "New" or "Revised" License
25 stars 12 forks source link

DataInterface nesting structure #153

Closed CodyCBakerPhD closed 3 years ago

CodyCBakerPhD commented 3 years ago

Pertaining to PR #149, I'd like to open a discussion between maintainers @luiztauffer @alejoe91 @bendichter for everyone's input on evaluating the nesting structure for the data interfaces. @sneakers-the-rat has proposed shifting from a datainterfaces/formatname.py approach to interfaces/extractortype/formatname.py. The datainterfaces/formatname.py approach was inherited originally from the spikeextractors nesting format, which was adopted by roiextractors, and then by us to keep things consistent.

Ultimately, we want to (a) minimize the code necessary to run the repo functional for easier maintainability but hopefully not at the cost of (b) being too hard for users to explore. While (b) could certainly be combated independently through enhanced documentation, it may be worth asking if this proposed methodology might help as well. One downside to the extractortype nesting is it duplicates the overall number of files for any format that has multiple extractor types (in cases like Neuroscope, this could generate 3-4x as many files, and the extra lines of import code required on the top).

There could also be a nesting structure of interfaces/datatype/formatname.py where datatype in ["ecephys", "ophys"] that could be worth throwing in the mix for discussion. Going this route could solve the issue of code file amplification (since no formats are both types, I think) and could also make managing imports from each package more convenient/localized as well. (ecephys imports drawing from only spikeextractors in the init, etc).

Anyway, just wanted to see what everyone's thoughts are.

sneakers-the-rat commented 3 years ago

Yes, happy to discuss further :)

Here is the structure of the library in the current master branch:

./nwb_conversion_tools/
├── __init__.py
├── auto_qc.py
├── basedatainterface.py
├── baseimagingextractorinterface.py
├── baselfpextractorinterface.py
├── baserecordingextractorinterface.py
├── basesegmentationextractorinterface.py
├── basesortingextractorinterface.py
├── conversion_tools.py
├── datainterfaces
│   ├── __init__.py
│   ├── blackrockdatainterface.py
│   ├── ceddatainterface.py
│   ├── cellexplorerdatainterface.py
│   ├── imagingextractorinterface.py
│   ├── intandatainterface.py
│   ├── interface_utils
│   │   ├── __init__.py
│   │   └── brpylib.py
│   ├── neuroscopedatainterface.py
│   ├── openephysdatainterface.py
│   ├── phydatainterface.py
│   ├── roiextractordatainterface.py
│   ├── sipickledatainterfaces.py
│   └── spikeglxdatainterface.py
├── json_schema_utils.py
├── metafile.schema.json
├── nwbconverter.py
├── schemas
│   ├── source_example.json
│   └── source_schema.json
└── utils.py

and this is how i've refactored it

./nwb_conversion_tools
├── interfaces
│   ├── imaging
│   │   ├── __init__.py
│   │   ├── base_imaging.py
│   │   └── imaging.py
│   ├── interface_utils
│   │   ├── __init__.py
│   │   └── brpylib.py
│   ├── recording
│   │   ├── lfp
│   │   │   ├── __init__.py
│   │   │   ├── base_lfp.py
│   │   │   ├── neuroscope.py
│   │   │   └── spike_glx.py
│   │   ├── __init__.py
│   │   ├── base_recording.py
│   │   ├── blackrock.py
│   │   ├── ced.py
│   │   ├── intan.py
│   │   ├── neuroscope.py
│   │   ├── open_ephys.py
│   │   ├── spike_glx.py
│   │   └── spike_interface.py
│   ├── segmentation
│   │   ├── __init__.py
│   │   ├── base_segmentation.py
│   │   └── roi.py
│   ├── sorting
│   │   ├── __init__.py
│   │   ├── base_sorting.py
│   │   ├── blackrock.py
│   │   ├── cell_explorer.py
│   │   ├── neuroscope.py
│   │   ├── open_ephys.py
│   │   ├── phy.py
│   │   └── spike_interface.py
│   ├── __init__.py
│   └── base_data.py
├── schemas
│   ├── source_example.json
│   └── source_schema.json
├── spec
│   └── __init__.py
├── __init__.py
├── auto_qc.py
├── conversion_tools.py
├── json_schema_utils.py
├── metafile.schema.json
├── nwbconverter.py
└── utils.py

As is, the hierarchy is essentially flat with the base classes in the root directory and the interfaces in datainterfaces. There are a few downsides to that:

Screen Shot 2021-03-24 at 5 14 01 PM

Where interfaces are neatly sorted by type, which is implemented in rst as:

Recording
==========

.. automodule:: nwb_conversion_tools.interfaces.recording

Base Recording
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.base_recording

Blackrock
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.blackrock

CED
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.ced

Intan
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.intan

Neuroscope
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.neuroscope

Open Ephys
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.open_ephys

Spike GLX
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.spike_glx

Spike Interface
---------------

.. automodule:: nwb_conversion_tools.interfaces.recording.spike_interface

Doing API docs for the current structure would result in the top-level tree under "interfaces" as a big flat long list, or else would require a lot of manual curation at the level of individual classes, rather than modules, as is done here in the existing documentation (whose end structure mirrors the structure proposed here of recording interfaces grouped together separately from sorting interfaces underneath ephys and ophys): https://github.com/catalystneuro/nwb-conversion-tools/blob/master/docs/converting_data_to_nwb.md

I like the interfaces/ecephys/... and interfaces/ophys/... structure, as that would mirror the nwb spec more closely, and think that would also be a good way to refactor the extractor interfaces: perhaps the recording extractor class has subclasses ecephys etc. My proposal ultimately is not "lets do this exact format," but that a path structure that mirrors the logical structure of the code would be good, so anything under that heading is good with me :)

So anyway, ty for entertaining this PR, don't mean to ride in and be very bossy, just a proposal. I come in peace and goodwill and intention to work!

bendichter commented 3 years ago

Thanks for breaking this down, @sneakers-the-rat! It's worth thinking about restructuring, especially if it will help with documentation and accessibility. Our current structure is similar to SpikeExtractors, though I agree that (1) their structure may not be ideal in the first place and (2) we may need a bit of a different structure because our scope is broader. I can see how having the folder structure reflect the class structure would be helpful for a developer.

Let's think about the pros and cons of our available options. I think separating first by modality makes the most sense, especially if we are going to use this for generating documentation, because that would be a very natural way for a user to navigate the available tools. I would wonder if organizing by acquisition system might have certain advantages. (1) Would users interested in a SortingInterface for a certain acquisition system also be interested in the RecordingInterface for the same system? If so, it might make sense to have them together in the docs (even though their class hierarchy is different) (2) Are there any Recording and Sorting Interfaces for the same acquisition system that would share code, for instance to retrieve metadata? If so, it might be better to have them in the same folder or file.

All of that being said, I think the structure you propose may be worthwhile. If we do move things around, most important will be to be very careful not to break existing imports for downstream repos, and to smoothly deprecate if any imports change.

luiztauffer commented 3 years ago

@sneakers-the-rat I’m impressed by the scope of this first contribution, thanks for that! Overall I agree with those points, reproducing the inheritance structure in the directory organization would be advantageous to everyone, new and old developers.

I like Cody suggestion for interfaces/datatype/formatname.py where datatype in ["ecephys", “icephys”, "ophys", “behavior”, “misc”]. In fact, that’s how nwb-conversion-tools used to be before a big refactor last year. This options aligns well with pynwb api organization.

I don’t see much advantage in separating recording from sorting though, since most interfaces for electrophys formats will likely have both, and as @CodyCBakerPhD pointed out will multiply files and scatter attention of what imo belong together.

Those changes are likely to break all current lab-specific code, but I feel this would be quick to fix, since it’s simply changing the import headers (we can also fix nwb-conversion-tools==0.7.2). I suggest we go adding all proposed changes in a separated branch (not on master) before we have time to carefully test for all current lab projects.

sneakers-the-rat commented 3 years ago

fabulous! Y'all have better perspective on recording vs sorting than I do, I was just following the inheritance, so yes by all means combine them!

feel free to edit the PR, pull to a different branch, etc. :)

CodyCBakerPhD commented 3 years ago

As @luiztauffer indicated, as of early November last year we used a ./ecephys and ./ophys nesting structure, but apparently it wasn't within a datainterfaces/. umbrella, they were just out in the open. We also only supported a few formats at that time as it was a very early prototyping stage.

Since the consensus seems to be to rearrange the interfaces according to a datatype in ["ecephys", “icephys”, "ophys", “behavior”, “misc”] structure, I'll make a branch with just that particular change in place.

CodyCBakerPhD commented 3 years ago

Moving forward on this!