tuanpham96 closed this issue 1 year ago
@tuanpham96 Thanks for all these great ideas!
> It would also be useful to have a supported features/use-cases and limitations section.
Agreed, a table like that would be fantastic. We'll see what we can put together in a Sphinx table for the docs. I've been meaning to make one of these for tracking format support as well (I actually have one, but it's a big messy google spreadsheet that only gets updated once a year, lol...)
> On the input side, it might be useful to include what the input files/folder organization(s) is/are expected
For the case of `file_path`, this should be clear from the docstrings for each interface (I hope...)
For the case of folder structures, yeah, we should absolutely provide crystal clear documentation of what we mean by that (it always means 'the source folder structure as the software or acquisition system generated it', such as SpikeGLX, Suite2P, etc.). Nonetheless, it wouldn't hurt to be concise and have a section that discourages users from manually altering that structure.
> On the output side, include what the outputs would look like, and potentially also the mapping between organization between what's inside the inputs and what's inside the NWB output
We've attempted this before; see the NeuroConv showcase as a DANDI (staging) gallery.
So why is that showcase undocumented? Because it's very much out of date. I intended to make a GitHub workflow to automatically update it, say on a daily or weekly basis, but haven't been able to get around to it since the first manual push
Explicitly writing out in the docs all the input-output mappings for each format as you describe would be very hard to maintain, I think; if we have any free time it would definitely be good, at least insofar as the mapping of SI/ROIExtractor objects is concerned. Maybe we could do it for the most popular formats and leave the others as 'on request'.
> big messy google spreadsheet
Awesome! I think we can start with that. Do you imagine we have one table for everything or one table for each page?
An alternative to the Sphinx table is just having one big AirTable or BaseRow table (or one for each category), which may offer more features if need be. But that might be overkill, and it's possibly not version controlled.
> For the case of `file_path`, this should be clear from the docstrings for each interface (I hope...)
I think in most cases it's clear. An example that might not be clear, at least for me, is the HDF5 for imaging page. While the other options in the Optical Physiology > Imaging subsection seem intuitive, since they are specific and searchable, HDF5 is a popular format and I'm not entirely sure which software or hardware supports it. If there are a few providers/packages that support this, one suggestion could be to provide a link to one example.
Regarding folder structure, showing the folder structure may also help describe what features are supported. For example, with `suite2p`:
```
.
├── plane0
│   ├── F.npy
│   ├── F_chan2.npy      # [optional] support for 2 channels
│   ├── Fneu.npy
│   ├── Fneu_chan2.npy   # [optional] support for 2 channels
│   ├── iscell.npy
│   ├── ops.npy
│   ├── spks.npy
│   └── stat.npy
└── plane1               # support for multi-plane
    ├── F.npy
    ├── F_chan2.npy      # [optional] support for 2 channels
    ├── Fneu.npy
    ├── Fneu_chan2.npy   # [optional] support for 2 channels
    ├── iscell.npy
    ├── ops.npy
    ├── spks.npy
    └── stat.npy
```
> Explicitly writing out in the docs all the input-output mappings for each format as you describe would be very hard to maintain

> Maybe we could do it for the most popular formats and leave others as 'on request'
I do agree it's not very easy to maintain all of this. Hence maybe show only the popular ones for the time being like you suggested. I like the "on request" idea.
Plus, even for the popular ones, I think just listing out the frequently used objects is sufficient. For example, with `suite2p`, the objects most would care about are `Fluorescence`, `DfOverF`, `Neuropil`, and `Deconvolved`, which are needed for data analysis.
> input-output mappings
One other idea, inspired by the gallery + workflow that you mentioned: what about, during testing for each format in CI, also capturing the resulting tree structure of the NWB file? I'm not entirely sure how to do that, but I guess something like `h5py`'s `visit`/`visititems`, with the output rendered as a tree. Then, in the related documentation page, display that as either text or a screen capture.
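The tree-rendering part of this idea can be sketched without any NWB-specific tooling. Below is a minimal stdlib-only example that renders slash-separated object paths (which, in practice, a CI step might collect with `h5py.File.visititems`) as an indented text tree; the paths themselves are hypothetical stand-ins for the contents of a real NWB file.

```python
def render_tree(paths):
    """Render slash-separated object paths as an indented text tree."""
    lines = []
    for name in paths:
        depth = name.count("/")          # nesting level inside the file
        label = name.rsplit("/", 1)[-1]  # last path component
        lines.append("  " * depth + label)
    return "\n".join(lines)

# Hypothetical paths resembling part of an ophys NWB file
paths = [
    "processing",
    "processing/ophys",
    "processing/ophys/Fluorescence",
    "processing/ophys/Fluorescence/RoiResponseSeries",
]
print(render_tree(paths))
```

The same text output could then be committed alongside the docs, or compared against a stored snapshot so the docs fail CI when the layout changes.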
What if we just include the html output of the NWBFile that is created as a result of the test conversions?
@tuanpham96 BTW, the format spreadsheet has slowly been forming in a much more structured way (formatting WIP) over on the NWB GUIDE docs: https://nwb-guide.readthedocs.io/en/latest/format_support.html
Tracking of missing modalities is in progress, but the sheet on that page will update whenever changes are made, so just check back periodically for the latest version.
As far as the input-output mappings request goes, it's unlikely we will ever see a true central collection of these details in the documentation, simply because so many formats have so many versions, and these I/O mappings can and do change over each of those versions, often at the behest of upstream libraries (not just the ones we support), and often even within our own libraries over time. For the most up-to-date information regarding such details, I always recommend checking the API docstrings for a given interface or extractor, and we will do our best to keep those as detailed, accurate, and updated as possible. @pauladkisson in particular has been doing a great job updating the ROIExtractors docstrings with details such as these.
> html output of the NWBFile that is created as a result of the test conversions?
@bendichter that's an interesting suggestion, how is that done?
> https://nwb-guide.readthedocs.io/en/latest/format_support.html
@CodyCBakerPhD Awesome, that looks great! And I agree with your points about input-output mapping. Seems like a complicated task and hard to sustain. I'm ok with closing this for now if you are, since I'm not sure what the best way forward with this issue is. Plus, you already have the format support page up.
Thanks, I'll try to remember to let you know when the official NeuroConv/GUIDE showcase of file outputs is up on DANDI staging (in an automated way, not the outdated stuff on it now lol)
**What would you like changed or added to the documentation and why?**
First off, it is amazing how many use cases are covered by `neuroconv`, and the number & coverage keeps growing! I think the documentation could be improved to help users and developers who convert data using `neuroconv`, as well as those using the outputs from `neuroconv`.

For each use case, it would be nice to know (1) what the inputs "look" like, (2) what the outputs "look" like, and (3) some limitations and/or supported features of the converters, especially if the way `neuroconv` organizes the data inside the NWB file diverges from the original tools' converters (e.g. how `neuroconv` might differ from `suite2p.io.nwb`).

For example, taking the case of optical physiology with suite2p: `Fluorescence`/`DfOverF` and the ROI table, and that `Fluorescence` comes from the `F.npy` files, and the ROI table is from `stat.npy`. `suite2p` output for a long time and start to switch to using NWB. `neuroconv` has taken into account multi-channel and multi-plane recordings from `suite2p` outputs, and what is exported out of `suite2p` where possible. `neuroconv` yet, or should raise an issue/feature request.

**Do you have any interest in helping write or edit the documentation?**
Yes, but I would need guidance.