tuanpham96 closed this issue 1 year ago
@tuanpham96 Thanks for all these great ideas!
> It would also be useful to have a supported features/use-cases and limitations section.
Agreed, a table like that would be fantastic. We'll see what we can put together in a Sphinx table for the docs. I've been meaning to make one of these for tracking format support as well (I actually have one, but it's a big messy google spreadsheet that only gets updated once a year, lol...)
> On the input side, it might be useful to include what the input files/folder organization(s) is/are expected
For the case of `file_path`, this should be clear from the docstrings for each interface (I hope...)
For the case of folder structures, yeah, we should absolutely provide crystal clear documentation of what we mean by that (it always means 'the source folder structure as the software or acquisition system generated it', such as SpikeGLX, Suite2P, etc.). Nonetheless, it wouldn't hurt to be concise and have a section that discourages users from manually altering that structure.
> On the output side, include what the outputs would look like, and potentially also the mapping between organization between what's inside the inputs and what's inside the NWB output
We've attempted this before; see the NeuroConv showcase as a DANDI (staging) gallery.
So why is that showcase undocumented? Because it's very much out of date. I intended to make a GitHub workflow to automatically update it, say on a daily or weekly basis, but haven't been able to get around to it since the first manual push
Explicitly writing out in the docs all the input-output mappings for each format as you describe would be very hard to maintain, I think; if we have any free time it would definitely be good, at least insofar as the mapping of SI/ROIExtractor objects is concerned. Maybe we could do it for the most popular formats and leave the others as 'on request'.
> big messy google spreadsheet
Awesome! I think we can start with that. Do you imagine we have one table for everything or one table for each page?
An alternative to the Sphinx table is just having one big AirTable or BaseRow table (or one for each category), which may offer more features if need be. But that might be overkill, and it's possibly not version controlled.
> For the case of `file_path`, this should be clear from the docstrings for each interface (I hope...)
I think in most cases it's clear. An example that might not be clear, at least for me, is the HDF5 for imaging page. While the other options in the Optical Physiology > Imaging subsection seem intuitive, since they are specific and searchable, HDF5 is a popular format and I'm not entirely sure which software or hardware supports it. If there are a few providers/packages that support this, one suggestion could be to provide a link to one example.
Regarding folder structure, showing the folder structure may also help describe what features are supported. For example, with `suite2p`:
```
.
├── plane0
│   ├── F.npy
│   ├── F_chan2.npy      # [optional] support for 2 channels
│   ├── Fneu.npy
│   ├── Fneu_chan2.npy   # [optional] support for 2 channels
│   ├── iscell.npy
│   ├── ops.npy
│   ├── spks.npy
│   └── stat.npy
└── plane1               # support for multi-plane
    ├── F.npy
    ├── F_chan2.npy      # [optional] support for 2 channels
    ├── Fneu.npy
    ├── Fneu_chan2.npy   # [optional] support for 2 channels
    ├── iscell.npy
    ├── ops.npy
    ├── spks.npy
    └── stat.npy
```
> Explicitly writing out in the docs all the input-output mappings for each format as you describe would be very hard to maintain

> Maybe we could do it for the most popular formats and leave others as 'on request'
I do agree it's not very easy to maintain all of this. Hence maybe show only the popular ones for the time being like you suggested. I like the "on request" idea.
Plus, even for the popular ones, I think just listing out the frequently used objects is sufficient. For example, with `suite2p`, the objects most would care about are `Fluorescence`, `DfOverF`, `Neuropil`, and `Deconvolved`, which are needed for data analysis.
> input-output mappings
One other idea, inspired by the gallery + workflow that you mentioned: what about, during testing for each format in CI, also capturing the resulting tree structure of the NWB file? I'm not entirely sure how to do that, but I guess something like `h5py`'s `visit`/`visititems`, with the output rendered as a tree. Then, in the related documentation page, display that as either text or a screen capture.
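The tree-rendering part of this idea can be sketched without any NWB-specific tooling. Below is a minimal stdlib-only example that renders slash-separated object paths (which, in practice, a CI step might collect with `h5py.File.visititems`) as an indented text tree; the paths themselves are hypothetical stand-ins for the contents of a real NWB file.

```python
def render_tree(paths):
    """Render slash-separated object paths as an indented text tree."""
    lines = []
    for name in paths:
        depth = name.count("/")          # nesting level inside the file
        label = name.rsplit("/", 1)[-1]  # last path component
        lines.append("  " * depth + label)
    return "\n".join(lines)

# Hypothetical paths resembling part of an ophys NWB file
paths = [
    "processing",
    "processing/ophys",
    "processing/ophys/Fluorescence",
    "processing/ophys/Fluorescence/RoiResponseSeries",
]
print(render_tree(paths))
```

The same text output could then be committed alongside the docs, or compared against a stored snapshot so the docs fail CI when the layout changes.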
What if we just include the html output of the NWBFile that is created as a result of the test conversions?
@tuanpham96 BTW, the format spreadsheet has slowly been forming in a much more structured way (formatting WIP) over on the NWB GUIDE docs: https://nwb-guide.readthedocs.io/en/latest/format_support.html
Tracking of missing modalities is in progress, but the sheet on that page will update whenever changes are made, so just check back periodically for the latest version.
As far as the input-output mappings request goes, it's unlikely we will ever see a true central collection of these details in the documentation, simply because so many formats have so many versions, and these I/O mappings can and do change over each of those versions, often at the behest of upstream libraries (not just the ones we support), and often even within our own libraries over time. For the most up-to-date information regarding such details, I always recommend checking the API docstrings for a given interface or extractor, and we will do our best to keep those as detailed, accurate, and updated as possible. @pauladkisson in particular has been doing a great job updating the ROIExtractors docstrings with details such as these.
> html output of the NWBFile that is created as a result of the test conversions?
@bendichter that's an interesting suggestion, how is that done?
> https://nwb-guide.readthedocs.io/en/latest/format_support.html
@CodyCBakerPhD Awesome, that looks great! And I agree with your points about input-output mapping. Seems like a complicated task and hard to sustain. I'm ok with closing this for now if you are, since I'm not sure what the best way forward with this issue is. Plus, you already have the format support page up.
Thanks, I'll try to remember to let you know when the official NeuroConv/GUIDE showcase of file outputs is up on DANDI staging (in an automated way, not the outdated stuff on it now lol)
**What would you like changed or added to the documentation and why?**
First off, it is amazing how many use cases are covered by `neuroconv`, and the number & coverage keeps growing! I think the documentation could be improved to help users and developers who convert data using `neuroconv`, as well as those using the outputs from `neuroconv`.

For each use case, it would be nice to know (1) what the inputs "look" like, (2) what the outputs "look" like, and (3) some limitations and/or supported features of the converters, especially if the way `neuroconv` organizes the data inside the NWB file diverges from the original tools' converters (e.g. how `neuroconv` might differ from `suite2p.io.nwb`).

For example, taking the case of optical physiology with suite2p: `Fluorescence`/`DfOverF` and the ROI table, and that `Fluorescence` comes from the `F.npy` files, and the ROI table is from `stat.npy`. `suite2p` output for a long time and start to switch to using NWB. `neuroconv` has taken into account multi-channel and multi-plane recordings from `suite2p` outputs, and what is exported out of `suite2p` where possible. `neuroconv` yet, or should raise an issue/feature request.

**Do you have any interest in helping write or edit the documentation?**
Yes, but I would need guidance.