satra opened this issue 4 years ago
@rly and @bendichter - is there a place one can start now? @nicolocin can help start a google doc, but will need some help. so if either of you can even paste links over here we can take it from there.
cc: @t-b
@satra How about starting the list in this ticket?
FYI this page: https://github.com/OpenSourceBrain/NWBShowcase/blob/master/README.md has pointers to example scripts (all Python so far) for converting specific datasets to NWB. While it does so far cover ABF/Igor Pro/MATLAB mat to NWB, it doesn't have generic converters of all files in format X to NWB; datasets are usually different enough that you'd have to look inside to appreciate how the data is laid out and best represent it in NWB.
@pgleeson - thanks for that pointer.
it is likely that people will have to do multiple things. but i think we want to get as much automation as possible for file-format based conversion, and then provide additional tools to add other components that may not come from the instrument-exported file. here i wanted to get to the first starting point: given an exported file from an instrument, how does the user convert just that to nwb (and can that be done via automation)? @bendichter is building a set of custom tools and we are trying to figure out ways of abstracting the process into more generic steps as much as possible.
Right now the best thing we have is the tutorials in pynwb and matnwb. We have example notebooks (and the MATLAB equivalents) for creating basic files with ecephys, ophys, icephys, and behavioral data. There are videos on our YouTube channel for each modality and programming language that walk you through each of the tutorials, explaining the logic of the code and how you might extend it. Users who watch these videos generally find them very helpful.
Ideally, it should be possible to convert data to NWB without having to learn the low level APIs. There are some really strong conversion software packages coming together (SpikeInterface, CaImAn, suite2p, calciumImagingAnalysis, SegmentationInterface), but data conversion inevitably requires combining proprietary data in multiple formats and/or adding metadata that are missing in those files. Currently, this requires manual coding, so even if you use one of these packages, sooner or later you'll have to use pynwb or matnwb to write in that missing data, particularly for behavioral data, which everyone stores differently. We are working on solving this problem and creating a uniform interface for combining data from multiple modalities and metadata in NWB Conversion Tools. This is a tricky problem though, and we've been wrestling with how to handle a variety of input formats and metadata requirements. We have a structure now that we are confident in, but it will take some time before this is ready for prime time. In the meantime, the list of NWB-enabled tools is a good start.
@bendichter - on the list of NWB-enabled tools only the spikeinterface one potentially has value for conversion; the rest are all analysis tools, which means you have to get to NWB first. many experimental neuroscience labs don't have anyone with the python or nwb skills to do the conversion themselves, and i think this is a significant bottleneck. so even having converters that go from hardware output files to partial NWB would be a really good first step, which can then be augmented with additional data.
i've updated the original post with a link to different converters. i'm assuming each of these libraries has test datasets. if so, i would like to start creating a neurophys version of heudiconv. this will be command-line based to start with, but the inputs can be simple text files for additional information. i've started a repo here: https://github.com/dandi/neuroconv and will use that to build up the automation process.
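As a rough sketch of what the first automation step could look like, here is a hypothetical dispatch from an instrument's exported file to a suggested converter, keyed on file extension. None of these names (`CONVERTERS`, `pick_converter`) or mappings come from neuroconv itself; they are illustrative only:

```python
# Hypothetical heudiconv-style dispatch: given a raw file exported by an
# instrument, suggest which converter library handles it. The mapping below
# is an illustrative assumption, not a real registry.
from pathlib import Path

CONVERTERS = {
    ".abf": "ipfx x_to_nwb",    # pClamp (icephys)
    ".dat": "ipfx x_to_nwb",    # HEKA Patchmaster (icephys)
    ".rhd": "spikeextractors",  # Intan (ecephys)
}

def pick_converter(path):
    """Return the suggested converter for a raw file, or None if unknown."""
    return CONVERTERS.get(Path(path).suffix.lower())
```

A real tool would additionally sniff file headers rather than trusting extensions, and would merge user-supplied text files of metadata before writing NWB.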
we will dig into raw datasets as available, but if any of you have pointers to existing raw datasets that would be great. please post here: https://github.com/dandi/neuroconv/issues/1
@satra
ecephys: SpikeExtractors converts from 19+ acquisition and 14+ processed proprietary ecephys formats to NWB.
ophys: CaImAn, suite2p, and calciumImagingAnalysis can all output processed ophys directly into NWB format. We are working on SegmentationExtractors, which will be analogous to SpikeExtractors, enabling conversion from a number of different raw and processed ophys data formats to NWB, but it's not quite ready for external use.
icephys: This is the most recent development and the least supported at the moment. Currently the best system for conversion is the x_to_nwb module of ipfx. It does support the latest NWB 2.0, but it was created for the Allen Institute and isn't quite full-featured enough to work in the general case.
behavior: This is particularly tough because this sort of processing is usually totally ad hoc per scientist. Would love to provide support for DeepLabCut and any other emerging community tools for behavioral preprocessing.
Would a page with that sort of information (with less editorialization) be useful?
These components are important, but in general converting session data to NWB is more complicated than just changing a file format. You need to combine data across multiple modalities, which are often collected on entirely different systems, and you need to add missing metadata. Some systems, like Bonsai, have a standardized way of synchronizing between different modalities. We think that's awesome, and @alejoe91 and I are working with Nicol to build automated conversion from there, but unfortunately Bonsai usage currently represents a pretty small percentage of acquired data in neurophysiology, and the rest use ad hoc synchronization techniques that need to be dealt with manually. For these, we are working on NWB Conversion Tools, which has a command-line interface. See e.g. the Jaeger Lab repo for example usage, though this as well is not quite ready for external use.
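For the ad hoc synchronization case, the core arithmetic is typically a linear fit between sync-pulse times recorded on both clocks, which recovers the secondary system's clock drift and offset relative to the primary. A self-contained sketch of that idea (this is not NWB Conversion Tools code):

```python
# Illustrative sketch: map timestamps from a secondary acquisition system onto
# the primary system's clock, given sync pulses recorded by both. A
# least-squares line through the pulse pairs gives drift (slope) and offset.

def fit_clock(primary_pulses, secondary_pulses):
    """Return (slope, offset) mapping secondary-clock times to primary-clock times."""
    n = len(primary_pulses)
    mean_x = sum(secondary_pulses) / n
    mean_y = sum(primary_pulses) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(secondary_pulses, primary_pulses))
    var = sum((x - mean_x) ** 2 for x in secondary_pulses)
    slope = cov / var
    offset = mean_y - slope * mean_x
    return slope, offset

def to_primary(t, slope, offset):
    """Convert one secondary-clock timestamp to the primary clock."""
    return slope * t + offset
```

Real sessions add complications this sketch ignores (dropped pulses, pulse matching, nonlinearity), which is part of why the manual cases resist full automation.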
heudiconv looks interesting. Thanks for the pointer.
I'll work on a table like you requested.
Going through this, it feels like these tables will be a bit awkward, due to how many different formats are handled by the same tool. I take your point, though, that this is a major point of confusion for new users and needs to be better documented.
| hardware | format | converter | how-to link |
|---|---|---|---|
| pClamp 6-9 | ABF | ipfx | https://github.com/AllenInstitute/ipfx/blob/master/ipfx/x_to_nwb/Readme.md |
| pClamp 10+ | ABF2 | ipfx | https://github.com/AllenInstitute/ipfx/blob/master/ipfx/x_to_nwb/Readme.md |
| HEKA Patchmaster | DAT | ipfx | https://github.com/AllenInstitute/ipfx/blob/master/ipfx/x_to_nwb/Readme.md |
We now have this page up on our website: https://www.nwb.org/conversion-to-nwb/
I fully appreciate that more work is needed to make conversion more convenient for scientists who are not coders.
@bendichter thanks for adding the page. A couple of suggestions. I think we should divide the page into 3 sections: 1) Custom, 2) Common Raw & Processed Data, and 3) Details. In the "Custom Formats" section we would have a paragraph where we would point to PyNWB and MatNWB for converting data to NWB, and point to the NDX Catalog for details on extensions. The "Common Raw & Processed Data" section would be the heading for the content you already added. And finally, in the "Details" section we could add the tables you listed in your comment above. We should then also add a link to the page, probably in the "Community" menu.
@bendichter - thanks for starting to put this together. it's ok if the same extractor covers multiple devices. the key entry point is what hardware someone is using and how to start from there.
@oruebel - i would change the order to go from easy to hard. so common then custom. but for common, the intent was to say these instruments have some loader/converter/analyzer out there, with support for NWB. processed data is more about analytics and i think it should have its own section for now.
> i would change the order to go from easy to hard. so common then custom
My main thinking for having the "custom" section first was that this section will be fairly brief and mainly point to the standard APIs and tutorials, whereas the other sections will be more detailed. Having the custom section first should not distract much, while having it second makes it easier to get lost.
having the "custom" section first was that this section will be fairly brief and mainly point to the standard APIs and tutorials
if that's the case, then absolutely. i was hoping that the custom section could get more detailed too :)
lowering the entry point to NWB. one of the current strong limitations is conversion to NWB without having someone technical in the group. given that there exist converters that simplify some of this for specific hardware, it would be good to create a page that tells a user where to go in order to perform the conversion. i'm moving a discussion from the nwb slack channel to here so that it can be tracked.
this question was specifically about supported hardware. it came up in the SPARC context. a given software can have multiple roles, so spikeinterface could be listed somewhere else.
but the main thing is to create a page that provides something like this, perhaps grouped by modality:
each of the items would be links to the relevant descriptor for hardware, format, converter library, example.
this could be a wiki page or something that's integrated into the docs themselves.
current readers/converters: https://github.com/dandi/neuroconv/issues/2