[Feature]: Input of .avi for Optical Physiology archiving in NWB output

ucscbrianlee commented 1 month ago

What would you like to see added to NeuroConv?

Preserve the raw input files in the eventual NWB file. Intermediate conversions of the .avi are available, so this feature request is not a roadblock for users (i.e., they can work around it). Example: a lab produces samples like SAMPLE1234-gcamp-w1-1r.avi, which go into ImageJ for optical physiology analysis. In the current model, I believe the .avi is converted within ImageJ. Would including the original .avi be useful for future data analysts who want to start from the raw input? (I'm not 100% sure, and perhaps this request is related to #915.)

Is your feature request related to a problem?

Less a problem, more a conceptual inclusion of raw inputs.

Do you have any interest in helping implement the feature?

No.

Code of Conduct

[X] I agree to follow this project's Code of Conduct
[X] Have you ensured this bug was not already reported?

CodyCBakerPhD commented 1 month ago

Can you give a full concrete example of what you're requesting? I get a sense from your current example, but I want to be completely sure before giving the current stance of NWB/DANDI/NeuroConv.

ucscbrianlee commented 1 month ago

Sure. This is for a lab we are working with, so I may not have the full process clearly defined. Here's an overview of how I believe their workflow goes:

Camera + Microscope
      |
      v
Captured AVI File
      |
      v
ImageJ
      |
      v
Open AVI File in ImageJ, converting to TIFF (not 100% sure)
      |
      v
Motion Correction software
      |
      v
Analysis of Image software
      |
      v
ROI Detection + Signal Extraction + Deconvolution
      |
      v
Export to CSV          Export MATLAB work to NWB
      |                     |
      v                     v
Final CSV File        Final NWB File (can include conversion/intermediate steps)

I think the crux of the topic for the data coordination center is whether the original .avi file should be kept anywhere. It seems probably not, since the input will be converted at the ImageJ step and the lab can include it at that point in the final NWB output. The question (and I am not sure of its value) is whether the original .avi would be useful in the future.

CodyCBakerPhD commented 1 month ago

Thanks for the clarification.

We would treat that .avi file as 'raw acquisition', so yes, it could be broadly useful to the community to share via DANDI (which hosts the data for free), especially for tool developers, who are always looking for more raw data to train and benchmark their methods on.

Refer to the best practice for ophys data: https://nwbinspector.readthedocs.io/en/dev/best_practices/image_series.html#use-internal-dataset-for-videos-of-neurophysiological-data

That is, write it internally rather than storing it as an external .avi file. NeuroConv currently offers an interface for at least the TIFF export: https://neuroconv.readthedocs.io/en/main/conversion_examples_gallery/imaging/tiff.html

But you could request an `AviImagingInterface` if you would rather read directly from source.

Writing the data internally has a number of advantages, first and foremost better compression (.avi can be compressed but isn't always), which can often reach ratios of up to 2.5:1 (that is, about 40% of the original file size). Needless to say, this is a prerequisite for storing large data on the archive.
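
To make the compression figure concrete, here is a minimal sketch of how a ratio such as 2.5:1 maps to a fraction of the original size, and how one might measure a ratio on a byte buffer. It uses Python's standard `zlib` module only as a stand-in for a DEFLATE-style filter; the function names and the synthetic "video" are hypothetical and purely illustrative. Real microscopy frames contain noise and will typically compress far less than this artificial pattern.

```python
import zlib


def compression_ratio(raw: bytes, level: int = 4) -> float:
    """Return original size divided by compressed size (2.5 means 2.5:1)."""
    compressed = zlib.compress(raw, level)
    return len(raw) / len(compressed)


def fraction_of_original(ratio: float) -> float:
    """Convert an N:1 compression ratio into the remaining fraction of the size."""
    return 1.0 / ratio


# Hypothetical synthetic data: one 64x64 8-bit frame repeated 10 times.
# This is NOT representative of real imaging data; it only exercises the code.
width, height, n_frames = 64, 64, 10
frame = bytes((x + y) % 256 for y in range(height) for x in range(width))
synthetic_video = frame * n_frames

ratio = compression_ratio(synthetic_video)
print(f"synthetic ratio: {ratio:.1f}:1")
print(f"a 2.5:1 ratio leaves {fraction_of_original(2.5):.0%} of the original size")
```

The same `fraction_of_original` arithmetic applies to any reported ratio, which makes it easier to estimate archive storage needs before converting.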

One disadvantage of writing the data internally, though, as you basically point out, is that most of the software used to ingest the .avi would not be able to ingest the .nwb file, so full reproducibility of the results as the original experimenters obtained them would be hampered. We constantly work with tool developers to add NWB read support, and if you want to make a specific request for your lab's current software, please open a discussion on the NWB helpdesk.

CodyCBakerPhD commented 1 month ago

That said, the official NIH policy is only to share data that is 'relevant to confirming the scientific hypothesis', so whether sharing the raw data or just the final processed output is most relevant is sometimes decided case by case. I suggest working with your program officer to determine that specifically for any of your lab's current projects, if that is the main reason for converting to NWB and uploading to DANDI.

ucscbrianlee commented 1 month ago

Thank you for your input and responses.
