catalystneuro / roiextractors

Python-based module for extracting from, converting between, and handling optical imaging data from several file formats. Inspired by SpikeInterface.
https://roiextractors.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

[Feature]: Import mean or summary images from all segmentation methods #228

Closed: RichieHakim closed this issue 9 months ago

RichieHakim commented 1 year ago

What would you like to see added to ROI Extractors?

This is related to https://github.com/catalystneuro/roiextractors/issues/224#issuecomment-1574540889 but deserves its own post.

Some of the segmentation extractor classes do not import a standard field-of-view image from the ingested data, which is a key piece of data for many workflows. In particular, the caiman extractor imports no summary images at all, and the cnmfe extractor is missing a summary/mean image.

Code to check this:

import roiextractors
from pathlib import Path

# Local path to the segmentation_datasets folder of the ophys_testing_data GIN repository.
dir_segData = '/media/rich/Home_Linux_partition/github_repos/ophys_testing_data/segmentation_datasets'

methods = [
    'caiman',
    'cnmfe',
    'extract',
    'nwb',
    'suite2p',
]

r_dict = {
    'caiman': (
        roiextractors.CaimanSegmentationExtractor,
        (
            str(Path(dir_segData) / 'caiman' / 'caiman_analysis.hdf5'),
        ),
    ),
    'cnmfe': (
        roiextractors.CnmfeSegmentationExtractor,
        (
            str(Path(dir_segData) / 'cnmfe' / '2014_04_01_p203_m19_check01_cnmfeAnalysis.mat'),
        )
    ),
    'extract': (
        roiextractors.ExtractSegmentationExtractor,
        (
            str(Path(dir_segData) / 'extract' / 'extract_public_output.mat'),
            30,  # sampling frequency (Hz)
        ),
    ),
    'nwb': (
        roiextractors.NwbSegmentationExtractor,
        (
            str(Path(dir_segData) / 'nwb' / 'nwb_test.nwb'),
        ),
    ),
    'suite2p': (
        roiextractors.Suite2pSegmentationExtractor,
        (
            str(Path(dir_segData) / 'suite2p'),
        ),
    ),
}

# Report, for each method, which summary images the extractor actually populated.
for key_class, (r_class, r_args) in r_dict.items():
    try:
        extractor = r_class(*r_args)
        print(f'{key_class}: {[key_im for key_im, im in extractor.get_images_dict().items() if im is not None]}')
        # import matplotlib.pyplot as plt  # only needed for the plotting below
        # for key_im, im in extractor.get_images_dict().items():
        #     if im is not None:
        #         print(key_im)
        #         plt.figure()
        #         plt.imshow(im)
    except Exception as e:
        print(f'{key_class}: FAILED: {e}')

Running this prints:

caiman: []
cnmfe: ['correlation']
extract: ['summary_image', 'f_per_pixel', 'max_image']
nwb: ['mean', 'correlation']
suite2p: ['mean', 'correlation']

Do you have any interest in helping implement the feature?

Yes, but I would need guidance.


CodyCBakerPhD commented 1 year ago

Hi @RichieHakim,

I'm actually pleasantly surprised that so many of the example data formats return something for that field. In practice, whenever we've worked with data in these formats from individual labs, those fields have only occasionally been present, which is why ROIExtractors treats them as optional (with a fallback of None if not found, which appears to be the case here, since there is actually some logic in CaimanSegmentationExtractor for reading it).

I can specifically comment on this aspect for the NwbSegmentationExtractor, which does not require such summary images to be present in the file.

However, as I look through the codebase, it seems that it should not raise an error but rather return a dictionary with None values: https://github.com/catalystneuro/roiextractors/blob/main/src/roiextractors/segmentationextractor.py#L210
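For reference, if the fallback works as described there, the demonstration should not need a try/except at all; a missing image would simply come back as a None entry. A minimal sketch (file_path here is a placeholder):

import roiextractors

seg = roiextractors.NwbSegmentationExtractor(file_path)  # file_path is a placeholder
images = seg.get_images_dict()  # missing images appear as None values rather than raising
print([name for name, image in images.items() if image is not None])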

So I'm curious why you had to wrap the demonstration in a try/except?

Otherwise, I'd be interested in two not-necessarily-exclusive pathways towards improving this...

a) Check if the Caiman issue is a bug (are there summary images in the example data that should be imported but currently aren't for some reason?)

b) Add some helper functions somewhere (would need to discuss where would be best) for extracting/generating summary image information for a given Segmentation object given the Imaging object that it corresponds to
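As a rough illustration of option b), one such helper could compute a CaImAn-style local correlation image directly from the movie. A minimal sketch, assuming the movie is already in memory as a (frames, rows, cols) numpy array (the function name is hypothetical):

import numpy as np

def compute_local_correlation_image(video):
    # Each output pixel is the mean Pearson correlation between that pixel's
    # time trace and the traces of its immediate (8-connected) neighbors.
    # Normalize every pixel trace to zero mean and unit variance.
    v = video.astype(np.float64)
    v -= v.mean(axis=0, keepdims=True)
    std = v.std(axis=0, keepdims=True)
    std[std == 0] = 1.0
    v /= std

    _, n_rows, n_cols = v.shape
    corr_sum = np.zeros((n_rows, n_cols))
    n_neighbors = np.zeros((n_rows, n_cols))
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    for dr, dc in offsets:
        # Valid pixel range such that the (dr, dc)-shifted neighbor stays in bounds.
        r0, r1 = max(-dr, 0), n_rows - max(dr, 0)
        c0, c1 = max(-dc, 0), n_cols - max(dc, 0)
        # For zero-mean, unit-variance traces, the Pearson correlation is the
        # time-average of the product of the two traces.
        corr = (v[:, r0:r1, c0:c1] * v[:, r0 + dr:r1 + dr, c0 + dc:c1 + dc]).mean(axis=0)
        corr_sum[r0:r1, c0:c1] += corr
        n_neighbors[r0:r1, c0:c1] += 1
    return corr_sum / n_neighbors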

What are your thoughts, @RichieHakim?

RichieHakim commented 1 year ago

@CodyCBakerPhD: thank you for the prompt reply.

So I'm curious why you had to wrap the demonstration in a try/except?

I didn't. I should have removed it. Sorry for the confusion.

I think your suggested paths forward seem right. With respect to (a) and (b): are we confident that the example data in the GIN repo is representative of the outputs from current releases of each segmentation method?

a) Check if the Caiman issue is a bug (are there summary images in the example data that should be imported but currently aren't for some reason?)

The caiman .h5 dataset in the gin repo is missing the 'Cn' field. I think I have seen FOV image fields in a caiman dataset I've worked with before. It may be worth looking into running your data through the caiman pipeline again.
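One quick way to confirm this is to list the contents of the file's estimates group directly. A minimal sketch (the path is a placeholder for a local clone of the GIN repo; the layout follows CaImAn's usual HDF5 output):

import h5py

file_path = 'segmentation_datasets/caiman/caiman_analysis.hdf5'  # placeholder path
with h5py.File(file_path, 'r') as f:
    # Shows which summary-image fields (e.g. 'Cn', 'b') are actually present.
    print(list(f['estimates'].keys()))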

b) Add some helper functions somewhere (would need to discuss where would be best) for extracting/generating summary image information for a given Segmentation object given the Imaging object that it corresponds to

The exact functions necessary to point to and extract images from the dataset would likely come from a walk through the structure of each data file. I think a general solution, while imperfect, is acceptable here. Then, there could be two ways of looking up images: 1) via the key in the image dictionary, which is named according to the dataset's naming scheme and 2) via a more standardized name like 'intensity' and 'correlation'.
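For instance, something along these lines; the mapping below is purely illustrative (it is not the actual roiextractors API, and the native key names are guesses based on each method's typical output):

# Hypothetical mapping from each method's native image names to a small set
# of standardized keys; anything unmapped keeps its original name.
STANDARD_IMAGE_NAMES = {
    'caiman': {'Cn': 'correlation', 'b': 'intensity'},
    'cnmfe': {'Cn': 'correlation'},
    'suite2p': {'meanImg': 'intensity', 'Vcorr': 'correlation'},
}

def standardize_image_keys(method, images):
    mapping = STANDARD_IMAGE_NAMES.get(method, {})
    return {mapping.get(key, key): image for key, image in images.items()}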

Let me know if that all makes sense and if you'd like to delegate anything to us.

CodyCBakerPhD commented 1 year ago

With respect to (a) and (b): are we confident that the example data in the GIN repo is representative of the outputs from current releases of each segmentation method?

They are absolutely not recent: these are just collections of examples we've found over the years. Sometimes the experimenter used a years-old version of the software, and that was already years ago, so I'm sure the differences add up.

If you could help round up some more up-to-date examples (as well as help identify ways to distinguish past and present versions, usually via a metadata field within the source files that records the version used), that would be fantastic.

The caiman .h5 dataset in the gin repo is missing the 'Cn' field. I think I have seen FOV image fields in a caiman dataset I've worked with before. It may be worth looking into running your data through the caiman pipeline again.

To clarify, these aren't our data, and I don't think we were ever the ones who ran them through a given pipeline 😅

If you have a more recent example, especially one that has that optional field present, it would be great to add it to the testing suite.

The process for adding data to GIN and getting it into the testing suites is somewhat documented, but it might be more convenient to just share a Google Drive, and we'd be happy to handle it.

I think a general solution, while imperfect, is acceptable here. Then, there could be two ways of looking up images: 1) via the key in the image dictionary, which is named according to the dataset's naming scheme and 2) via a more standardized name like 'intensity' and 'correlation'.

Yes, ideally the summary images would have been pre-calculated and saved/cached along with the segmented data - but in principle the information should be recoverable from the full images (I'm unsure how intensive of an operation this is?)

So if your group knows how to implement the computation of summary images of each type, that could be a great contribution! Probably somewhere under 'extraction_tools.py' for now (we can discuss a better organizational structure in the future).
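A minimal sketch of what such a helper might look like for a mean image, assuming the imaging object exposes get_num_frames() and get_video(start_frame, end_frame) (as the roiextractors ImagingExtractor interface appears to); computing in chunks keeps memory use modest, so the cost is a single pass over the movie:

import numpy as np

def compute_mean_image(imaging, chunk_size=1000):
    # Mean image computed one chunk of frames at a time, so the full movie
    # never has to be loaded into memory.
    n_frames = imaging.get_num_frames()
    running_sum = None
    for start in range(0, n_frames, chunk_size):
        end = min(start + chunk_size, n_frames)
        chunk = np.asarray(imaging.get_video(start_frame=start, end_frame=end), dtype=np.float64)
        chunk_sum = chunk.sum(axis=0)
        running_sum = chunk_sum if running_sum is None else running_sum + chunk_sum
    return running_sum / n_frames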

RichieHakim commented 1 year ago

I do not have example data from the various pipelines, nor am I in a position to round it up easily. If we were given appropriate files, though, I think we would be able to implement this and open a PR quickly.

Yes, ideally the summary images would have been pre-calculated and saved/cached along with the segmented data - but in principle the information should be recoverable from the full images (I'm unsure how intensive of an operation this is?)

I don't know if it is necessary to actually perform any processing on the data. Simply storing and cataloging the images (as is already done) should be sufficient. I think the main issue is simply that not all classes are importing everything. Specifically: CaImAn is not importing anything, and CNMFE is only importing a correlation image. I know this is incomplete for caiman, and I'd guess that CNMFE probably has a mean or other simple summary image in addition to the correlation image. If these holes were filled, I think it would be sufficient to begin using roiextractors as an ingestion method.

RichieHakim commented 1 year ago

As a band-aid, I added retrieval of an image from the caiman dataset that is similar to a mean image: data['estimates']['b']. This change is in the most recent PR: https://github.com/catalystneuro/roiextractors/pull/227

estimates.b is not exactly a mean image, but rather two background spatial components. Summing them together gives something very similar to an activity-subtracted mean image and is useful for many purposes.
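For anyone reproducing this locally, the idea is roughly the following sketch (the file path is a placeholder; the dataset locations, the 'dims' field, and the Fortran-order reshape are assumptions about a typical CaImAn HDF5 output):

import h5py
import numpy as np

with h5py.File('caiman_analysis.hdf5', 'r') as f:  # placeholder path
    b = np.array(f['estimates']['b'])  # (n_pixels, n_background_components)
    dims = tuple(np.array(f['dims']))  # FOV shape, e.g. (rows, cols); location assumed

# Sum the background spatial components and fold the result back into the FOV
# (CaImAn typically stores spatial arrays flattened in Fortran order).
background_image = b.sum(axis=1).reshape(dims, order='F')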