AllenInstitute / ophys_etl_pipelines

Pipelines and modules for processing optical physiology data

automatically generate full field images in mesoscope file splitting queue #535

Closed danielsf closed 1 year ago

danielsf commented 2 years ago

Once #522 is closed and we have the functionality to generate the full field images with superimposed ROIs (see discussion and examples in that ticket), we will need to update the mesoscope file splitting queue to automatically generate these images when all of the required data is present.

Based on a little archaeology, here is some relevant information.

Tasks

nataliaorlova commented 2 years ago

Thanks, Scott.

A few comments:

Indeed, for now we do not make the fullfield_2p_image data obligatory at the acquisition step. We should discuss whether this should remain the case.

I actually did ask Wayne some time ago to create a well known file type for the stitched full field image; the name was supposed to also be fullfield_2p_image, but I think we decided to hold off until stitching is implemented more robustly than in a notebook on the rig. Can we please start this discussion again, and maybe add it to the tasks list?

danielsf commented 2 years ago

Scanning the name column of the well_known_file_types table, I do not see anything that corresponds to fullfield_2p_image, so we will have to create this file type from whole cloth.

For the record: I believe that is a task that will fall to the platform team, since Pika doesn't actually have admin privileges on the LIMS database.

danielsf commented 1 year ago

Natalia and I discussed this issue on November 17, 2022. It was decided that the images produced by this function are already good enough to just incorporate into the production pipeline. Naively, that is going to require altering the ruby strategy to pass the path to the full_field stitched image into the mesoscope_tiff_splitter python module, then adding the necessary logic to the python module itself to process the full field TIFF.

However:

Looking at the mesoscope file decomposition ruby strategy, it looks like all of the parameters that are ultimately passed to the python module are copied directly from the platform.json file (see these lines) into the input.json file. Since the platform.json file is itself a well known file, it seems like we can just make path_to_platform_json an input parameter of the mesoscope splitter, and then have the mesoscope file splitter extract the path to the full field TIFF (if present) from platform.json, rather than having to implement the complicated logic to add another field to the mesoscope_tiff_splitter input.json.
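For illustration, here is a minimal sketch of that extraction. Note that the platform.json key name used here (fullfield_2p_image) is an assumption that would need to be checked against real platform.json files:

import json
import pathlib
from typing import Optional


def get_full_field_tiff_path(
        platform_json_path: pathlib.Path) -> Optional[pathlib.Path]:
    """Return the path to the full field TIFF, or None if absent."""
    with open(platform_json_path, 'r') as in_file:
        platform_data = json.load(in_file)
    # assumed key; confirm against actual platform.json contents
    if 'fullfield_2p_image' not in platform_data:
        return None
    return pathlib.Path(platform_data['fullfield_2p_image'])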

Longer term, I think it would make sense to remove all of this parameter duplication and only pass the path to platform.json (alongside any necessary output parameters) into the mesoscope_tiff_splitter, and just have the python module read whatever it needs from platform.json*.

*unless there is significant risk of the platform.json schema changing; though, even if that happened, the current approach would still require a change to the ruby strategy. Better to just have the python module process platform.json itself, which would leave all of the relevant code in Pika's control.

For now, however, we can just focus on getting the full field stitched TIFF implemented. This involves the following steps:

1) Add path_to_platform_json to the mesoscope_tiff_splitter schema as an optional field so that the python module won't crash when that parameter shows up in the input.json (a minimal schema sketch follows this list).

2) Once (1) is merged, modify the ruby strategy so that the path to platform.json gets included in the mesoscope_tiff_splitter input.json.

3) Once (2) has been deployed, add logic to the mesoscope_tiff_splitter python code to check for the existence of a full field stitched TIFF file and, if it is present, generate the desired figures automatically.
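As a rough sketch of step (1), assuming the module's input schema is built on argschema the way other ophys_etl modules' schemas are (the class name here is illustrative):

import argschema


class MesoscopeTiffSplitterSchema(argschema.ArgSchema):
    # optional, so existing input.json files without this field
    # continue to validate
    path_to_platform_json = argschema.fields.InputFile(
        required=False,
        allow_none=True,
        default=None,
        description=("path to this session's platform.json, from "
                     "which the full field TIFF path can be read"))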

danielsf commented 1 year ago

@nataliaorlova @cbrrrry

I've put an example of the full field stitching file that the pipeline will generate here:

/allen/aibs/informatics/danielsf/full_field_stitching/1212880506_stitched_full_field_img.h5

As Natalia and I discussed, this is a bare-bones output. It has two main datasets: stitched_full_field, which is just the full field image stitched together, and stitched_full_field_with_rois, which has the ROIs from the average surface image TIFF superimposed over it. You can also access JSONized versions of the ScanImage metadata from the two TIFF files as full_field_metadata and surface_roi_metadata.
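For anyone who wants to poke at the file, a minimal sketch of reading those datasets with h5py (dataset names as described above):

import json
import h5py

path = '1212880506_stitched_full_field_img.h5'
with h5py.File(path, 'r') as in_file:
    # the stitched image, with and without ROIs superimposed
    stitched = in_file['stitched_full_field'][()]
    stitched_with_rois = in_file['stitched_full_field_with_rois'][()]
    # JSONized ScanImage metadata from the two source TIFFs
    full_field_metadata = json.loads(
        in_file['full_field_metadata'][()].decode('utf-8'))
    surface_roi_metadata = json.loads(
        in_file['surface_roi_metadata'][()].decode('utf-8'))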

Assuming that this file is sufficient, the next steps are:

1) Get Pika to review the code that generated this file and merge it into the production pipeline.

2) Modify the ruby code (the code that actually runs the python modules) to pass in the parameters needed to tell the python to generate this file.

Only after both of these have been completed will these files start being produced in production. Note: the full field stitched image will not be listed as a well known file. You will just have to find it in the storage directory associated with an ophys session based on your knowledge of what the file should be called (`{ophys_session_id}_stitched_full_field_img.h5`).
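A sketch of what that lookup amounts to (the storage directory is whatever LIMS associates with the session; the function name is illustrative):

import pathlib


def stitched_full_field_path(storage_directory: pathlib.Path,
                             ophys_session_id: int) -> pathlib.Path:
    """Construct the expected path from the naming convention above."""
    return storage_directory / (
        f"{ophys_session_id}_stitched_full_field_img.h5")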

Let me know if you have any questions or requests. Sorry this is taking so long.

danielsf commented 1 year ago

For record keeping, this file was generated from this branch:

https://github.com/AllenInstitute/ophys_etl_pipelines/tree/danielsf/add/platform/json/path

cbrrrry commented 1 year ago

@danielsf Thanks for generating this. Before I sign off on the accuracy of this fix, can you change the file permissions of /allen/aibs/informatics/danielsf/full_field_stitching/1212880506_stitched_full_field_img.h5 to allow reading?

danielsf commented 1 year ago

@cbrrrry try now

cbrrrry commented 1 year ago

2/4 ROIs in this example set are very well located (IOU > 0.9) on the fullfield images, and I would classify the other 2/4 as being in the right area, with less perfect placement (0.7 < IOU < 0.8).

It would be great to figure out where the discrepancy between areas is coming from, but I think that this implementation is fine for moving forward right now.
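For concreteness, IOU here means the pixelwise intersection-over-union of the ROI masks; a minimal sketch of that computation with numpy:

import numpy as np


def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Pixelwise intersection-over-union of two boolean masks."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union)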

I can access the metadata included in this HDF5, and am wondering if you have a suggested class for handling it? Should I use the ScanImageMetadata class for this?

danielsf commented 1 year ago

If you wanted a pythonic way to access the metadata, you would have to implement a sub-class of ScanImageMetadata that replaces this line

https://github.com/AllenInstitute/ophys_etl_pipelines/blob/e3fe95404f59f04833fc89d3da95245c5edd1454/src/ophys_etl/modules/mesoscope_splitting/tiff_metadata.py#L34

which calls tifffile's method for reading the metadata from a ScanImage file, with code that just deserializes the metadata from the JSON, i.e. something like:

import json
import pathlib

import h5py

from ophys_etl.modules.mesoscope_splitting.tiff_metadata import (
    ScanImageMetadata)


class ScanImageMetadataFromH5(ScanImageMetadata):
    """
    A class to handle reading and parsing ScanImage metadata
    that has been serialized to JSON inside a stitched full
    field HDF5 file

    Parameters
    ----------
    h5_path: pathlib.Path
        Path to the HDF5 file whose metadata we are parsing
    """

    def __init__(self, h5_path: pathlib.Path):
        self._file_path = h5_path
        if not h5_path.is_file():
            raise ValueError(f"{h5_path.resolve().absolute()} "
                             "is not a file")
        with h5py.File(h5_path, 'r') as in_file:
            # json.loads (not json.load), since the dataset holds a string
            self._metadata = json.loads(
                in_file['full_field_metadata'][()].decode('utf-8'))

It should work, and is obviously not that hard to implement. We just don't have it yet.
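Hypothetical usage, reusing the example file shared above:

metadata = ScanImageMetadataFromH5(
    pathlib.Path('1212880506_stitched_full_field_img.h5'))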