AllenInstitute / ophys_etl_pipelines

Pipelines and modules for processing optical physiology data

automatically generate full field images in mesoscope file splitting queue #535

Closed danielsf closed 1 year ago

danielsf commented 2 years ago

Once #522 is closed and we have the functionality to generate the full field images with superimposed ROIs (see discussion and examples in that ticket), we will need to update the mesoscope file splitting queue to automatically generate these images when all of the required data is present.

Based on a little archaeology, here is some relevant information.

Tasks

nataliaorlova commented 2 years ago

Thanks, Scott.

A few comments:

Indeed, for now we do not make the fullfield_2p_image data obligatory at the acquisition step. We should discuss whether this should remain the case.

I actually did ask Wayne some time ago to create a well known file type for the stitched full field image; the name was supposed to also be fullfield_2p_image, but I think we decided to hold off until stitching is implemented more robustly than in a notebook on the rig. Can we please start this discussion again, and maybe add it to the tasks list?

danielsf commented 2 years ago

Scanning the name column of the well_known_file_types table, I do not see anything that corresponds to fullfield_2p_image, so we will have to create this file type from whole cloth.

For the record: I believe that is a task that will fall to the platform team, since Pika doesn't actually have admin privileges on the LIMS database.

danielsf commented 1 year ago

Natalia and I discussed this issue on November 17, 2022. It was decided that the images produced by this function are already good enough to just incorporate into the production pipeline. Naively, that is going to require altering the ruby strategy to pass the path to the full_field stitched image into the mesoscope_tiff_splitter python module, then adding the necessary logic to the python module itself to process the full field TIFF.

However:

Looking at the mesoscope file decomposition ruby strategy, it looks like all of the parameters that are ultimately passed to the python module are copied directly from the platform.json file (see these lines) into the input.json file. Since the platform.json file is itself a well known file, it seems like we can just make path_to_platform_json an input parameter of the mesoscope splitter, and then have the mesoscope file splitter extract the path to the full field TIFF (if present) from platform.json, rather than having to implement the complicated logic to add another field to the mesoscope_tiff_splitter input.json.
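For illustration, here is a minimal sketch of that extraction. Note that the platform.json key name used here (fullfield_2p_image) is an assumption that would need to be checked against real platform.json files:

import json
import pathlib
from typing import Optional


def get_full_field_tiff_path(
        platform_json_path: pathlib.Path) -> Optional[pathlib.Path]:
    """Return the path to the full field TIFF, or None if absent."""
    with open(platform_json_path, 'r') as in_file:
        platform_data = json.load(in_file)
    # assumed key; confirm against actual platform.json contents
    if 'fullfield_2p_image' not in platform_data:
        return None
    return pathlib.Path(platform_data['fullfield_2p_image'])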

Longer term, I think it would make sense to remove all of this parameter duplication and only pass the path to platform.json (alongside any necessary output parameters) into the mesoscope_tiff_splitter, and just have the python module read whatever it needs from platform.json*.

*unless there is significant risk of the platform.json schema changing; though, even if that happened, the current approach would still require a change to the ruby strategy. Better to just have the python module process platform.json itself, which would leave all of the relevant code in Pika's control.

For now, however, we can just focus on getting the full field stitched TIFF implemented. This involves the following steps:

1) Add path_to_platform_json to the mesoscope_tiff_splitter schema as an optional field so that the python module won't crash when that parameter shows up in the input.json (a minimal schema sketch follows this list).

2) Once (1) is merged, modify the ruby strategy so that the path to platform.json gets included in the mesoscope_tiff_splitter input.json.

3) Once (2) has been deployed, add logic to the mesoscope_tiff_splitter python code to check for the existence of a full field stitched TIFF file and, if it is present, generate the desired figures automatically.
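As a rough sketch of step (1), assuming the module's input schema is built on argschema the way other ophys_etl modules' schemas are (the class name here is illustrative):

import argschema


class MesoscopeTiffSplitterSchema(argschema.ArgSchema):
    # optional, so existing input.json files without this field
    # continue to validate
    path_to_platform_json = argschema.fields.InputFile(
        required=False,
        allow_none=True,
        default=None,
        description=("path to this session's platform.json, from "
                     "which the full field TIFF path can be read"))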

danielsf commented 1 year ago

@nataliaorlova @cbrrrry

I've put an example of the full field stitching file that the pipeline will generate here:

/allen/aibs/informatics/danielsf/full_field_stitching/1212880506_stitched_full_field_img.h5

As Natalia and I discussed, this is a bare-bones output. It has two main datasets: stitched_full_field, which is just the full field image stitched together, and stitched_full_field_with_rois, which has the ROIs from the average surface image TIFF superimposed over it. You can also access JSONized versions of the ScanImage metadata from the two TIFF files as full_field_metadata and surface_roi_metadata.
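For anyone who wants to poke at the file, a minimal sketch of reading those datasets with h5py (dataset names as described above):

import json
import h5py

path = '1212880506_stitched_full_field_img.h5'
with h5py.File(path, 'r') as in_file:
    # the stitched image, with and without ROIs superimposed
    stitched = in_file['stitched_full_field'][()]
    stitched_with_rois = in_file['stitched_full_field_with_rois'][()]
    # JSONized ScanImage metadata from the two source TIFFs
    full_field_metadata = json.loads(
        in_file['full_field_metadata'][()].decode('utf-8'))
    surface_roi_metadata = json.loads(
        in_file['surface_roi_metadata'][()].decode('utf-8'))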

Assuming that this file is sufficient, the next steps are:

1) Get Pika to review the code that generated this file and merge it into the production pipeline.

2) Modify the ruby code (the code that actually runs the python modules) to pass in the parameters needed to tell the python to generate this file.

Only after both of these have been completed will these files start being produced in production. Note: the full field stitched image will not be listed as a well known file. You will just have to find it in the storage directory associated with an ophys session based on your knowledge of what the file should be called (`{ophys_session_id}_stitched_full_field_img.h5`).
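A sketch of what that lookup amounts to (the storage directory is whatever LIMS associates with the session; the function name is illustrative):

import pathlib


def stitched_full_field_path(storage_directory: pathlib.Path,
                             ophys_session_id: int) -> pathlib.Path:
    """Construct the expected path from the naming convention above."""
    return storage_directory / (
        f"{ophys_session_id}_stitched_full_field_img.h5")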

Let me know if you have any questions or requests. Sorry this is taking so long.

danielsf commented 1 year ago

For record keeping, this file was generated from this branch:

https://github.com/AllenInstitute/ophys_etl_pipelines/tree/danielsf/add/platform/json/path

cbrrrry commented 1 year ago

@danielsf Thanks for generating this. Before I sign off on the accuracy of this fix, can you change the file permissions of /allen/aibs/informatics/danielsf/full_field_stitching/1212880506_stitched_full_field_img.h5 to allow reading?

danielsf commented 1 year ago

@cbrrrry try now

cbrrrry commented 1 year ago

2/4 ROIs in this example set are very well located (IOU > 0.9) on the fullfield images, and I would classify the other 2/4 as being in the right area, with less perfect placement (0.7 < IOU < 0.8).

It would be great to figure out where the discrepancy between areas is coming from, but I think that this implementation is fine for moving forward right now.
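For concreteness, IOU here means the pixelwise intersection-over-union of the ROI masks; a minimal sketch of that computation with numpy:

import numpy as np


def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Pixelwise intersection-over-union of two boolean masks."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union)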

I can access the metadata included in this HDF5, and am wondering if you have a suggested class for handling it? Should I use the ScanImageMetadata class for this?

danielsf commented 1 year ago

If you wanted a pythonic way to access the metadata, you would have to implement a sub-class of ScanImageMetadata that replaces this line

https://github.com/AllenInstitute/ophys_etl_pipelines/blob/e3fe95404f59f04833fc89d3da95245c5edd1454/src/ophys_etl/modules/mesoscope_splitting/tiff_metadata.py#L34

which calls tifffile's method for reading the metadata from a ScanImage file, with code that just deserializes the metadata from the JSON, i.e. something like:

import json
import pathlib

import h5py

from ophys_etl.modules.mesoscope_splitting.tiff_metadata import (
    ScanImageMetadata)


class ScanImageMetadataFromH5(ScanImageMetadata):
    """
    A class to handle reading and parsing ScanImage metadata
    that has been serialized to JSON inside a stitched full
    field HDF5 file

    Parameters
    ----------
    h5_path: pathlib.Path
        Path to the HDF5 file whose metadata we are parsing
    """

    def __init__(self, h5_path: pathlib.Path):
        self._file_path = h5_path
        if not h5_path.is_file():
            raise ValueError(f"{h5_path.resolve().absolute()} "
                             "is not a file")
        with h5py.File(h5_path, 'r') as in_file:
            # json.loads (not json.load), since the dataset holds a string
            self._metadata = json.loads(
                in_file['full_field_metadata'][()].decode('utf-8'))

It should work, and is obviously not that hard to implement. We just don't have it yet.
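Hypothetical usage, reusing the example file shared above:

metadata = ScanImageMetadataFromH5(
    pathlib.Path('1212880506_stitched_full_field_img.h5'))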