Thanks, Scott.
A few comments:
Indeed, for now we do not make the `fullfield_2p_image` data "obligatory" at the acquisition step. We should discuss whether this should remain the case.

I actually did ask Wayne some time ago to create a well known file for the stitched full field image, and the name was supposed to also be `fullfield_2p_image`, but I think we decided to hold off until stitching is implemented more robustly than in a notebook on the rig. Can we please start this discussion again and maybe add it to the task list?
Scanning the `name` column of the `well_known_file_types` table, I do not see anything that corresponds to `fullfield_2p_image`, so we will have to create this file type from whole cloth.
For the record: I believe that is a task that will fall to the platform team, since Pika doesn't actually have admin privileges on the LIMS database.
Natalia and I discussed this issue on November 17, 2022. It was decided that the images produced by this function are already good enough to just incorporate into the production pipeline. Naively, that is going to require altering the ruby strategy to pass the path to the full_field stitched image into the mesoscope_tiff_splitter python module, then adding the necessary logic to the python module itself to process the full field TIFF.
However:
Looking at the mesoscope file decomposition ruby strategy, it looks like all of the parameters that are ultimately passed to the python module are copied directly from the platform.json file (see these lines) into the input.json file. Since the platform.json file is itself a well known file, it seems like we can just make `path_to_platform_json` an input parameter of the mesoscope splitter, and then have the mesoscope file splitter extract the path to the full field TIFF (if present) from platform.json, rather than having to implement the complicated logic to add another field to the `mesoscope_tiff_splitter` input.json.
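To make the idea concrete, here is a minimal sketch of what that extraction could look like inside the splitter. The function name is hypothetical; the `fullfield_2p_image` key is the one quoted from platform.json later in this thread, though whether its value is a bare file name or a full path is an assumption.

```python
import json
import pathlib
from typing import Optional


def get_full_field_tiff_path(
        platform_json_path: pathlib.Path) -> Optional[pathlib.Path]:
    """Return the path to the full field TIFF recorded in
    platform.json, or None if no such file was acquired."""
    with open(platform_json_path, 'r') as in_file:
        platform_data = json.load(in_file)
    # 'fullfield_2p_image' may legitimately be absent, since the
    # file is not obligatory at the acquisition step
    file_name = platform_data.get('fullfield_2p_image')
    if file_name is None:
        return None
    # assume the value is relative to the platform.json directory
    return platform_json_path.parent / file_name
```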
Longer term, I think it would make sense to remove all of this parameter duplication and only pass the path to platform.json (alongside any necessary output parameters) into `mesoscope_tiff_splitter`, letting the python module read whatever it needs from platform.json*.
*unless there is significant risk of the platform.json schema changing; though, if that happened, the current scheme would still require a change to the ruby strategy. Better to just have the python module process platform.json itself, which would leave all of the relevant code in Pika's control.
For now, however, we can just focus on getting the full field stitched TIFF implemented. This involves the following steps (a sketch of the schema change in step 1 follows the list):

1) add `path_to_platform_json` to the `mesoscope_tiff_splitter` schema as an optional field so that the python module won't crash when that parameter shows up in the input.json
2) once (1) is merged, modify the ruby strategy so that the path to platform.json gets included in the `mesoscope_tiff_splitter` input.json
3) once (2) has been deployed, add logic to the `mesoscope_tiff_splitter` python code to check for the existence of a full field stitched TIFF file and, if it is present, generate the desired figures automatically.
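For step (1), a minimal sketch of the schema change, assuming the splitter uses argschema (the schema class name here is hypothetical):

```python
from argschema import ArgSchema
from argschema.fields import InputFile


class MesoscopeSplitterSchema(ArgSchema):
    # optional so that input.json files produced before the
    # ruby change continue to validate
    path_to_platform_json = InputFile(
        required=False,
        default=None,
        allow_none=True,
        description="Path to this session's platform.json file")
```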
@nataliaorlova @cbrrrry
I've put an example of the full field stitching file that the pipeline will generate here:
`/allen/aibs/informatics/danielsf/full_field_stitching/1212880506_stitched_full_field_img.h5`
As Natalia and I discussed, this is a bare-bones output. It has two main datasets: `stitched_full_field`, which is just the full field image stitched together, and `stitched_full_field_with_rois`, which has the ROIs from the average surface image TIFF superimposed over it. You can also access JSONized versions of the ScanImage metadata from the two TIFF files as `full_field_metadata` and `surface_roi_metadata`.
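For anyone who wants to poke at the file, a minimal sketch of reading those datasets with h5py (dataset names as listed above):

```python
import json

import h5py

file_path = ('/allen/aibs/informatics/danielsf/full_field_stitching/'
             '1212880506_stitched_full_field_img.h5')
with h5py.File(file_path, 'r') as in_file:
    stitched = in_file['stitched_full_field'][()]
    stitched_with_rois = in_file['stitched_full_field_with_rois'][()]
    # the metadata datasets are JSONized strings
    full_field_metadata = json.loads(
        in_file['full_field_metadata'][()].decode('utf-8'))
    surface_roi_metadata = json.loads(
        in_file['surface_roi_metadata'][()].decode('utf-8'))
```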
Assuming that this file is sufficient, the next steps are:

1) Get Pika to review the code that generated this file and merge it into the production pipeline.
2) Modify the ruby code (the code that actually runs the python modules) to pass in the parameters needed to tell the python to generate this file.
Only after both of these have been completed will these files start being produced in production. Note: the full field stitched image will not be listed as a well known file. You will just have to find it in the storage directory associated with an ophys session based on your knowledge of what the file should be called (`{ophys_session_id}_stitched_full_field_img.h5`), as sketched below.
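A sketch of what that lookup might look like (the storage directory path here is purely illustrative; in practice it comes from the ophys session record in LIMS):

```python
import pathlib

ophys_session_id = 1212880506
# illustrative only; get the real storage directory from LIMS
storage_dir = pathlib.Path('/path/to/ophys_session_storage_directory')
full_field_path = (
    storage_dir / f'{ophys_session_id}_stitched_full_field_img.h5')
if not full_field_path.is_file():
    print('no full field stitched image for this session')
```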
Let me know if you have any questions or requests. Sorry this is taking so long.
For record keeping, this file was generated from this branch
https://github.com/AllenInstitute/ophys_etl_pipelines/tree/danielsf/add/platform/json/path
@danielsf Thanks for generating this. Before I sign off on the accuracy of this fix, can you change the file permissions of `/allen/aibs/informatics/danielsf/full_field_stitching/1212880506_stitched_full_field_img.h5` to allow reading?
@cbrrrry try now
2/4 ROIs in this example set are very well located (IOU > 0.9) on the full field images, and I would classify the other 2/4 as being in the right area with less perfect placement (0.7 < IOU < 0.8).
It would be great to figure out where the discrepancy between areas is coming from, but I think that this implementation is fine for moving forward right now.
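For reference, the IOU values quoted above are intersection-over-union scores; a minimal sketch of that computation for a pair of boolean ROI masks (the mask inputs are an assumption about how the comparison was done):

```python
import numpy as np


def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean ROI masks
    drawn on the same image grid."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection) / float(union)
```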
I can access the metadata included in this HDF5, and am wondering if you have a suggested class for handling it. Should I use the `ScanImageMetadata` class for this?
If you wanted a pythonic way to access the metadata, you would have to implement a sub-class of `ScanImageMetadata` that replaces this line, which calls `tifffile`'s method for reading the metadata from a ScanImage file, with code that just deserializes the metadata from the JSON, i.e. something like
```python
import json
import pathlib

import h5py


class ScanImageMetadataFromH5(ScanImageMetadata):
    """
    A class to handle reading and parsing ScanImage metadata
    that has been JSONized and stored in an HDF5 file

    Parameters
    ----------
    h5_path: pathlib.Path
        Path to the HDF5 file whose metadata we are parsing
    """
    def __init__(self, h5_path: pathlib.Path):
        self._file_path = h5_path
        if not h5_path.is_file():
            raise ValueError(f"{h5_path.resolve().absolute()} "
                             "is not a file")
        with h5py.File(h5_path, 'r') as in_file:
            # json.loads, since the dataset stores a JSONized string
            self._metadata = json.loads(
                in_file['full_field_metadata'][()].decode('utf-8'))
```
It should work, and is obviously not that hard to implement. We just don't have it yet.
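Usage would then look something like this (using the example file above; assuming the parent class's accessor methods only rely on `self._metadata`, they should work unchanged):

```python
import pathlib

h5_path = pathlib.Path(
    '/allen/aibs/informatics/danielsf/full_field_stitching/'
    '1212880506_stitched_full_field_img.h5')
metadata = ScanImageMetadataFromH5(h5_path)
```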
Once #522 is closed and we have the functionality to generate the full field images with superimposed ROIs (see discussion and examples in that ticket), we will need to update the mesoscope file splitting queue to automatically generate these images when all of the required data is present.
Based on a little archaeology, here is some relevant information.
- The ruby strategy that invokes the splitting module is mesoscope_file_decomposition_strategy.rb.
- The parameters passed to the python module come from the `{ophys_session_id}_platform.json` file written by the rig (?) to the data landing directory. An example platform.json file is available for reference.
- The path to the full field image is recorded in the `"fullfield_2p_image":` field in that JSON file.
- The ruby strategy will need to pass the path to the `fullfield_2p_image` file into the input JSON for the ophys_etl `mesoscope_splitting` module.
- As for the `well_known_files` table: I do not think the mesoscope_splitting module creates any well known files, so I don't think it will be necessary to modify the ruby strategy to record the new image being created in the well known files table, but we should verify that.

Tasks

- [ ] Add the path to `fullfield_2p_image` to the input json for `mesoscope_splitting`