AllenCell / EMT_data_analysis

This repo contains code for EMT deliverable data analysis and depends on output from the EMT_image_analysis repo.

Incorrect manifest for Nuclei_localization.py #7

Open pgarrison opened 1 month ago

pgarrison commented 1 month ago

When I run Nuclei_localization.py I get the following error.

```
bioio_base.exceptions.UnsupportedFileFormatError: BioImage does not support the image: 'https://allencell.s3.amazonaws.com/aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr'. You may need to install an extra format dependency. See our list of known plugins in the bioio README here: https://github.com/bioio-devs/bioio for a list of known plugins. You can also call the 'bioio.plugins.dump_plugins()' method to report information about currently installed plugins or the 'bioio.plugin_feasibility_report(image)' method to check if a specific image can be handled by the available plugins.
```
pgarrison commented 1 month ago

Strange, we already have bioio-ome-zarr as a dependency. This may be user error.

pgarrison commented 1 month ago

Related https://github.com/bioio-devs/bioio-ome-zarr/issues/28

pgarrison commented 1 month ago

Okay, I went down the wrong path: it's not a dependency issue, it's that there is no image at s3://allencell/aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr

Investigation questions

Resolution tasks

Post-mortem questions

vianamp commented 1 month ago

@smishra3 we need your eye on this.

smishra3 commented 1 month ago

I double checked, and there is no basement membrane segmentation for this ID, 3500005827_9.

3500005827_9 has FMS ID 08d16b7278e24a5c8cdf8c3f723f4859. Goutham has never generated segmentations for this FMS ID. I also checked the parent directory (\\allen\aics\assay-dev\computational\data\EMT_deliverable_processing\Collagen_segmentation_segmentations) and it's not there either.

I saw it's in the manifest. Checking now what the issue is.

vianamp commented 1 month ago

@niveditasa do you know if this movie was ever analyzed?

smishra3 commented 1 month ago

I found the bug. 3500005827_9 does not exist, but 3500005827_20 exists. The current manifest lists the collagenIV segmentation path s3://allencell/aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr, but the data it refers to is actually s3://allencell/aics/emt_timelapse_dataset/data/3500005827_20_collagenIV_segmentation_probability.ome.zarr.

At present, s3://allencell/aics/emt_timelapse_dataset/data/3500005827_20_collagenIV_segmentation_probability.ome.zarr exists but is not included in the manifest, while s3://allencell/aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr does not exist and is included in the manifest. I suspect this was introduced during the final renaming to the barcode_scene format.
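A manifest-existence check would have surfaced this directly. Below is a minimal, hypothetical sketch (not from the repo): the existence checker is injected, so in practice it could be backed by boto3, an HTTP request against S3, or a local listing; here a plain set stands in for the uploaded objects.

```python
from typing import Callable, Iterable

def find_missing_paths(paths: Iterable[str],
                       exists: Callable[[str], bool]) -> list[str]:
    """Return the manifest paths for which the existence check fails."""
    return [p for p in paths if not exists(p)]

# Stand-in for the set of objects actually uploaded to S3: only the
# _20 segmentation exists, mirroring the bug described above.
uploaded = {
    "s3://allencell/aics/emt_timelapse_dataset/data/"
    "3500005827_20_collagenIV_segmentation_probability.ome.zarr",
}
# The manifest instead lists the nonexistent _9 path.
manifest_paths = [
    "s3://allencell/aics/emt_timelapse_dataset/data/"
    "3500005827_9_collagenIV_segmentation_probability.ome.zarr",
]
missing = find_missing_paths(manifest_paths, uploaded.__contains__)
# `missing` now holds the dead _9 entry, flagging the manifest bug.
```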

smishra3 commented 1 month ago

@pgarrison can you rerun your test on this updated manifest? imaging_and_segmentation_data_v1_09102024.csv

Only the rows containing 3500005827_9 and 3500005827_20 have been changed to fix the bug.
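One quick way to confirm that only those rows changed is to diff the two manifest versions row by row, keyed by movie ID. This is a hypothetical sketch (the column names are invented, not the manifest's real schema):

```python
import csv
import io

def changed_rows(old_csv: str, new_csv: str, key: str) -> list[str]:
    """Return the keys of rows that differ between two manifest versions."""
    old = {r[key]: r for r in csv.DictReader(io.StringIO(old_csv))}
    new = {r[key]: r for r in csv.DictReader(io.StringIO(new_csv))}
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))

# Toy manifests mirroring the fix: the seg_path moves from _9 to _20.
old = "movie_id,seg_path\n3500005827_9,..._9.ome.zarr\n3500005827_20,\n"
new = "movie_id,seg_path\n3500005827_9,\n3500005827_20,..._20.ome.zarr\n"
print(changed_rows(old, new, "movie_id"))
# → ['3500005827_20', '3500005827_9']
```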

pgarrison commented 1 month ago

@smishra3 Are you saying that the segmentations uploaded are correct and it's just the manifest that is off? So we are not expected to have a basement membrane segmentation for 3500005827_9, and we are supposed to have one for 3500005827_20?

smishra3 commented 1 month ago

@pgarrison Exactly. CollagenIV segmentation for 3500005827_9 was never done, and none is expected, because 3500005827_9 is a 2D colony. In the paper and in the current work and analysis, no collagenIV segmentation is done for 2D colonies (all membrane segmentations are for 3D colonies).

It's a naming issue, and I'm not sure how it got overlooked. When I tried the web viewer links (for the collagenIV and the combined volumes), they were also dead. I was under the impression that all links had been tested.

With the naming fixed, you can see that the web viewer links are working again.

pgarrison commented 1 month ago

The following root cause diagnosis is summarized from an in-person discussion with @smishra3.

Root cause

There are 3 tightly related errors in the manifest:

The two affected movies, 3500005827_9 and 3500005827_20, are adjacent rows in the manifest. We don't have precise information about how the manifest was edited, but their adjacency is a strong clue that this was a simple data-entry typo: values entered into the wrong row.

How our data validation steps failed to catch it

  1. Two people independently validated that the count of *_collagenIV_segmentation_probability.ome.zarr segmentations was consistent with the manifest (49). This issue left the number of segmentations unchanged because the manifest identified 3500005827_9 instead of 3500005827_20.
  2. All of the web volume viewer links in the manifest were validated by opening in the web browser. In the row for 3500005827_9, the manifest link to the web volume viewer uses the 3500005827_20 data, so the link was correct.
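A third check would have caught this class of error: verifying that the barcode_scene ID embedded in each URL column matches the row's own movie ID, since a row-swap typo leaves counts and link targets valid but breaks that invariant. A hypothetical sketch (column names and URLs invented):

```python
import re

def id_mismatches(rows: list[dict]) -> list[tuple[str, str]]:
    """Flag (movie_id, column) pairs where an ID embedded in a URL
    column differs from the row's own movie ID. Catches data that
    was entered into the wrong row."""
    bad = []
    for row in rows:
        for col, url in row.items():
            if col == "movie_id":
                continue
            m = re.search(r"(\d+_\d+)", url)
            if m and m.group(1) != row["movie_id"]:
                bad.append((row["movie_id"], col))
    return bad

# Toy rows mirroring the bug: the _9 row's viewer link embeds the
# _20 dataset, so counts and link validity checks both pass.
rows = [
    {"movie_id": "3500005827_9",
     "viewer_link": "https://viewer.example/?url=3500005827_20_collagenIV.zarr"},
    {"movie_id": "3500005827_20",
     "viewer_link": "https://viewer.example/?url=3500005827_20_collagenIV.zarr"},
]
print(id_mismatches(rows))  # → [('3500005827_9', 'viewer_link')]
```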

pgarrison commented 1 month ago

@smishra3 @mfs4rd I successfully ran Nuclei_localization.py with the updated CSV (from Suraj's comment above), and it produced 49 *_localized_nuclei.csv files, so this resolves the error from the original post. I think I'll go ahead with updating the published CSV. At the same time, is there anything else we can do to validate that it is correct?

mfs4rd commented 1 month ago

@pgarrison The localization code is identical to what was used for the published data/results, aside from the changes needed for compatibility with S3 storage (downloading the files to a local directory), and @smishra3, @antoineborensztejn, and I verified that the meshes on S3 are identical to the originals. I don't think there is anything else that needs to be validated at the moment.