Strange, we already have bioio-ome-zarr
as a dependency. This may be user error.
Okay, I went down the wrong path: it's not a dependency issue, it's that there is no image at s3://allencell/aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr
Running

```
AWS_PROFILE=open_data_bucket aws s3api list-object-versions --bucket allencell --prefix aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr
```

returns nothing for 3500005827_9; the closest object that does exist is 3500005827_20_collagenIV_segmentation_probability.ome.zarr. The data uploaded (3500005827_20) is correct and the manifest is wrong. I'll rerun Nuclei_localization.py to see if more files it needs are missing. @smishra3 we need your eye on this.
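The kind of check I have in mind is roughly the following (a sketch only, not existing tooling; the manifest filename is a placeholder, and it assumes every s3:// value in any column of the manifest should resolve to at least one object under that prefix):

```python
# Sketch: report manifest s3:// paths that don't resolve to anything on S3.
# Assumes AWS credentials are configured (e.g. AWS_PROFILE=open_data_bucket)
# and a local copy of the manifest CSV ("manifest.csv" is a placeholder name).
# The .ome.zarr paths are prefixes, so we list under the prefix rather than head.
import boto3
import pandas as pd

MANIFEST = "manifest.csv"
s3 = boto3.client("s3")

def exists(url: str) -> bool:
    bucket, key = url.removeprefix("s3://").split("/", 1)
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0

df = pd.read_csv(MANIFEST)
for col in df.columns:
    for url in df[col].dropna().astype(str):
        if url.startswith("s3://") and not exists(url):
            print(f"MISSING: {url} (column {col})")
```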
I double-checked and there is no basement membrane segmentation for this ID, 3500005827_9.
3500005827_9 has an FMS ID of 08d16b7278e24a5c8cdf8c3f723f4859. Goutham has never generated segmentations for this FMS ID. I also checked the parent directory (\allen\aics\assay-dev\computational\data\EMT_deliverable_processing\Collagen_segmentation_segmentations) and it's not there either.
I saw that it's in the manifest. Checking now what the issue is.
@niveditasa do you know if this movie was ever analyzed?
I found the bug. 3500005827_9 does not exist but 3500005827_20 exists. In the current manifest, the collagenIV segmentation path for 3500005827_9 is given as s3://allencell/aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr, but the data it actually represents is s3://allencell/aics/emt_timelapse_dataset/data/3500005827_20_collagenIV_segmentation_probability.ome.zarr.
At present s3://allencell/aics/emt_timelapse_dataset/data/3500005827_20_collagenIV_segmentation_probability.ome.zarr exists but is not included in the manifest, while s3://allencell/aics/emt_timelapse_dataset/data/3500005827_9_collagenIV_segmentation_probability.ome.zarr does not exist and is included in the manifest. I guess it was caused during the final naming change to the barcode_scene format.
@pgarrison can you rerun your test on this updated manifest? imaging_and_segmentation_data_v1_09102024.csv
Only the rows containing 3500005827_9 and 3500005827_20 have been changed to fix the bug.
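If it's useful, a quick way to confirm that only those rows changed would be something like the following sketch (the previous manifest's filename here is a placeholder, and it assumes both versions have the same row order and columns):

```python
# Sketch: confirm that only the 3500005827_9 and 3500005827_20 rows differ
# between the previous and updated manifests. The old filename is a placeholder.
import pandas as pd

old = pd.read_csv("imaging_and_segmentation_data_v1_old.csv")       # previous version (placeholder name)
new = pd.read_csv("imaging_and_segmentation_data_v1_09102024.csv")  # updated manifest

assert list(old.columns) == list(new.columns) and len(old) == len(new)

# Rows where any cell differs between the two versions
changed = (old.astype(str) != new.astype(str)).any(axis=1)
print(new.loc[changed])  # expect only the rows for 3500005827_9 and 3500005827_20
```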
@smishra3 Are you saying that the segmentations uploaded are correct and it's just the manifest that is off? So we are not expected to have a basement membrane segmentation for 3500005827_9, and we are supposed to have one for 3500005827_20?
@pgarrison Exactly. Collagen IV segmentation for 3500005827_9 was never done, and it isn't expected to be done either, since 3500005827_9 is a 2D colony. In the paper and in current work and analysis, no collagenIV segmentation is done for 2D colonies (all membrane segmentations are for 3D colonies).
It's a naming issue, and I'm not sure how it got overlooked. When I tried the web viewer links (for the collagenIV and the combined), the links were also dead. I was under the impression that all links had been tested.
With the naming change, you can see the web viewer links are working now.
The following root cause diagnosis is summarized from an in-person discussion with @smishra3.
There are 3 tightly related errors in the manifest:

- The 3500005827_9 row has values in the collagen IV segmentation columns.
- The 3500005827_20 row does not have values in the collagen IV segmentation columns.
- The 3500005827_9 row has web volume viewer links pointing to the data for 3500005827_20.

The two affected movies, 3500005827_9 and 3500005827_20, are adjacent rows in the manifest. We don't have precise information about how the manifest was edited, but this is a very strong clue that it was a simple typo: data was entered into the wrong row.
Why this wasn't caught earlier:

- The number of *_collagenIV_segmentation_probability.ome.zarr segmentations was consistent with the manifest (49). This issue left the number of segmentations unchanged because the manifest identified 3500005827_9 instead of 3500005827_20.
- For 3500005827_9, the manifest link to the web volume viewer uses the 3500005827_20 data, so the link was correct.

@smishra3 @mfs4rd I successfully ran Nuclei_localization.py with the updated CSV (from Suraj's comment above) and produced 49 *_localized_nuclei.csv files. So this resolves the error from the original post. I think I'll go ahead with updating the published CSV. At the same time, is there anything else we can do to validate it is correct?
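One extra check I can think of (a sketch only, not existing tooling; the bucket and prefix are taken from the paths in this thread): run the comparison in the other direction as well, i.e. confirm that every collagenIV segmentation actually on S3 is referenced somewhere in the manifest, since that is the half of the bug a file count wouldn't catch.

```python
# Sketch: flag collagenIV segmentation zarrs that exist on S3 but are not
# referenced anywhere in the manifest CSV (the reverse of a path-existence check).
import boto3

BUCKET = "allencell"
PREFIX = "aics/emt_timelapse_dataset/data/"
SUFFIX = "_collagenIV_segmentation_probability.ome.zarr"

with open("imaging_and_segmentation_data_v1_09102024.csv") as f:
    manifest_text = f.read()

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
# Delimiter="/" groups the .ome.zarr "directories" into CommonPrefixes
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX, Delimiter="/"):
    for cp in page.get("CommonPrefixes", []):
        name = cp["Prefix"].rstrip("/")
        if name.endswith(SUFFIX) and f"s3://{BUCKET}/{name}" not in manifest_text:
            print(f"On S3 but not in the manifest: s3://{BUCKET}/{name}")
```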
@pgarrison The localization code is identical to what was used for the published data/results, aside from the changes needed to be compatible with S3 storage (downloading the files to a local directory), and @smishra3, @antoineborensztejn, and I verified that the meshes on S3 are identical to the originals. I don't think there is anything else that needs to be validated atm.
When I run Nuclei_localization.py I get the following error.