angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License
69 stars, 25 forks

Even when setting `nuclear_counts = False`, nuclear masks are being quantified in `marker_quantification.generate_cell_table` #1097

Closed (cliu72 closed this 4 months ago)

cliu72 commented 6 months ago

Describe the bug

After setting nuclear_counts = False in marker_quantification.generate_cell_table, the nuclear masks are still quantified and added to the cell table. This causes unexpected behavior in the Pixie cell clustering notebook, because the cell table now contains two rows for every fov/label pair (one for whole cell, one for nuclear). Because two different cell sizes are associated with each cell label, this leads to errors.

(The easy option is to subset the cell table to rows where mask_type = whole_cell before running cell clustering, but it is not obvious to users that this is necessary: nuclear_counts was set to False in the segmentation notebook, so the expectation is that nuclear masks are not quantified in the cell table.)
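For anyone hitting this before a fix lands, the subsetting workaround mentioned above could look roughly like the sketch below. It assumes the cell table is loaded as a pandas DataFrame with the "fov", "label", and "mask_type" columns described in this issue; the helper name keep_whole_cell_rows, the "cell_size" column name, and the toy values are illustrative assumptions, not part of ark-analysis.

```python
import pandas as pd


def keep_whole_cell_rows(cell_table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only rows quantified from whole-cell masks."""
    whole_cell_only = cell_table[cell_table["mask_type"] == "whole_cell"]
    return whole_cell_only.reset_index(drop=True)


# Toy cell table mimicking the duplicated fov/label pairs described above.
# The "cell_size" column and all values are made up for illustration.
toy_table = pd.DataFrame(
    {
        "fov": ["fov0", "fov0", "fov0", "fov0"],
        "label": [1, 1, 2, 2],
        "mask_type": ["whole_cell", "nuclear", "whole_cell", "nuclear"],
        "cell_size": [120, 45, 200, 60],
    }
)

filtered = keep_whole_cell_rows(toy_table)

# After filtering, each fov/label pair should appear exactly once.
has_duplicates = filtered.duplicated(subset=["fov", "label"]).any()
print(len(filtered), has_duplicates)
```

On a real dataset, the same filter would be applied to the cell table produced by notebook 1 before passing it to the cell clustering notebook.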

I think the problem is here (https://github.com/angelolab/ark-analysis/blob/main/src/ark/segmentation/marker_quantification.py#L532-L585):

       mask_files = io_utils.list_files(segmentation_dir, substrs=fov_name)
       mask_types = process_lists(fov_names=fovs, mask_names=mask_files)

mask_types here is ['nuclear','whole_cell']. Therefore, both nuclear and whole cell segmentation masks are always being read (regardless of what nuclear_counts is set to).

Expected behavior

If nuclear_counts=False, generate_cell_table should not read in nuclear masks, and the final cell table should not contain rows for nuclear masks. I think the logic of generate_cell_table needs to change slightly so that nuclear masks are skipped when nuclear_counts=False.
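One possible shape for that change, as a minimal sketch rather than the actual fix: filter the detected mask types by the nuclear_counts flag before any masks are read. The helper select_mask_types below is hypothetical and is not an existing ark-analysis function; it only assumes that the detected mask types arrive as a list of strings such as ['nuclear', 'whole_cell'].

```python
from typing import List


def select_mask_types(available: List[str], nuclear_counts: bool) -> List[str]:
    """Hypothetical helper: drop the 'nuclear' mask type unless requested.

    `available` is the list of mask types detected in the segmentation
    directory; `nuclear_counts` mirrors the flag discussed in this issue.
    """
    if nuclear_counts:
        return list(available)
    return [mask_type for mask_type in available if mask_type != "nuclear"]


detected = ["nuclear", "whole_cell"]
print(select_mask_types(detected, nuclear_counts=False))  # ['whole_cell']
print(select_mask_types(detected, nuclear_counts=True))   # ['nuclear', 'whole_cell']
```

Filtering at this point would keep nuclear masks from being loaded at all, so no nuclear rows would reach the cell table when the flag is False.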

To Reproduce

As I noted in another issue (https://github.com/angelolab/ark-analysis/issues/1096), there is a problem running the segmentation notebook with the example dataset, but if you manually delete fov10 from the example dataset, you should be able to run through the entire segmentation notebook. After doing this and running notebook 1, you can open either cell table and see that the "mask_type" column contains both 'nuclear' and 'whole_cell'. You can then reproduce the error in this issue by running notebooks 2 and 3 in order.

This is the error I get in notebook 3 after doing the above (a length mismatch, because there are extra rows in the cell table): [image]