angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Cell clustering unable to parse updated cell table format from `ez_seg` update #1106

Closed alex-l-kong closed 8 months ago

alex-l-kong commented 8 months ago

Describe the bug

Since ezSegmenter support was added, the generated cell table includes both the whole_cell and nuclear mask data, plus a mask_type column specifying which is which.

This cell table format is currently incompatible with cell_cluster_utils.create_c2pc_data, which is required to create the cell-level data needed for clustering. The function requires each label to be unique within each FOV so it can index and assign correctly. That assumption no longer holds for the new cell table format, since each label now appears n times per FOV, where n is the number of mask types in the table.
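To make the broken assumption concrete, here is a minimal sketch in plain Python (no ark-analysis or pandas calls). The toy rows and the `duplicated_keys` helper are hypothetical, and the column names `fov` and `label` are assumed; only `mask_type` comes from the description above.

```python
from collections import Counter

# Toy combined cell table. The "fov" and "label" column names are assumptions;
# "mask_type" is the column described in this issue.
combined_rows = [
    {"fov": "fov0", "label": 1, "mask_type": "whole_cell"},
    {"fov": "fov0", "label": 1, "mask_type": "nuclear"},
    {"fov": "fov0", "label": 2, "mask_type": "whole_cell"},
    {"fov": "fov0", "label": 2, "mask_type": "nuclear"},
]


def duplicated_keys(rows):
    """Return the sorted (fov, label) pairs that occur more than once."""
    counts = Counter((row["fov"], row["label"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# In the combined table every (fov, label) pair appears once per mask type.
dupes = duplicated_keys(combined_rows)

# Restricting to a single mask type restores uniqueness.
whole_cell_only = [r for r in combined_rows if r["mask_type"] == "whole_cell"]
dupes_after_filter = duplicated_keys(whole_cell_only)
```

With two mask types, every (fov, label) pair is reported as duplicated in the combined table, and none is once the rows are restricted to whole_cell. On a pandas DataFrame the same check could be written with `DataFrame.duplicated(subset=["fov", "label"])`.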

Expected behavior

For Pixie/Nimbus, the user should only be using whole_cell data. To extract the cell tables on an individual mask level, @bryjcannon included a function in the ez_seg_utils submodule that splits the combined cell table by the mask_type column. This should be called in notebook 1 on both the combined size_normalized and arcsinh_transformed tables.

Users can then choose which table (size_normalized or arcsinh_transformed) to input into notebook 3 or Nimbus.
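The split described above could look roughly like the following sketch. This is not the function in ez_seg_utils: `split_by_mask_type` is a hypothetical stand-in written in plain Python, and the toy tables and the `area` and `CD45` columns are made up for illustration.

```python
from collections import defaultdict


def split_by_mask_type(rows, mask_col="mask_type"):
    """Group table rows into one list per distinct value of mask_col."""
    tables = defaultdict(list)
    for row in rows:
        tables[row[mask_col]].append(row)
    return dict(tables)


# Hypothetical stand-ins for the two combined tables produced in notebook 1.
size_normalized = [
    {"fov": "fov0", "label": 1, "mask_type": "whole_cell", "area": 120},
    {"fov": "fov0", "label": 1, "mask_type": "nuclear", "area": 45},
]
arcsinh_transformed = [
    {"fov": "fov0", "label": 1, "mask_type": "whole_cell", "CD45": 0.88},
    {"fov": "fov0", "label": 1, "mask_type": "nuclear", "CD45": 0.12},
]

# Split both tables, then keep only the whole_cell rows for Pixie/Nimbus.
split_tables = {
    "size_normalized": split_by_mask_type(size_normalized),
    "arcsinh_transformed": split_by_mask_type(arcsinh_transformed),
}
whole_cell_size_normalized = split_tables["size_normalized"]["whole_cell"]
whole_cell_arcsinh = split_tables["arcsinh_transformed"]["whole_cell"]
```

Either whole_cell table can then be passed downstream, with each label unique within its FOV again.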

To Reproduce

Run notebook 1, then Pixie/Nimbus, as is.

cliu72 commented 8 months ago

I created a very similar issue already: https://github.com/angelolab/ark-analysis/issues/1097

alex-l-kong commented 8 months ago

@cliu72 thanks for the reminder, I'll close this one.