angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Fix cell neighbors matrix/clustering dropping for cells that don't have any neighbors #906

Closed camisowers closed 1 year ago

camisowers commented 1 year ago


Describe the bug: Currently, the kmeans neighborhood analysis notebook takes a neighborhood matrix as input. Each row of this matrix is a cell, and each column is a cell phenotype. However, cells that have no neighbors within the specified radius (i.e. a row of zeros in the matrix) are dropped from the data. As a result, these cells are excluded from the kmeans clustering and are missing from the new cell table saved at the end of the notebook.
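To make the dropped rows concrete, here is a minimal library-free sketch of the situation described above. The toy matrix, the phenotype order, and the helper name `split_by_neighbors` are all hypothetical and are not taken from the ark-analysis code.

```python
# Hypothetical toy neighborhood matrix: cell id -> neighbor counts per phenotype.
# Assumed column order (made up for illustration): tumor, immune, stroma.
neighborhood_counts = {
    1: [2, 0, 1],
    2: [0, 0, 0],  # no neighbors within the radius
    3: [0, 3, 0],
    4: [0, 0, 0],  # no neighbors within the radius
}


def split_by_neighbors(counts):
    """Separate cells with at least one neighbor from cells with none."""
    with_neighbors = {cell: row for cell, row in counts.items() if any(row)}
    without_neighbors = [cell for cell, row in counts.items() if not any(row)]
    return with_neighbors, without_neighbors


kept, dropped = split_by_neighbors(neighborhood_counts)
print("cells passed to clustering:", sorted(kept))
print("cells with an all-zero row:", dropped)
```

In this toy case, only cells 1 and 3 would reach the clustering step. Cells 2 and 4 are the ones that would go missing from the saved cell table.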

Expected behavior: These cells are excluded from clustering to avoid creating an additional cluster that contains only cells with no close neighbors. There are a few options:

does not change clustering

potentially changes clustering

ngreenwald commented 1 year ago

Can you take a look at the TONIC dataset and see what fraction of cells this applies to?

camisowers commented 1 year ago

It seems to automatically drop only 1-2% of cells. We will issue a warning if more than 5% of cells are dropped, suggest increasing the pixel radius, and ensure that cells without a cluster assignment are still retained in the cell table with a NaN value.
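A rough sketch of the plan in this comment, written in plain Python so it does not depend on the notebook's actual data structures. The function name `assign_clusters`, its parameters, and the warning text are assumptions for illustration only. The 5% threshold and the NaN fill come from the comment.

```python
import math
import warnings


def assign_clusters(all_cell_ids, cluster_labels, max_missing_fraction=0.05):
    """Return a cluster label for every cell, using NaN where none was assigned.

    all_cell_ids: every cell in the cell table.
    cluster_labels: dict of cell id -> cluster, covering only clustered cells.
    """
    # Keep every cell; cells that were never clustered get NaN.
    assigned = {cell: cluster_labels.get(cell, math.nan) for cell in all_cell_ids}
    n_missing = sum(1 for cell in all_cell_ids if cell not in cluster_labels)
    missing_fraction = n_missing / len(all_cell_ids) if all_cell_ids else 0.0
    # Warn only when the unclustered fraction exceeds the threshold.
    if missing_fraction > max_missing_fraction:
        warnings.warn(
            f"{missing_fraction:.1%} of cells have no neighbors and were not "
            "clustered; consider increasing the pixel radius.",
            UserWarning,
        )
    return assigned, missing_fraction


# Hypothetical example: 10 cells, one of which (cell 4) had no neighbors.
cells = list(range(1, 11))
labels = {cell: cell % 2 for cell in cells if cell != 4}
table, fraction = assign_clusters(cells, labels)
print(table[4], fraction)
```

Here one cell in ten is unclustered, which is above the 5% threshold, so the call emits the warning while still returning an entry for all ten cells.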