angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Allow for a subset of FOVs in k-means clustering notebook #1122

Open cliu72 opened 4 months ago

cliu72 commented 4 months ago

Describe the bug
There is no option to select a subset of FOVs in the k-means neighborhood notebook. Discovered by Avery.

Currently, the notebook collects every FOV in the cell table (all_fovs = all_data[settings.FOV_ID].unique()), then uses every FOV in the segmentation directory to calculate the distance matrix (https://github.com/angelolab/ark-analysis/blob/main/src/ark/analysis/spatial_analysis_utils.py#L37). If you manually change all_fovs in the notebook to run k-means on only a subset of FOVs, it errors out.

Expected behavior
Allow users to choose a subset of FOVs to run k-means on.

To Reproduce
Change all_fovs in the k-means notebook to a subset of FOVs.

camisowers commented 4 months ago

If we do this, the neighbors matrix will be generated and saved from only the provided subset of cells, which could cause issues in other spatial scripts. It likely makes more sense to generate the distance matrices and neighbors matrix for the full data, and then subset the neighbors data before passing it to k-means!
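
A rough sketch of what that subsetting step could look like, using pandas only. This is not the notebook's actual code: the helper name subset_neighbors_by_fov, the column name "fov" (standing in for settings.FOV_ID), and the toy cell-type columns are all assumptions for illustration. The idea is that the full neighbors matrix stays untouched on disk, and only a filtered copy is handed to the k-means step.

```python
import pandas as pd

# Assumption: "fov" stands in for settings.FOV_ID in the real notebook.
FOV_COL = "fov"


def subset_neighbors_by_fov(neighbor_mat, fovs, fov_col=FOV_COL):
    """Return only the rows of the full neighbors matrix for the requested FOVs.

    Hypothetical helper: the full matrix is left unmodified, so other
    spatial scripts that read it still see every FOV.
    """
    requested = list(fovs)
    available = set(neighbor_mat[fov_col].unique())

    # Fail early with a readable message if a requested FOV does not exist.
    missing = [fov for fov in requested if fov not in available]
    if missing:
        raise ValueError(f"FOVs not found in neighbors matrix: {missing}")

    mask = neighbor_mat[fov_col].isin(requested)
    return neighbor_mat[mask].reset_index(drop=True)


# Toy neighbors matrix computed on the full data (illustrative values only).
full_neighbors = pd.DataFrame(
    {
        "fov": ["fov0", "fov0", "fov1", "fov2"],
        "label": [1, 2, 1, 1],
        "cell_type_A": [3, 0, 2, 1],
        "cell_type_B": [1, 4, 0, 2],
    }
)

# Only this filtered copy would be passed on to k-means.
kmeans_input = subset_neighbors_by_fov(full_neighbors, ["fov0", "fov2"])
```

With this approach the user-facing option would just be a list of FOV names, and the saved distance and neighbors matrices would still cover all FOVs.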