angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Error in `create_mantis_dir` in cell clustering notebook if running `generate_and_save_cell_cluster_masks` multiple times #1099

Closed (cliu72 closed 4 months ago)

cliu72 commented 6 months ago

Describe the bug
This is a fairly niche problem, but hopefully the fix is straightforward. When I run the "4.4: Generate cell phenotype maps" section of the cell clustering notebook more than once, I get this error: `KeyError: "['cluster_id'] not in index"`. The use case: a user first looks at `['fov1','fov2']` and runs the rest of the notebook with `subset_cell_fovs = ['fov1','fov2']`, then decides to look at other FOVs, changes `subset_cell_fovs`, and tries to run the rest of the notebook again.

I think this error occurs because `generate_and_save_cell_cluster_masks` adds a `cluster_id` column to the mapping CSV. If the function is run multiple times, the `cluster_id` column already exists, so the columns are not merged properly: [image]
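The merge problem described above can be reproduced with plain pandas. This is a minimal sketch, not the actual ark-analysis code: it assumes the `cluster_id` column is attached with `DataFrame.merge`, and the `add_cluster_id` helper and the `cell_meta_cluster` / `cell_som_cluster` column names are hypothetical. When both sides of a merge carry a non-key column with the same name, pandas renames them with the `_x` / `_y` suffixes, so a later lookup of `cluster_id` fails.

```python
import pandas as pd

# Hypothetical mapping table; these column names are illustrative assumptions.
mapping = pd.DataFrame({
    "cell_meta_cluster": ["A", "A", "B"],
    "cell_som_cluster": [1, 2, 3],
})


def add_cluster_id(df: pd.DataFrame) -> pd.DataFrame:
    """Naively assign one integer id per meta cluster and merge it back in."""
    ids = df[["cell_meta_cluster"]].drop_duplicates().reset_index(drop=True)
    ids["cluster_id"] = range(1, len(ids) + 1)
    return df.merge(ids, on="cell_meta_cluster", how="left")


first = add_cluster_id(mapping)   # first run: a single cluster_id column
second = add_cluster_id(first)    # second run: cluster_id_x and cluster_id_y

print(list(second.columns))

try:
    second[["cell_meta_cluster", "cluster_id"]]
except KeyError as err:
    # cluster_id no longer exists under its original name
    print("KeyError:", err)
```

If the real function follows a similar pattern, the second run would leave the mapping without a plain `cluster_id` column, which matches the reported error.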

Expected behavior
Users should be able to change `subset_cell_fovs` and re-run the rest of the notebook without issue.

To Reproduce
Run the example dataset in notebook 3 all the way through. Go back to the start of section 4.4, change `subset_cell_fovs` to something else (e.g. `['fov3','fov4']`), and run the rest of the notebook. After doing this, I got this error in the last cell: [image]

camisowers commented 6 months ago

The `cluster_id` values should be identical in both columns (they aren't affected by the FOV subset used), so I think it would be really easy to just not re-generate the column if it already exists.
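One way the "skip if it already exists" idea could look, as a rough sketch only: the `ensure_cluster_id` helper, its parameters, and the `cell_meta_cluster` column name are hypothetical and not taken from the ark-analysis codebase. The guard makes the step idempotent, so running it again leaves the table unchanged.

```python
import pandas as pd


def ensure_cluster_id(
    df: pd.DataFrame,
    cluster_col: str = "cell_meta_cluster",  # assumed column name
    id_col: str = "cluster_id",
) -> pd.DataFrame:
    """Add an integer id per cluster, unless the id column is already present."""
    if id_col in df.columns:
        # Already generated on a previous run; do not merge again.
        return df
    ids = df[[cluster_col]].drop_duplicates().reset_index(drop=True)
    ids[id_col] = range(1, len(ids) + 1)
    return df.merge(ids, on=cluster_col, how="left")


mapping = pd.DataFrame({"cell_meta_cluster": ["A", "A", "B"]})
once = ensure_cluster_id(mapping)
twice = ensure_cluster_id(once)   # second call is a no-op
```

This relies on the ids being independent of the FOV subset, as noted above; if they could change between runs, the column would need to be dropped and rebuilt instead.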

alex-l-kong commented 5 months ago

@camisowers agreed. I can add a fix for that.