Open camisowers opened 5 months ago
@srivarra @alex-l-kong @ngreenwald @jranek Lmk your thoughts. I'm leaning towards a hard switch since we just released v0.7.0 of ark and anyone super committed to staying away from AnnData can always use that version.
I agree, I think it makes sense to just switch. Everything else looks good.
Looks good to me!
Looks good
Relevant background
Now that we have the AnnData conversion notebook, we can start implementing AnnData with some of the spatial analysis functions.
Design overview
We have two options:
anndata_dir arg which will load in and save to the AnnData table if provided. When left as None, the functions will continue to use the cell_table.csv and process the same as before. This isn't too much double code because the AnnData reading and writing calls are very simple, but it'll add quite a few extra if statements.
Things which will be adapted for AnnData compatibility in this new PR:
Since all the spatial notebooks (except cell distances) only take the neighborhood mats as inputs, these changes will not break any other notebooks if we choose to do a hard fork.
Note: This will break the spatial enrichment code we have in ark, but I'm pretty confident we can just scrap the old stuff and build on squidpy's nhood_enrichment function.
Code mockup
I can write up more specific pseudocode when we decide between a hard or soft fork, but for now here's which functions will need to be altered regardless.
1. Distance matrices calculation
Currently we load in the segmentation masks to get the labels and centroids of the cells by running regionprops(). This doesn't make much sense in our pipeline anymore since this is already done in the segmentation notebook and the centroids are stored in the cell table; I think moving forward we can skip loading in the segmentation masks and re-running regionprops (it's very time consuming using the NAS and also redundant).
Also, I will probably do some efficiency testing to find a replacement for cdist() that creates a sparse matrix rather than computing every pairwise distance between cells.
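One candidate for the sparse replacement, as a rough sketch only: scipy's cKDTree can build a sparse distance matrix that keeps just the pairs within a cutoff. The function name, the max_dist cutoff, and the (n_cells, 2) centroid layout are assumptions for illustration, not something decided here.

```python
import numpy as np
from scipy.spatial import cKDTree


def sparse_dist_matrix(centroids, max_dist):
    """Pairwise distances between cells, keeping only pairs within max_dist.

    centroids: (n_cells, 2) array of centroid coordinates (assumed layout).
    max_dist: hypothetical cutoff; pairs farther apart are not stored.
    """
    tree = cKDTree(centroids)
    # Query the tree against itself; only pairs with distance <= max_dist are kept
    dists = tree.sparse_distance_matrix(tree, max_dist, output_type="coo_matrix")
    return dists.tocsr()


# Toy centroids: the first two cells are 5 apart, the third is far away
centroids = np.array([[0.0, 0.0], [3.0, 4.0], [100.0, 100.0]])
dist_mat = sparse_dist_matrix(centroids, max_dist=50)
```

One thing to keep in mind with this approach: a missing entry means "farther than max_dist", not "distance zero", so downstream code would need to treat unstored pairs accordingly.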
Adjusted functions:
calc_dist_matrix - adjust for AnnData
Soft fork example:
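A minimal sketch of what the soft fork could look like, to show where the extra if statements come in. The signature, the column names ("fov", "centroid-0", "centroid-1"), the one-zarr-per-FOV layout of anndata_dir, and the "spatial" / "distances" keys are all assumptions, not the final API.

```python
import os

import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist


def calc_dist_matrix(cell_table, anndata_dir=None):
    """Soft-fork sketch: use AnnData when anndata_dir is given, else the cell table.

    All column names, keys, and the file layout are illustrative assumptions.
    """
    dist_mats = {}
    for fov, fov_cells in cell_table.groupby("fov"):
        if anndata_dir is not None:
            # AnnData path: read centroids, save distances back to the table
            import anndata  # lazy import so the csv path needs no extra dependency

            path = os.path.join(anndata_dir, f"{fov}.zarr")  # assumed layout
            adata = anndata.read_zarr(path)
            coords = np.asarray(adata.obsm["spatial"])
            adata.obsp["distances"] = cdist(coords, coords)
            adata.write_zarr(path)
            dist_mats[fov] = adata.obsp["distances"]
        else:
            # Original path: centroids come straight from the cell table
            coords = fov_cells[["centroid-0", "centroid-1"]].to_numpy()
            dist_mats[fov] = cdist(coords, coords)
    return dist_mats


# Usage with a toy cell table and no anndata_dir (the unchanged csv behavior)
cell_table = pd.DataFrame({
    "fov": ["fov0", "fov0", "fov1"],
    "label": [1, 2, 1],
    "centroid-0": [0.0, 3.0, 1.0],
    "centroid-1": [0.0, 4.0, 1.0],
})
mats = calc_dist_matrix(cell_table)
```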
Hard fork example:
2. Neighborhood matrices calculation
Adjusted functions:
create_neighborhood_matrix
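For reference, the core of the neighborhood count can be sketched independently of where the inputs are loaded from. This is an illustration only, not the current ark implementation: the function name, the radius cutoff, and excluding a cell from its own neighborhood are assumptions.

```python
import numpy as np
import pandas as pd


def neighborhood_counts(dist_mat, cell_types, radius):
    """Count, for each cell, how many neighbors of each type lie within radius.

    dist_mat: dense (n_cells, n_cells) distance matrix.
    cell_types: length n_cells sequence of cell type labels.
    radius: hypothetical neighborhood cutoff, same units as dist_mat.
    """
    within = dist_mat <= radius
    np.fill_diagonal(within, False)  # assumption: a cell is not its own neighbor
    # One column per cell type, 1 where the cell has that type
    one_hot = pd.get_dummies(pd.Series(cell_types)).astype(int)
    counts = within.astype(int) @ one_hot.to_numpy()
    return pd.DataFrame(counts, columns=one_hot.columns)


dist_mat = np.array([[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]])
counts = neighborhood_counts(dist_mat, ["A", "B", "A"], radius=6)
```

Whether dist_mat and cell_types come from the cell table or from the AnnData table, this step stays the same, so only the loading and saving code would need to change.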
3. Cell Distances analysis
Adjusted functions:
generate_cell_distance_analysis, possibly calculate_mean_distance_to_all_cell_types and calculate_mean_distance_to_cell_type as well
Required inputs
Cell table / AnnData table.
Output files
Since no one seems to really open or use the saved distance matrix xarrays anyways, I think we can just append them to the AnnData table and save some storage space.
The neighborhood mats and cell distance results will be saved to both the AnnData table and as individual csvs.
Timeline
Estimated date when a fully implemented version will be ready for review: 1/19
Estimated date when the finalized project will be merged in: 1/23