Many previously-released datasets have provided cell identity labels for each cell in their dataset. It would be great to be able to integrate this information into our analyses in scanpy.
Some ideas for how this data might be useful:
Checking for identity preservation in a new embedding
Color cells in the original dataset by their author-assigned identity
See how well the UMAP embedding and Leiden clustering separate those identities in gene-expression space
In the new embedding, check for preservation of groups of cells with similar identity
Perhaps using a Fisher's exact test?
Checking for common terms in cross-species datasets
When putting cells from multiple species into the same space, we could get a collection of terms or identities for individual clusters and ask whether they make sense
When working with species with unknown cell identities, we can also use this approach to transfer annotations from one species to another
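To make the Fisher's exact test idea a bit more concrete, here is a rough sketch of how cluster/label agreement might be checked. It assumes the author-assigned labels and the cluster assignments already sit side by side in a table like `adata.obs`; the column names `leiden` and `author_label` and the helper `label_enrichment` are hypothetical, and a plain pandas DataFrame stands in for `adata.obs` so the example runs without scanpy. For each (cluster, label) pair it builds a 2x2 contingency table and runs a one-sided test for over-representation of the label in the cluster.

```python
import pandas as pd
from scipy.stats import fisher_exact


def label_enrichment(obs, cluster_key, label_key):
    """One-sided Fisher's exact test for every (cluster, label) pair.

    `obs` is any DataFrame with one row per cell (e.g. `adata.obs`).
    `cluster_key` and `label_key` are hypothetical column names.
    """
    table = pd.crosstab(obs[cluster_key], obs[label_key])
    total = int(table.values.sum())
    rows = []
    for cluster in table.index:
        for label in table.columns:
            a = int(table.loc[cluster, label])       # in cluster, has label
            b = int(table.loc[cluster].sum()) - a    # in cluster, other label
            c = int(table[label].sum()) - a          # other cluster, has label
            d = total - a - b - c                    # other cluster, other label
            _, p_value = fisher_exact([[a, b], [c, d]], alternative="greater")
            rows.append(
                {"cluster": cluster, "label": label, "n_cells": a, "p_value": p_value}
            )
    return pd.DataFrame(rows)


# Toy stand-in for adata.obs: two clusters, two author-assigned identities.
obs = pd.DataFrame(
    {
        "leiden": ["0"] * 5 + ["1"] * 5,
        "author_label": ["T cell"] * 4 + ["B cell"] + ["B cell"] * 4 + ["T cell"],
    }
)

result = label_enrichment(obs, "leiden", "author_label")
print(result.sort_values("p_value"))
```

With many clusters and labels the p-values would need a multiple-testing correction, and the same table could be reused for the cross-species case by collecting the most enriched terms per cluster.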