czbiohub-sf / tabula-muris-senis

Tabula Muris Senis
http://tabula-muris-senis.ds.czbiohub.org
BSD 3-Clause "New" or "Revised" License

Some cells lost after import? And how are cell markers defined? #35

Open Nidane opened 2 years ago

Nidane commented 2 years ago

Hi! Thanks a lot for this amazing work!

I recently ran into an issue when trying to analyze data from specific organs. For example:

https://tabula-muris-senis.ds.czbiohub.org/thymus/droplet/ states that the whole thymus analyzed through the droplet pipeline contains 9275 cells and includes DN3, DN4, double negative T cell, immature T cell, professional APC, and thymocyte.

But when I download the h5ad file from https://figshare.com/articles/dataset/Processed_files_to_use_with_scanpy_/8273102/2

and run:

    thymus2 = sc.read_h5ad("C:/.../32669714/Thymus_droplet.h5ad")
    C:...\miniconda3\lib\site-packages\anndata\compat\__init__.py:180: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].

    This is where adjacency matrices should go now.
      warn(
    C:...\miniconda3\lib\site-packages\anndata\compat\__init__.py:180: FutureWarning: Moving element from .uns['neighbors']['connectivities'] to .obsp['connectivities'].

    This is where adjacency matrices should go now.
      warn(

    thymus2
    AnnData object with n_obs × n_vars = 7570 × 19860
        obs: 'age', 'batch', 'cell', 'cell_ontology_class', 'cell_ontology_id', 'free_annotation', 'method', 'mouse.id', 'n_genes', 'sex', 'subtissue', 'tissue', 'tissue_free_annotation', 'n_counts', 'louvain', 'cluster_names', 'leiden'
        var: 'n_cells', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
        uns: 'leiden', 'louvain', 'neighbors', 'pca', 'rank_genes_groups'
        obsm: 'X_pca', 'X_umap', 'X_tsne'
        varm: 'PCs'
        obsp: 'distances', 'connectivities'

So the cell count appears to be 7570, which is lower than the 9275 reported on the website (a difference of about 1700).

Also, when I went through "cell_ontology_class", I could not find the "immature T cell" annotation, so that population seems to be missing (it contains about 1700 cells).
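For reference, here is a minimal sketch of how the annotation lists could be compared. It is only an illustration: the helper name `compare_annotations` is hypothetical, the expected labels are copied from the website description above, and the `observed` list is a tiny made-up stand-in, not real counts from the file. With the actual object, the labels would presumably come from the `cell_ontology_class` column of `thymus2.obs`.

```python
from collections import Counter

# Annotations listed on the thymus/droplet page of the website.
EXPECTED = [
    "DN3", "DN4", "double negative T cell",
    "immature T cell", "professional APC", "thymocyte",
]

def compare_annotations(expected, observed):
    """Return (labels with zero cells, per-label cell counts)."""
    counts = Counter(observed)
    missing = [label for label in expected if counts[label] == 0]
    return missing, counts

# With the real file, the labels would come from something like:
#   observed = thymus2.obs["cell_ontology_class"].tolist()
# A tiny made-up list stands in for them here (NOT real data).
observed = ["thymocyte", "thymocyte", "DN4", "professional APC"]

missing, counts = compare_annotations(EXPECTED, observed)
print("labels with no cells:", missing)

# Gap between the cell count on the website and n_obs in the h5ad file.
cell_gap = 9275 - 7570
print("website minus file:", cell_gap)
```

If the only label reported as missing were "immature T cell" and its size on the website roughly matched the gap, that would suggest the whole population was dropped from the processed file rather than lost during import.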

So I wonder which step I might have done wrong to cause this problem?

Also, this might be a naive question (maybe I missed some details in the methods), but in theory we should expect DN, DP, and SP populations within the thymus, so why is only DN annotated here? In addition, the labels DN3, DN4, double negative T cell, immature T cell, and thymocyte are a bit confusing: based on conventional flow cytometry analysis, there might be some overlap between these populations. For example, should the thymocyte population be a mixture of SP cells, etc.?

Thank you very much!