czbiohub-sf / tabula-muris-senis

Tabula Muris Senis
http://tabula-muris-senis.ds.czbiohub.org
BSD 3-Clause "New" or "Revised" License

Raw data and processed data not matching? #7

Closed huiwenzh closed 4 years ago

huiwenzh commented 4 years ago

Hi,

I was trying to look up the cells' raw counts using the cell IDs from the processed bone marrow FACS-sorted data (downloaded from figshare). I can find the cell IDs for the 24m age group but not for the 3m age group, as the two groups are labelled quite differently: e.g. A2_B003031_S98_L004.mus-5-0 for the 24m group but D13.D042193.3_8_M.1.1-1 for the 3m group. Is there any way I can match the cell IDs, or map them back to the raw counts?

Thanks in advance, Huiwen

aopisco commented 4 years ago

hi @Huiwen-UQ

we noticed that and @olgabot created matching metadata files available from this folder: https://s3.console.aws.amazon.com/s3/buckets/czb-tabula-muris-senis/Metadata/?region=us-west-2

let us know if that solves the issue!

huiwenzh commented 4 years ago

Thank you so much for the fast reply and for providing the matching metadata! With these metadata files, I found that some cells in the Marrow tissue have different cell_ontology_class labels in the raw and processed data. For instance, cell ID "C22_B002327_S46_L003.mus-0-0" is labelled as Naive B cell in the raw count metadata but as MPP Fraction B in the processed Seurat object metadata. Cell ID "B10.MAA000844.3_10_M.1.1-1" is labelled as immature B cell in the raw count metadata but as Naive B cell in the processed Seurat object metadata.

Can you please check which labelling is correct?

Best, Huiwen

aopisco commented 4 years ago

@Huiwen-UQ I don't know which Seurat file you are referring to. We re-annotated the dataset between Tabula Muris and Tabula Muris Senis. Is that the source of the confusion?

huiwenzh commented 4 years ago

@aopisco I downloaded the processed marrow Seurat file for Tabula Muris Senis from figshare and used it to compare against the annotations in the raw count dataset metadata.

olgabot commented 4 years ago

Hi @Huiwen-UQ, if it is a Seurat file, it is likely from Tabula Muris rather than Tabula Muris Senis. The annotations for the 3 month data were updated in Tabula Muris Senis, and I suggest using those instead of the original Tabula Muris ones. For us to help debug, could you send a link to the exact file you used?

huiwenzh commented 4 years ago

Sorry for the misunderstanding. They are h5ad files meant for use with scanpy, but I opened them in R (where they showed up as a Seurat object). I used the Marrow_facs.h5ad file, the tabula-muris-senis-facs-official-raw-obj.h5ad file, and the matching metadata files created earlier for this issue.

aopisco commented 4 years ago

@Huiwen-UQ following up here: can I close the issue, or do you still need help?

huiwenzh commented 4 years ago

Yes, everything is all good now. Thanks for your help!
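
For anyone hitting the same mismatch, the comparison discussed in this thread can be sketched roughly as follows. This is only an illustration using the standard library: the CSV layout and the column names `cell` and `cell_ontology_class` are assumptions, not the actual structure of the metadata files, and the row named `hypothetical_cell_only_in_raw` is made up to show how unmatched IDs surface. The two real cell IDs and their labels are the ones quoted above.

```python
import csv
import io

# Toy stand-ins for the raw and processed metadata tables.
# Column names and layout are assumptions for illustration only.
RAW_CSV = """cell,cell_ontology_class
C22_B002327_S46_L003.mus-0-0,Naive B cell
B10.MAA000844.3_10_M.1.1-1,immature B cell
hypothetical_cell_only_in_raw,Naive B cell
"""

PROCESSED_CSV = """cell,cell_ontology_class
C22_B002327_S46_L003.mus-0-0,MPP Fraction B
B10.MAA000844.3_10_M.1.1-1,Naive B cell
"""


def load_labels(text, id_col="cell", label_col="cell_ontology_class"):
    """Return {cell id: label}, stripping stray whitespace from both."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[id_col].strip(): row[label_col].strip() for row in reader}


def compare_labels(raw, processed):
    """Find shared cells whose labels differ, and cells present on one side only."""
    shared = raw.keys() & processed.keys()
    mismatched = {
        cell_id: (raw[cell_id], processed[cell_id])
        for cell_id in sorted(shared)
        if raw[cell_id] != processed[cell_id]
    }
    only_raw = sorted(raw.keys() - processed.keys())
    only_processed = sorted(processed.keys() - raw.keys())
    return mismatched, only_raw, only_processed


raw_labels = load_labels(RAW_CSV)
processed_labels = load_labels(PROCESSED_CSV)
mismatched, only_raw, only_processed = compare_labels(raw_labels, processed_labels)

for cell_id, (raw_label, processed_label) in mismatched.items():
    print(f"{cell_id}: raw={raw_label!r} processed={processed_label!r}")
print("only in raw:", only_raw)
print("only in processed:", only_processed)
```

With the real h5ad files, the same dictionaries could presumably be built from the `obs` table that scanpy exposes after `read_h5ad`, keyed by the cell index, instead of from CSV text. Stripping whitespace from the IDs matters because a stray leading space is enough to make an otherwise identical ID fail to match.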