Closed BFonseca11 closed 4 years ago
Hi! Thanks for the interest and for reaching out. I did intend to make more digested outputs available, but reached the size limit on Zenodo. I agree that cell counts are pretty handy. You can find them in the following two files: https://wcm.box.com/shared/static/nnqhgim66g49d8xa5ix5511og6z3gil9.pq https://wcm.box.com/shared/static/osprb3urgw1ejbdg7a3w49lrv7ympggx.pq
The first one has absolute cell counts for each image (rows) in each meta-cluster (columns), and the second has counts for the original clusters, where the last column holds putative cells without an assigned cluster identity. Percentages can be calculated from the total number of cells in each row of the second matrix.
Best wishes, Andre
>>> import pandas as pd
>>> aggcounts = pd.read_parquet("https://wcm.box.com/shared/static/nnqhgim66g49d8xa5ix5511og6z3gil9.pq")
>>> aggcounts.head()
B cells CD4 T-cells CD8 T-cells Club cells Dendritic cells Dying cells ... Mesenchymal cells Monocytes NK-cells Neutrophils Proliferating cells Smooth muscle cells
roi ...
20200609_ARDS_1921-01 6 62 97 2 4 10 ... 7 55 26 165 41 60
20200609_ARDS_1921-02 5 71 106 3 1 8 ... 11 60 28 189 21 39
20200609_ARDS_1921-03 6 73 89 3 1 10 ... 12 92 26 166 36 63
20200609_ARDS_1921-04 7 80 126 2 6 16 ... 20 55 36 191 15 74
20200609_ARDS_1921-05 2 49 73 2 2 24 ... 17 39 21 106 11 48
[5 rows x 17 columns]
>>> counts = pd.read_parquet("https://wcm.box.com/shared/static/osprb3urgw1ejbdg7a3w49lrv7ympggx.pq")
>>> counts.head()
cluster 01 - Smooth muscle cells (AlphaSMA+) ... 999 - ?()
roi ...
20200609_ARDS_1921-01 60 ... 0
20200609_ARDS_1921-02 39 ... 0
20200609_ARDS_1921-03 63 ... 0
20200609_ARDS_1921-04 74 ... 0
20200609_ARDS_1921-05 48 ... 0
[5 rows x 50 columns]
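The percentage calculation mentioned above (dividing by the total number of cells in each row of the second matrix) could look roughly like the sketch below. The helper name `to_percentages` is hypothetical and not part of the repository, and this is only one reasonable reading of the description, not necessarily how the figures in the paper were produced.

```python
import pandas as pd


def to_percentages(cell_counts, totals=None):
    """Convert absolute cell counts per image (rows) into percentages.

    `totals` is the number of cells per image. If omitted, the row sums
    of `cell_counts` are used as the denominator.
    """
    if totals is None:
        totals = cell_counts.sum(axis=1)
    # Align on the row index (roi) and divide each row by its total.
    return cell_counts.div(totals, axis=0) * 100


# Hypothetical usage with the two files above (requires network access):
# aggcounts = pd.read_parquet("https://wcm.box.com/shared/static/nnqhgim66g49d8xa5ix5511og6z3gil9.pq")
# counts = pd.read_parquet("https://wcm.box.com/shared/static/osprb3urgw1ejbdg7a3w49lrv7ympggx.pq")
# cluster_pct = to_percentages(counts)
# metacluster_pct = to_percentages(aggcounts, totals=counts.sum(axis=1))
```

Passing the row totals of the second matrix as `totals` keeps cells without an assigned cluster identity in the denominator when computing meta-cluster percentages.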
Ah, almost forgot: these data were already available in the h5ad file provided in the Zenodo repository: https://zenodo.org/record/4139443/files/results/covid-imc.h5ad?download=1
There, the obs dataframe has the "metacluster_label" and "cluster_label" columns.
>>> import scanpy as sc
>>> uri = "https://zenodo.org/record/4139443/files/results/covid-imc.h5ad?download=1"
>>> ann = sc.read("covid-imc.h5ad", backup_url=uri)
>>> ann.obs.head()
roi sample disease phenotypes acquisition_id acquisition_date obj_id cluster_1.0 cluster_label metacluster_label
0 20200609_ARDS_1921-01 20200609_ARDS_1921 ARDS ARDS ARDS_1921 7.305364 2 32 32 - Proliferating cells (Ki67+, MPOdim, Histo... Proliferating cells
1 20200609_ARDS_1921-01 20200609_ARDS_1921 ARDS ARDS ARDS_1921 7.305364 3 6 06 - Fibroblasts (CollagenTypeI+) Fibroblasts
2 20200609_ARDS_1921-01 20200609_ARDS_1921 ARDS ARDS ARDS_1921 7.305364 5 21 21 - Fibroblasts (CollagenTypeI+, CD56+, pSTAT... Fibroblasts
3 20200609_ARDS_1921-01 20200609_ARDS_1921 ARDS ARDS ARDS_1921 7.305364 6 5 05 - Endothelial cells (CD31+) Endothelial cells
4 20200609_ARDS_1921-01 20200609_ARDS_1921 ARDS ARDS ARDS_1921 7.305364 7 21 21 - Fibroblasts (CollagenTypeI+, CD56+, pSTAT... Fibroblasts
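As a minimal sketch, counts and percentages per image or per disease group could be tabulated from the obs dataframe with `pd.crosstab`, using the "roi", "disease" and "metacluster_label" columns shown above. The helper names `count_cells` and `percent_cells` are hypothetical. Grouping by "disease" pools all cells of a group before computing percentages, which is not the same as averaging per-image percentages, so the values may differ from those plotted in the paper.

```python
import pandas as pd


def count_cells(obs, group_col="roi", label_col="metacluster_label"):
    """Number of cells per group (rows) and label (columns)."""
    return pd.crosstab(obs[group_col], obs[label_col])


def percent_cells(obs, group_col="roi", label_col="metacluster_label"):
    """Percentage of cells per label within each group (rows sum to 100)."""
    return pd.crosstab(obs[group_col], obs[label_col], normalize="index") * 100


# Hypothetical usage with the AnnData object loaded above:
# per_image_counts = count_cells(ann.obs)
# per_disease_counts = count_cells(ann.obs, group_col="disease")
# per_disease_pct = percent_cells(ann.obs, group_col="disease")
```

Passing `label_col="cluster_label"` would give the same tables for the original clusters instead of the meta-clusters.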
Hi Andre,
Thank you for the quick response. You have great information that can help loads of researchers working on lung diseases. And you certainly helped me!
Thanks a million,
Best regards, Bruna
Hi Andre,
Great work on the COVID-19 lung tissue samples. I was wondering whether you have a dataset with just the percentage and number of cells per meta-cluster in the tissue samples for each disease group. I saw the plots in Extended Data 6-7 of your paper, but I wasn't able to get the precise values from them. It would be great to have those values. Could you provide them, or explain how to extract them from your dataset?
Thanks in advance,
Regards, Bruna