ElementoLab / covid-imc

COVID19 profiling of lung tissue with imaging mass cytometry (IMC)

How to extract the values for the percentage and number of cells for the meta-clusters? #2

Closed BFonseca11 closed 4 years ago

BFonseca11 commented 4 years ago

Hi Andre,

Great work on the COVID-19 lung tissue samples. I was wondering: do you have a dataset with just the percentage and number of cells per meta-cluster in the tissue samples for each disease group? In your paper I saw the plots in Extended Data 6-7, but I wasn't able to read off the precise values. It would be great to have them. Can you provide them, or explain how to extract them from your dataset?

Thanks in advance,

Regards Bruna

afrendeiro commented 4 years ago

Hi! Thanks for the interest and for reaching out. I did intend to make more digested outputs available, but I reached the size limit on Zenodo. I agree that cell counts are pretty handy. You can find them in the following two files:
https://wcm.box.com/shared/static/nnqhgim66g49d8xa5ix5511og6z3gil9.pq
https://wcm.box.com/shared/static/osprb3urgw1ejbdg7a3w49lrv7ympggx.pq

The first one has absolute cell counts for each image (rows) in each meta-cluster (columns), and the second has counts for the original clusters, where the last column holds putative cells without an assigned cluster identity. Percentages can be calculated from the total number of cells in each row of the second matrix.

Best wishes, Andre

>>> import pandas as pd
>>> aggcounts = pd.read_parquet("https://wcm.box.com/shared/static/nnqhgim66g49d8xa5ix5511og6z3gil9.pq")
>>> aggcounts.head()
                       B cells  CD4 T-cells  CD8 T-cells  Club cells  Dendritic cells  Dying cells  ...  Mesenchymal cells  Monocytes  NK-cells  Neutrophils  Proliferating cells  Smooth muscle cells
roi                                                                                                 ...                                                                                               
20200609_ARDS_1921-01        6           62           97           2                4           10  ...                  7         55        26          165                   41                   60
20200609_ARDS_1921-02        5           71          106           3                1            8  ...                 11         60        28          189                   21                   39
20200609_ARDS_1921-03        6           73           89           3                1           10  ...                 12         92        26          166                   36                   63
20200609_ARDS_1921-04        7           80          126           2                6           16  ...                 20         55        36          191                   15                   74
20200609_ARDS_1921-05        2           49           73           2                2           24  ...                 17         39        21          106                   11                   48
[5 rows x 17 columns]
>>> counts = pd.read_parquet("https://wcm.box.com/shared/static/osprb3urgw1ejbdg7a3w49lrv7ympggx.pq")
>>> counts.head()
cluster                01 - Smooth muscle cells (AlphaSMA+)  ...  999 - ?()
roi                                                          ...           
20200609_ARDS_1921-01                                    60  ...          0
20200609_ARDS_1921-02                                    39  ...          0
20200609_ARDS_1921-03                                    63  ...          0
20200609_ARDS_1921-04                                    74  ...          0
20200609_ARDS_1921-05                                    48  ...          0
[5 rows x 50 columns]
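
The percentage calculation described above could be sketched roughly as follows. This is only an illustration on made-up toy tables: the column names `cluster_a`, `cluster_b`, `meta_x`, `meta_y`, the ROI names, and the helper `to_percentages` are hypothetical, and with the real data `aggcounts` and `counts` would be the two dataframes loaded above.

```python
import pandas as pd


def to_percentages(cell_counts: pd.DataFrame, totals: pd.Series) -> pd.DataFrame:
    """Divide each row of counts by that row's total and scale to percent."""
    return cell_counts.div(totals, axis=0) * 100


# Toy stand-in for the original-cluster matrix (values are invented);
# the last column mimics the "999 - ?()" column of unassigned cells.
counts = pd.DataFrame(
    {"cluster_a": [50, 30], "cluster_b": [40, 60], "999 - ?()": [10, 10]},
    index=pd.Index(["roi_1", "roi_2"], name="roi"),
)

# Toy stand-in for the meta-cluster matrix, sharing the same ROI index.
aggcounts = pd.DataFrame(
    {"meta_x": [50, 30], "meta_y": [40, 60]},
    index=counts.index,
)

# Total cells per image, taken from the second matrix so that
# unassigned cells are included in the denominator.
totals = counts.sum(axis=1)

pct = to_percentages(aggcounts, totals)
print(pct.round(2))
```

Because the denominator includes unassigned cells, the meta-cluster percentages in a row need not add up to 100.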

afrendeiro commented 4 years ago

Ah, almost forgot: these data were already available in the h5ad file provided in the Zenodo repository: https://zenodo.org/record/4139443/files/results/covid-imc.h5ad?download=1

In that file, the obs dataframe has one row per cell, with the "metacluster_label" and "cluster_label" columns.

>>> import scanpy as sc
>>> uri = "https://zenodo.org/record/4139443/files/results/covid-imc.h5ad?download=1"
>>> ann = sc.read("covid-imc.h5ad", backup_url=uri)
>>> ann.obs.head()
                     roi              sample disease phenotypes acquisition_id  acquisition_date  obj_id cluster_1.0                                      cluster_label    metacluster_label
0  20200609_ARDS_1921-01  20200609_ARDS_1921    ARDS       ARDS      ARDS_1921          7.305364       2          32  32 - Proliferating cells (Ki67+, MPOdim, Histo...  Proliferating cells
1  20200609_ARDS_1921-01  20200609_ARDS_1921    ARDS       ARDS      ARDS_1921          7.305364       3           6                  06 - Fibroblasts (CollagenTypeI+)          Fibroblasts
2  20200609_ARDS_1921-01  20200609_ARDS_1921    ARDS       ARDS      ARDS_1921          7.305364       5          21  21 - Fibroblasts (CollagenTypeI+, CD56+, pSTAT...          Fibroblasts
3  20200609_ARDS_1921-01  20200609_ARDS_1921    ARDS       ARDS      ARDS_1921          7.305364       6           5                     05 - Endothelial cells (CD31+)    Endothelial cells
4  20200609_ARDS_1921-01  20200609_ARDS_1921    ARDS       ARDS      ARDS_1921          7.305364       7          21  21 - Fibroblasts (CollagenTypeI+, CD56+, pSTAT...          Fibroblasts
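
One possible way to turn a per-cell table like this into counts and percentages per image or per disease group is `pd.crosstab`. The sketch below uses a small invented table with the same three column names (`roi`, `disease`, `metacluster_label`); the ROI names and the `other_group` label are hypothetical, and with the real data `obs` would be `ann.obs`.

```python
import pandas as pd

# Toy per-cell table (one row per cell); values are invented for illustration.
obs = pd.DataFrame(
    {
        "roi": ["roi_1", "roi_1", "roi_1", "roi_2", "roi_2"],
        "disease": ["ARDS", "ARDS", "ARDS", "other_group", "other_group"],
        "metacluster_label": [
            "Fibroblasts",
            "Fibroblasts",
            "Endothelial cells",
            "Fibroblasts",
            "Neutrophils",
        ],
    }
)

# Absolute number of cells per image (rows) and meta-cluster (columns).
roi_counts = pd.crosstab(obs["roi"], obs["metacluster_label"])

# The same tabulation pooled by disease group.
disease_counts = pd.crosstab(obs["disease"], obs["metacluster_label"])

# Row-normalised version: percentage of each group's cells per meta-cluster.
disease_pct = (
    pd.crosstab(obs["disease"], obs["metacluster_label"], normalize="index") * 100
)

print(roi_counts)
print(disease_pct.round(2))
```

Note that pooling cells by disease group weights images by their cell number; averaging per-image percentages within each group is an alternative that treats every image equally.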

BFonseca11 commented 4 years ago

Hi Andre,

Thank you for the quick response. You have great information that can help loads of researchers regarding lung diseases. And you helped me for sure!

Thanks a million,

Best regards Bruna