Single-Cell-Genomics-Group-CNAG-CRG / Tumor-Immune-Cell-Atlas

Code repository for the Tumor Immune Cell Atlas (TICA) project

tumor cells #6

Closed: igordot closed this issue 2 years ago

igordot commented 3 years ago

The dataset posted on Zenodo only contains the immune cells. Are the tumor cells available somewhere?

I checked the "Integration" directory where it says the input datasets (individual projects) are not included in this repository. Is there a separate repository for them or at least the code to regenerate them?

PaulaNietoG commented 3 years ago

Hi and thanks for reaching out! At the moment we do not have a repository for the original datasets (tumor + immune cells). Since all of our data comes from publicly available datasets, you could go to the original papers and download the raw data from there. Hope this helps!

igordot commented 3 years ago

Thanks for the quick response! Yes, all the datasets are publicly available, but that is not necessarily the case for annotations, as you mention in the manuscript. It would be useful to have a harmonized annotation for all the projects. Obviously, it would be challenging to label all the non-immune populations, but even just a simple immune/non-immune classification would be helpful. Sometimes the separation is not clear.

PaulaNietoG commented 3 years ago

I agree, this is indeed a challenge. One thing that has been working great for us to separate immune and non-immune cells is using the average expression of the CD45 (PTPRC) gene in the clusters. Given the nature of single-cell data (i.e. dropout events), using only one gene to classify individual cells won't work. However, cells will cluster according to their phenotype (tumor/immune/etc.), so you can cluster all your cells (immune and non-immune), compute the average expression of PTPRC in each cluster, and decide on a threshold to consider each of the clusters immune or non-immune. Then proceed with the analysis you want to perform. Hope this can help you!
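A minimal sketch of the cluster-level approach described in this comment, written in Python with NumPy. The function name `classify_clusters`, the toy values, and the threshold of 0.5 are illustrative assumptions, not part of the TICA code; the input is assumed to be normalized PTPRC expression per cell plus a cluster label per cell from whatever clustering was run on all cells.

```python
import numpy as np

def classify_clusters(ptprc_expr, cluster_labels, threshold):
    """Call each cluster immune or non-immune from its mean PTPRC expression.

    Hypothetical helper: `threshold` has no recommended value in the thread
    and must be chosen by inspecting the data.
    """
    ptprc_expr = np.asarray(ptprc_expr, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    calls = {}
    for cluster in np.unique(cluster_labels).tolist():
        # Average over all cells of the cluster, so dropouts (zeros) in
        # individual cells are smoothed out.
        mean_expr = ptprc_expr[cluster_labels == cluster].mean()
        calls[cluster] = "immune" if mean_expr >= threshold else "non-immune"
    return calls

# Toy data: cluster "A" is immune but contains one dropout cell (0.0).
ptprc = [2.1, 0.0, 1.8, 2.4, 0.0, 0.0, 0.3, 0.0]
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]

calls = classify_clusters(ptprc, clusters, threshold=0.5)  # threshold is an assumption
print(calls)
```

In the toy data, the second cell of cluster "A" has zero PTPRC counts and would be miscalled by a per-cell cutoff, while the cluster average still places it with the immune cells.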

igordot commented 3 years ago

> you can compute the average expression of the aforementioned gene after clustering all your cells (immune and non-immune) and decide on a threshold to consider each of the clusters immune or non-immune

Do you have any suggestions on the threshold? Sometimes there is not an obvious separation between positive and negative clusters.

PaulaNietoG commented 3 years ago

I don't have a clear suggestion, but I would say that if you don't mind getting some cancer cells, you can clean up after clustering post-integration, which will probably be clearer (and faster) than having to do this dataset by dataset.
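The two-pass idea in this reply could look roughly like the following Python sketch: keep clusters with a lenient cutoff first, then apply a stricter cutoff to the clusters obtained after integration. The helper name `cells_in_immune_clusters`, both threshold values, and the cluster labels are assumptions made for illustration; the integration and re-clustering steps themselves are not shown.

```python
import numpy as np

def cells_in_immune_clusters(ptprc_expr, cluster_labels, threshold):
    """Boolean mask of cells whose cluster has mean PTPRC >= threshold."""
    ptprc_expr = np.asarray(ptprc_expr, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    keep = np.zeros(ptprc_expr.shape[0], dtype=bool)
    for cluster in np.unique(cluster_labels):
        members = cluster_labels == cluster
        if ptprc_expr[members].mean() >= threshold:
            keep[members] = True
    return keep

# Toy data: 8 cells with hypothetical per-dataset cluster labels.
ptprc = np.array([2.0, 1.5, 0.0, 0.4, 0.5, 0.3, 0.0, 0.0])
per_dataset_clusters = np.array(["a", "a", "a", "b", "b", "b", "c", "c"])

# Pass 1: lenient cutoff (assumed value), tolerating some non-immune cells.
first_pass = cells_in_immune_clusters(ptprc, per_dataset_clusters, threshold=0.3)

# Pass 2: after integration the kept cells are re-clustered (labels are
# hypothetical here), and a stricter cutoff (assumed value) cleans them up.
kept_ptprc = ptprc[first_pass]
post_integration_clusters = np.array(["i1", "i1", "i1", "i2", "i1", "i2"])
second_pass = cells_in_immune_clusters(kept_ptprc, post_integration_clusters, threshold=0.5)

print(first_pass.sum(), "cells kept after pass 1;", second_pass.sum(), "after pass 2")
```

The cleanup runs once on the integrated object instead of once per dataset, which is the speed advantage mentioned in the reply.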