abelson-lab / scATOMIC

Pan-Cancer Single Cell Classifier
MIT License

Normal tissue cell in known cancer type #17

CathyXD closed this issue 4 months ago

CathyXD commented 9 months ago

Hi team, thank you for developing such a helpful tool! I am working with prostate cancer cells mixed with some normal cells, and some of the results confuse me.

First, some cells that we are confident are prostate cancer cells are annotated as normal tissue cells in the final prediction, and in layer 6 they are mostly labeled as other cancer types. Does this imply that the biology of these misannotated cells differs from that of the other prostate cancer cells?

Also, since cells labeled Normal Tissue Cell are likely to be misannotated cancer cells, what about other putative normal cells that are labeled Normal Tissue Cell in the final prediction but as another cancer type in layer 6: are those cells more likely to be unspecified normal cells or possible tumor cells? For example, some cells in our data are labeled as hepatocytes by SingleR but as liver cancer cells by scATOMIC. In general, what are your thoughts on the presence of cells from other cancer types in a sample of a known cancer type?

Another question is about the Non Blood Cell class. A large group of cells falls into this classification. I wonder whether this class is biased toward normal cells that have no specific label, such as the smooth muscle cells/myocytes in our data?

For use_CNVs, I tried both settings and found no difference in the summary matrix between using it and not. Does this mean the predictions are very robust, or does it indicate that something went wrong when running it?

results_cell <- create_summary_matrix(prediction_list = cell_predictions, use_CNVs = FALSE, modify_results = TRUE, mc.cores = 6, raw_counts = sparse_matrix, min_prop = 0.5, known_cancer_type = "PRAD")

results_CNV <- try(create_summary_matrix(prediction_list = cell_predictions, use_CNVs = TRUE, modify_results = TRUE, mc.cores = 6, raw_counts = sparse_matrix, min_prop = 0.5, known_cancer_type = "PRAD"))

inofechm commented 9 months ago

Hi Cathy, thanks for your interest in our work! I want to clarify a couple of things to help figure out what is going on. Are you running scATOMIC on each patient sample individually? scATOMIC can have issues annotating normal tissue cells when tumours from multiple patients are present, so we generally split the count matrix by sample and run the pipeline on each sample individually. Another possible issue is that rare cancer subtypes were not well represented in scATOMIC's trained model, so it can mistake them for another cancer type.
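As a rough illustration of the per-sample workflow, the sketch below splits a genes x cells count matrix by a per-cell sample label and applies an annotation function to each piece. `run_per_sample` and `sample_ids` are hypothetical names, not part of scATOMIC. The commented `annotate_one` wrapper shows how it might be plugged into `run_scATOMIC` and `create_summary_matrix`; the `create_summary_matrix` arguments are copied from the calls earlier in this thread, while `run_scATOMIC(rna_counts = ...)` is assumed from the package README and should be verified.

```r
# Hypothetical helper (not part of scATOMIC): apply `annotate_fn` to each
# sample's cells separately. `counts` is a genes x cells matrix (base or
# sparse); `sample_ids` gives one sample label per column of `counts`.
run_per_sample <- function(counts, sample_ids, annotate_fn) {
  sample_ids <- as.character(sample_ids)
  stopifnot(length(sample_ids) == ncol(counts))
  samples <- unique(sample_ids)
  results <- lapply(samples, function(s) {
    # drop = FALSE keeps a matrix even when a sample has only one cell
    annotate_fn(counts[, sample_ids == s, drop = FALSE])
  })
  names(results) <- samples
  results
}

# Assumed usage with scATOMIC (check argument names against the README):
# annotate_one <- function(m) {
#   preds <- run_scATOMIC(rna_counts = m, mc.cores = 6)
#   create_summary_matrix(prediction_list = preds, raw_counts = m,
#                         use_CNVs = FALSE, modify_results = TRUE,
#                         mc.cores = 6, min_prop = 0.5,
#                         known_cancer_type = "PRAD")
# }
# per_sample_results <- run_per_sample(sparse_matrix, sample_ids, annotate_one)
```

Passing the annotation step in as a function keeps the splitting logic independent of scATOMIC, so it can be checked on a toy matrix before running the full pipeline.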

Does this imply that the biology of these misannotated cells differs from that of the other prostate cancer cells?

This could indicate different biology, but it more likely stems from underrepresented subtypes or from having multiple patients in the same sample.

are those cells more likely to be unspecified normal cells or possible tumor cells

I'm not sure I fully understand the question, but the assumption is that these are normal tissue cells that do not resemble any cell type used to train scATOMIC (i.e., not a malignant cell, immune cell, or fibroblast/endothelial cell).

For example, some cells in our data are labeled as hepatocytes by SingleR but as liver cancer cells by scATOMIC. In general, what are your thoughts on the presence of cells from other cancer types in a sample of a known cancer type?

scATOMIC does not include every cell type in the body, so a tissue-specific normal cell type can be classified as some other cancer type more or less arbitrarily; this is random and likely does not indicate any relevant biology. In these cases we ignore the layer_6 annotation and assume that cells with an scATOMIC_pred of Normal Tissue Cell are indeed unknown tissue-specific non-malignant cells. One can then try to identify them through manual annotation using known marker genes.
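One simple way to start that manual annotation is to score each cell against a few marker gene sets. The sketch below log-normalises a genes x cells count matrix (counts per 10,000, then `log1p`), averages the normalised expression of each marker set, and reports the highest-scoring set per cell. `score_markers` is a hypothetical helper, and the hepatocyte and smooth muscle markers are common textbook examples chosen only for illustration; in practice you would pass the columns for cells whose `scATOMIC_pred` is the normal tissue label and pick markers suited to your tissue.

```r
# Hypothetical helper (not part of scATOMIC): mean log-normalised expression
# of each marker set per cell, plus the best-scoring set for each cell.
# `counts` is a genes x cells matrix with gene symbols as row names.
score_markers <- function(counts, marker_sets) {
  lib_size <- colSums(counts)
  lib_size[lib_size == 0] <- 1                     # avoid dividing by zero
  lognorm <- log1p(t(t(counts) / lib_size) * 1e4)  # log(1 + counts per 10k)
  n_cells <- ncol(lognorm)
  scores <- vapply(marker_sets, function(genes) {
    genes <- intersect(genes, rownames(lognorm))   # skip unmeasured markers
    if (length(genes) == 0) return(rep(NA_real_, n_cells))
    unname(colMeans(lognorm[genes, , drop = FALSE]))
  }, numeric(n_cells))
  # Force a cells x marker-sets matrix, even for a single cell
  scores <- matrix(scores, nrow = n_cells,
                   dimnames = list(colnames(lognorm), names(marker_sets)))
  best <- apply(scores, 1, function(r) {
    if (all(is.na(r))) NA_character_ else names(which.max(r))
  })
  list(scores = scores, best = best)
}

# Illustrative marker sets only; adapt them to the tissue you are studying.
example_markers <- list(
  hepatocyte = c("ALB", "APOA1", "APOC3"),
  smooth_muscle = c("ACTA2", "MYH11", "DES", "CNN1")
)
```

A top-scoring set is only a starting hypothesis; it is worth confirming with the full marker expression and the cells' UMAP neighbourhood.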

Another question is about the Non Blood Cell class. A large group of cells falls into this classification. I wonder whether this class is biased toward normal cells that have no specific label, such as the smooth muscle cells/myocytes in our data?

Yes, these cells are usually normal tissue cells, or low-quality cells that did not receive strong scores in the model prediction. The class is certainly biased toward normal cells without a specific label in scATOMIC. If you force the classification with the confidence_cutoff = F argument in run_scATOMIC and create_summary_matrix, you may get an idea of which lineage they come from. Likewise, if they fall in the same UMAP cluster as well-annotated cells, that supports a hypothesis about what they may be.
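To see where the Non Blood Cell group goes when the classification is forced, one option is to produce two summary matrices for the same cells, one with the default settings and one with `confidence_cutoff = FALSE`, and tabulate the forced labels of the cells that were Non Blood Cell by default. The sketch assumes both results are data frames with cell names as row names and a `scATOMIC_pred` column; `forced_lineage_table` is a hypothetical helper, and the exact label string should be checked against your own output.

```r
# Hypothetical helper (not part of scATOMIC). `default_res` and `forced_res`
# are assumed to be summary matrices for the same cells, produced with the
# default settings and with confidence_cutoff = FALSE respectively, with cell
# names as row names and a `scATOMIC_pred` column.
forced_lineage_table <- function(default_res, forced_res,
                                 uncertain_label = "Non Blood Cell") {
  idx <- which(as.character(default_res$scATOMIC_pred) == uncertain_label)
  cells <- intersect(rownames(default_res)[idx], rownames(forced_res))
  # as.character() avoids zero-count entries for unused factor levels
  forced <- as.character(forced_res[cells, "scATOMIC_pred"])
  sort(table(forced), decreasing = TRUE)
}

# Assumed usage (argument names follow this thread; verify against the docs):
# forced_preds <- run_scATOMIC(rna_counts = sparse_matrix, mc.cores = 6,
#                              confidence_cutoff = FALSE)
# forced_res <- create_summary_matrix(prediction_list = forced_preds,
#                                     raw_counts = sparse_matrix,
#                                     use_CNVs = FALSE, modify_results = TRUE,
#                                     mc.cores = 6, min_prop = 0.5,
#                                     confidence_cutoff = FALSE)
# forced_lineage_table(results_cell, forced_res)
```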

For use_CNVs, I tried both settings and found no difference in the summary matrix between using it and not.

To clarify, all the use_CNVs argument does is add a column to the results matrix indicating whether each cell is predicted to be aneuploid or diploid. You can then refine the results manually: treat cells annotated as cancer cells that are also aneuploid as confident malignant calls, since the two predictions agree, and likewise treat cells annotated as normal tissue cells that are also diploid as confident normal calls. This is a good way to filter out cells whose malignant status scATOMIC may not have predicted correctly.
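A minimal sketch of that manual filtering step, assuming the CNV column added with use_CNVs = TRUE holds the values "aneuploid" and "diploid". The column name `CNV_status`, the `Normal Tissue Cell` label, and the rule that cancer labels contain "Cancer" are all assumptions to check against your own results matrix, and `add_cnv_concordance` is a hypothetical helper. Cells outside the cancer and normal tissue classes (immune, stromal) are left as `not_assessed`, since the rule above only covers those two groups.

```r
# Hypothetical helper (not part of scATOMIC): label each cell by whether its
# scATOMIC_pred and its CNV call agree. Column name and label strings are
# assumptions; pass the real ones from your summary matrix.
add_cnv_concordance <- function(res, cnv_col = "CNV_status",
                                normal_label = "Normal Tissue Cell") {
  if (!cnv_col %in% colnames(res)) stop("CNV column not found: ", cnv_col)
  pred <- as.character(res$scATOMIC_pred)
  cnv <- as.character(res[[cnv_col]])
  is_cancer <- grepl("Cancer", pred, fixed = TRUE)  # assumed label pattern
  is_normal <- pred %in% normal_label
  aneu <- cnv %in% "aneuploid"                      # NA counts as neither
  dipl <- cnv %in% "diploid"
  call <- rep("not_assessed", length(pred))
  call[is_cancer & aneu] <- "concordant_malignant"
  call[is_normal & dipl] <- "concordant_normal"
  call[(is_cancer & dipl) | (is_normal & aneu)] <- "discordant"
  res$cnv_concordance <- call
  res
}
```

Cells marked `discordant` are the ones worth a closer look, for example with marker genes or their cluster membership.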

Let me know if this helps and if you have any other questions.