abelson-lab / scATOMIC

Pan-Cancer Single Cell Classifier
MIT License

Using scATOMIC on Metastatic Samples #31

Open hkarakurt8742 opened 5 months ago

hkarakurt8742 commented 5 months ago

Hello, thank you for the amazing tool; it works really well. I have two questions. I have two large datasets: primary lung adenocarcinoma tumors and brain metastases of lung adenocarcinoma. Both come from multiple patients.

My first question: I ran scATOMIC on the primary tumor data with the "pan_cancer" parameter set to both TRUE and FALSE, and the results differ for the cancer cells. Since lung cancer is one of the main cancer types used to train scATOMIC, is "pan_cancer" the better option? The difference is that, for many cells, one approach classifies them as "Normal Tissue Cells" while the other classifies them as "lung". All predictions are confident. Instead of using "pan_cancer", is setting known_cancer_type = "lung" enough?

My second question: as I mentioned, the brain metastasis data probably includes some brain cells, and I want to filter them out. For that run I did not use the "pan_cancer" or "known_cancer_type" parameters. Should I try a different approach for this kind of dataset?

Thank you in advance.

inofechm commented 5 months ago

Hi,

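  1. Use pan_cancer = F. pan_cancer is only needed when the cancer type is not in the scATOMIC training data or when most cells are misclassified as the wrong cancer type. You don't have to set known_cancer_type = 'lung' if scATOMIC is already predicting lung. That parameter just changes any cancer label that differs from lung, but scATOMIC should be able to do this automatically.
  2. For the brain cells, there are a couple of possibilities:
  • The brain cells are called as normal_tissue_cell, and you can just filter those out.
  • The brain cells are called as oligodendrocytes or glial cells, and you can just filter those out.
  • The brain cells are first called Brain Cancer Cell in layer_6 of the results and are subsequently converted to lung cancer cells. In this case, just filter out any cells that have layer_6 = brain cancer cells in the results.

The other option is to also run a CNV inference method and remove cells that are predicted as diploid and are not immune/stromal populations.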

Let me know if you have any more questions. Best, Ido
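The filtering options for brain cells discussed in this thread can be sketched in base R. This is a minimal sketch on a toy table, not scATOMIC's own code: it assumes the summary matrix is a data frame with `layer_6` and `scATOMIC_pred` columns, and the label strings used here ("Normal Tissue Cell", "Oligodendrocytes", "Glial Cell", "Brain Cancer Cell") are assumptions that should be checked against `table()` of your own results. The helper `filter_brain_cells()` is hypothetical.

```r
# Toy stand-in for the output of create_summary_matrix(); real results have
# more columns, and the exact label strings below are assumptions.
toy_results <- data.frame(
  cell_names    = paste0("cell_", 1:6),
  layer_6       = c("Lung Cancer Cell", "Brain Cancer Cell", "Normal Tissue Cell",
                    "Oligodendrocytes", "Lung Cancer Cell", "Macrophage"),
  scATOMIC_pred = c("Lung Cancer Cell", "Lung Cancer Cell", "Normal Tissue Cell",
                    "Oligodendrocytes", "Lung Cancer Cell", "Macrophage"),
  stringsAsFactors = FALSE
)

# Labels treated as brain-derived (hypothetical; inspect your own label table).
brain_like_labels <- c("Normal Tissue Cell", "Oligodendrocytes", "Glial Cell")

# Drop cells whose final call is brain-like, or whose layer_6 call was
# "Brain Cancer Cell" before being converted to the majority cancer type.
filter_brain_cells <- function(results) {
  is_brain_like   <- results$scATOMIC_pred %in% brain_like_labels
  was_brain_tumor <- results$layer_6 %in% "Brain Cancer Cell"
  results[!(is_brain_like | was_brain_tumor), , drop = FALSE]
}

kept <- filter_brain_cells(toy_results)
print(kept$cell_names)
```

Using `%in%` rather than `==` keeps the filter well-behaved if a layer contains `NA` values. Note that in a sample containing normal lung tissue, a "Normal Tissue Cell" filter would remove those cells as well.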

hkarakurt8742 commented 5 months ago

Hi,

  1. Use pan_cancer = F. pan_cancer is only needed when the cancer type is not in the scATOMIC training data or when most cells are misclassified as the wrong cancer type. You don't have to set known_cancer_type = 'lung' if scATOMIC is already predicting lung. That parameter just changes any cancer label that differs from lung, but scATOMIC should be able to do this automatically.
  2. For the brain cells, there are a couple of possibilities:
  • The brain cells are called as normal_tissue_cell, and you can just filter those out.
  • The brain cells are called as oligodendrocytes or glial cells, and you can just filter those out.
  • The brain cells are first called Brain Cancer Cell in layer_6 of the results and are subsequently converted to lung cancer cells. In this case, just filter out any cells that have layer_6 = brain cancer cells in the results.

The other option is to also run a CNV inference method and remove cells that are predicted as diploid and are not immune/stromal populations.

Let me know if you have any more questions. Best, Ido

Thank you for your amazing answer. I will try your suggestions as soon as possible. I have just one more question. I classified my primary tumor data using scATOMIC with the following commands:

```r
# Extract the raw counts once and reuse them for both calls.
counts <- as(as.matrix(tumor_seurat[["RNA"]]$counts), "sparseMatrix")

tumor_predictions <- run_scATOMIC(counts, breast_mode = FALSE, mc.cores = 2, fine_grained_T = FALSE)

gc()

tumor_results <- create_summary_matrix(
  prediction_list = tumor_predictions, use_CNVs = FALSE, modify_results = TRUE,
  mc.cores = 1, raw_counts = counts, min_prop = 0.5,
  fine_grained_T = FALSE, pan_cancer = FALSE
)
```

When I check the results, I see that the confidence column for the cancer cells is "NA", and the cancer cells are labelled in layer_6 as Lung Cancer Cell while the scATOMIC prediction is Gastric Cancer Cell. What might be the reason, or is it an error?

Thank you in advance

inofechm commented 5 months ago

Sometimes, if there are a lot of non-cancer normal tissue cells that resemble another cancer type, such as gastric, scATOMIC might think these are the majority cancer type.

Alternatively, they can be an undifferentiated subset of lung cancer cells or a different subtype of lung cancer, such as squamous, that is less represented in scATOMIC's training data, likely expressing some mucin gene. In this case you should use the known_cancer_type parameter or just manually change the result.
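Manually changing the result can be as simple as overwriting the label column. Here is a minimal sketch on toy data, assuming the final call lives in a `scATOMIC_pred` column and that the labels read "Gastric Cancer Cell" and "Lung Cancer Cell" (both assumptions; check your own results first). The helper `relabel_cancer_cells()` and the `scATOMIC_pred_original` column are hypothetical names introduced here so the change stays traceable.

```r
# Toy stand-in for a scATOMIC summary table; label strings are assumptions.
toy_results <- data.frame(
  cell_names    = paste0("cell_", 1:4),
  scATOMIC_pred = c("Gastric Cancer Cell", "Gastric Cancer Cell",
                    "Macrophage", "CD8+ T cell"),
  stringsAsFactors = FALSE
)

# Overwrite one cancer label with another, keeping a copy of the original call.
relabel_cancer_cells <- function(results, from, to) {
  results$scATOMIC_pred_original <- results$scATOMIC_pred
  results$scATOMIC_pred[results$scATOMIC_pred %in% from] <- to
  results
}

relabelled <- relabel_cancer_cells(toy_results,
                                   from = "Gastric Cancer Cell",
                                   to   = "Lung Cancer Cell")
print(table(relabelled$scATOMIC_pred))
```

Only cells carrying the `from` label are touched, so immune and stromal calls are left as they were.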

To be safe, I would try to tease this out by running inferCNV or CopyKAT and seeing whether these cells are predicted as diploid or aneuploid.
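One way to read the CNV calls against the scATOMIC labels is a simple cross-tabulation. The sketch below uses toy data and does not run any CNV tool. It assumes a CopyKAT-style prediction table with `cell.names` and `copykat.pred` columns and a scATOMIC summary table with `cell_names` and `scATOMIC_pred` columns; adjust the names to whatever your own outputs contain.

```r
# Toy scATOMIC calls (label strings are assumptions).
toy_scatomic <- data.frame(
  cell_names    = paste0("cell_", 1:6),
  scATOMIC_pred = c(rep("Gastric Cancer Cell", 4), "Macrophage", "Macrophage"),
  stringsAsFactors = FALSE
)

# Toy CNV calls in a CopyKAT-like layout (column names are assumptions).
toy_cnv <- data.frame(
  cell.names   = paste0("cell_", 1:6),
  copykat.pred = c("aneuploid", "aneuploid", "aneuploid",
                   "diploid", "diploid", "diploid"),
  stringsAsFactors = FALSE
)

# Join the two tables on cell name, then count ploidy calls per label.
merged <- merge(toy_scatomic, toy_cnv, by.x = "cell_names", by.y = "cell.names")
ploidy_by_label <- table(merged$scATOMIC_pred, merged$copykat.pred)

# Row-wise proportions: fraction of each label called aneuploid vs diploid.
ploidy_fraction <- prop.table(ploidy_by_label, margin = 1)
print(ploidy_by_label)
```

If the cells with the unexpected cancer label are mostly aneuploid, they are more plausibly real cancer cells with the wrong type label; if they are mostly diploid, they are more plausibly normal tissue cells.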

Gastric doesn't have a confidence column entry because, at the time of benchmarking, we couldn't assess gastric cancer performance well enough to decide which scores should be deemed confident.