abelson-lab / scATOMIC

Pan-Cancer Single Cell Classifier
MIT License

Problems about the feature selection #15

Closed (Jonyyqn closed this issue 10 months ago)

Jonyyqn commented 10 months ago

Excellent work. I have a question about how scATOMIC distinguishes tumor cells from normal tissue cells. For this task, scATOMIC uses the DEGs between different tumor tissues and their matched normal tissues from OncoDB as the feature input for hierarchical clustering. Since the DEGs differ between tumor types, I am curious: when selecting features, does scATOMIC pick the tumor-versus-normal DEGs that correspond to the predicted tumor type, or does it use a fixed DEG gene set (for example, the intersection of the DEGs across tumor types) for all classification tasks?

inofechm commented 10 months ago

Thank you for your interest and for raising an important point. Indeed, scATOMIC first broadly predicts the tumour type and then uses the DEGs from OncoDB that are specific to that tumour. Additionally, create_summary_matrix has an argument called pan_cancer: if the user sets it to TRUE, scATOMIC instead uses a pan-cancer list that combines the DEGs from every tissue in OncoDB. In my experience, I prefer to leave this set to FALSE. Let me know if this answers your question. Ido
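To make the two modes concrete, here is a minimal sketch of the selection logic described above. It is not scATOMIC's actual code: the gene names, the DEGS_BY_TUMOUR table, and the select_features helper are all hypothetical, and the FALSE default for pan_cancer is an assumption made only for this illustration.

```python
# Hypothetical DEG sets keyed by tumour type. Real lists would come from
# OncoDB; these gene names are placeholders for illustration only.
DEGS_BY_TUMOUR = {
    "breast": {"GENE_A", "GENE_B", "GENE_C"},
    "lung": {"GENE_B", "GENE_D"},
    "colorectal": {"GENE_A", "GENE_E"},
}


def select_features(predicted_tumour, pan_cancer=False):
    """Return the DEG feature set for the tumour-vs-normal step.

    pan_cancer=False: use only the DEGs of the predicted tumour type.
    pan_cancer=True:  use the union of the DEGs across all tumour types.
    (Hypothetical helper; the default value is an assumption.)
    """
    if pan_cancer:
        # Combine every tumour type's DEGs into one pan-cancer list.
        return set().union(*DEGS_BY_TUMOUR.values())
    # Tumour-specific mode: copy the set so callers cannot mutate the table.
    return set(DEGS_BY_TUMOUR[predicted_tumour])


specific = select_features("lung")
combined = select_features("lung", pan_cancer=True)
print(sorted(specific))
print(sorted(combined))
```

Note that the pan-cancer mode sketched here is a union of all DEG lists, not an intersection, which matches the description of a list that combines the DEGs from every tissue.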

Jonyyqn commented 10 months ago

Thanks for your very detailed answer! It helped me better understand how the algorithm works. Looking forward to your further improvements in the future.