CityUHK-CompBio / DeepCC

DeepCC: a novel deep learning-based framework for cancer molecular subtype classification
https://CityUHK-CompBio.github.io/DeepCC/
MIT License

Recommendations for analysis (Microarray as training with RNAseq to classify) #11

ahwanpandey closed 3 years ago

ahwanpandey commented 3 years ago

Hello,

Thanks for the DeepCC tool. I have a few questions about recommended workflows:

1. I have a High Grade Serous Ovarian Cancer microarray dataset with about 230 samples classified into 4 molecular subtypes to be used as a training dataset.

2. I have 131 RNAseq samples run on various different library preps (some stranded, some unstranded) which I need to classify.

Thanks for your input! Ahwan

zero19970 commented 3 years ago

Hi,

Thanks for using DeepCC. Here are some replies to your questions:

  1. You can use the High Grade Serous Ovarian Cancer microarray dataset as the training dataset.
  2. You can apply log2(TPM + 1e-6) to the gene expression profiles (eps) before calling getFunctionalSpectra; no other filtering is needed. getFunctionalSpectra transforms each patient sample's eps from the original gene dimension to one value per MSigDB v7 gene set (22,596 gene sets).
  3. I think it's fine to classify the 131 RNAseq samples as long as they are the same cancer type as the training dataset.
  4. getFunctionalSpectra can handle datasets with batch effects, so you do not need to do additional batch correction.
  5. Log transformation with log2(TPM + 1) or log2(TMM) should not make much difference. Just apply the same log transformation to the training dataset and the test dataset. The varying results may be caused by too few training samples. Also, the classification results depend on how good the molecular subtype labels of the training dataset are.
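
To make points 2 and 5 concrete, here is a minimal sketch of the log2(TPM + 1e-6) transform, applied through one shared function so that both datasets get exactly the same treatment. It is plain Python for illustration only and does not call DeepCC. The function names, sample IDs, gene values, and the dict-of-dicts layout are all hypothetical; only the formula and the 1e-6 offset come from the reply above.

```python
import math

# Offset quoted in the reply above; it keeps zero values finite after the log.
PSEUDOCOUNT = 1e-6


def log2_transform(profile, pseudocount=PSEUDOCOUNT):
    """Return log2(value + pseudocount) for every gene in one sample."""
    return {gene: math.log2(value + pseudocount) for gene, value in profile.items()}


def transform_cohort(samples, pseudocount=PSEUDOCOUNT):
    """Apply the same transform to every sample in a cohort."""
    return {
        sample_id: log2_transform(profile, pseudocount)
        for sample_id, profile in samples.items()
    }


# Hypothetical toy values, not real data.
training = {"train_01": {"TP53": 8.0, "BRCA1": 0.0}}
to_classify = {"rnaseq_01": {"TP53": 4.0, "BRCA1": 1.0}}

# Both cohorts go through the same function with the same offset.
training_log = transform_cohort(training)
to_classify_log = transform_cohort(to_classify)
```

Routing both cohorts through a single function is just one way to make sure the offset and log base cannot drift apart between the training and test sets.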

Best wishes! LeeCH

ahwanpandey commented 3 years ago

Thank you for your response!