BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Differential expression by edgeR requires raw counts... #56

Open cloudred20 opened 7 years ago

cloudred20 commented 7 years ago

Hey Antonio, I'm starting a thread here as you suggested.

I want to use TCGAbiolinks to study differential expression of miRNA across different cancer types. In the vignette, the analysis method is edgeR, but the input it is given is normalized counts. edgeR works with raw counts, doesn't it? Pardon me if I'm missing something obvious. I've pasted the code below.


# Downstream analysis using gene expression data  
# TCGA samples from IlluminaHiSeq_RNASeqV2 with type rsem.genes.results
# save(dataBRCA, geneInfo , file = "dataGeneExpression.rda")
library(TCGAbiolinks)

# normalization of genes
dataNorm <- TCGAanalyze_Normalization(tabDF = dataBRCA, geneInfo = geneInfo)

# quantile filter of genes
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm,
                                  method = "quantile", 
                                  qnt.cut =  0.25)

# selection of normal samples "NT"
samplesNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt),
                                   typesample = c("NT"))

# selection of tumor samples "TP"
samplesTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt), 
                                   typesample = c("TP"))

# Diff.expr.analysis (DEA)
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[,samplesNT],
                            mat2 = dataFilt[,samplesTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01 ,
                            logFC.cut = 1,
                            method = "glmLRT")

cloudred20 commented 7 years ago

Before DEA, TCGAanalyze_Normalization is used.
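
For readers following the snippet, the "NT" and "TP" selections rely on the two-digit sample type code embedded in each TCGA barcode ("01" for primary solid tumor, "11" for solid tissue normal). The base-R sketch below shows that idea on a few illustrative barcode strings. It is only an approximation of what TCGAquery_SampleTypes does, and the variable names are hypothetical.

```r
# Illustrative barcodes in the standard TCGA format (not taken from the thread)
barcodes <- c("TCGA-A1-A0SB-01A-11R-A144-07",
              "TCGA-A1-A0SD-11A-11R-A115-07",
              "TCGA-BH-A0B3-01A-11R-A056-07",
              "TCGA-BH-A0B3-11B-21R-A089-07")

# Characters 14-15 of the barcode hold the sample type code
sample_code <- substr(barcodes, 14, 15)

# "01" = primary solid tumor (TP), "11" = solid tissue normal (NT)
tp_sketch <- barcodes[sample_code == "01"]
nt_sketch <- barcodes[sample_code == "11"]

print(tp_sketch)
print(nt_sketch)
```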

torongs82 commented 7 years ago

Hi @megha20, thank you for your interest in TCGAbiolinks. You raise an interesting point about downstream analysis of RNA-seq data. To rephrase your question: can we use TCGAanalyze_DEA (a wrapper around edgeR) after TCGAanalyze_Normalization (a wrapper around EDASeq)? I think the answer is yes. I suggest reading the current edgeR user's guide, http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf, in particular sections 2.7.4, 2.7.5 and 2.7.6, which cover "GC content", "Gene length", and "Model-based normalization, not transformation", respectively. In my reading, these three sections mean that edgeR can be used together with EDASeq. As for non-integer counts, the edgeR manual states that edgeR works with RSEM counts (i.e. non-integer counts).
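
To make the point about non-integer counts concrete, here is a minimal sketch that runs edgeR's GLM likelihood-ratio pipeline (the same family of test as method = "glmLRT") directly on fractional values. The data are simulated and all object names are hypothetical. It only shows that the pipeline accepts non-integer input; it says nothing about whether a particular normalization is appropriate for a given dataset.

```r
library(edgeR)

set.seed(1)
n_genes <- 200
n_per_group <- 4

# Simulate integer counts, then add fractional noise to mimic
# RSEM-style expected counts (simulated data, not TCGA)
counts <- matrix(rnbinom(n_genes * 2 * n_per_group, mu = 100, size = 5),
                 nrow = n_genes)
counts <- counts + runif(length(counts))
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("s", seq_len(2 * n_per_group))

group <- factor(rep(c("Normal", "Tumor"), each = n_per_group))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)          # TMM normalization factors
y <- estimateDisp(y, design)     # dispersion estimates
fit <- glmFit(y, design)         # negative binomial GLM fit
lrt <- glmLRT(fit, coef = 2)     # likelihood-ratio test, Tumor vs Normal
res <- topTags(lrt, n = Inf)$table
head(res)
```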

In addition, I suggest applying both GC-content and gene-length normalization, like this:


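library(TCGAbiolinks)

# GC-content normalization, followed by gene-length normalization
dataNorm <- TCGAanalyze_Normalization(tabDF = dataBRCA,
                                      geneInfo = geneInfo,
                                      method = "gcContent")
dataNorm2 <- TCGAanalyze_Normalization(tabDF = dataNorm,
                                       geneInfo = geneInfo,
                                       method = "geneLength")

# quantile filter of genes
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm2,
                                  method = "quantile",
                                  qnt.cut = 0.25)

# selection of normal ("NT") and tumor ("TP") samples
dataSmNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt),
                                  typesample = c("NT"))
dataSmTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt),
                                  typesample = c("TP"))

# Diff.expr.analysis (DEA)
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, dataSmNT],
                            mat2 = dataFilt[, dataSmTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")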
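
As a side note on the filtering step, the base-R sketch below illustrates what a quantile filter with qnt.cut = 0.25 amounts to. It assumes the filter keeps genes whose mean expression across samples exceeds the chosen quantile of all gene means; the actual TCGAanalyze_Filtering implementation may differ in detail. The matrix is simulated and the names are hypothetical.

```r
set.seed(42)

# Simulated expression matrix: 20 genes x 6 samples (not TCGA data)
expr <- matrix(rexp(20 * 6, rate = 0.01), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))

qnt_cut <- 0.25

# Mean expression of each gene across samples
gene_means <- rowMeans(expr)

# Threshold: the 25th percentile of the gene means
threshold <- quantile(gene_means, probs = qnt_cut)

# Keep only the genes whose mean lies above the threshold
expr_filt <- expr[gene_means > threshold, , drop = FALSE]

cat("genes before:", nrow(expr), " genes after:", nrow(expr_filt), "\n")
```

With this kind of filter, roughly the lowest-expressed quarter of genes is removed before the differential expression test.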

We already discussed this downstream analysis during one of my TCGA workshops at DCRC, Copenhagen (Denmark), and I would also like to thank Elena and Thilde for their contributions. @ThildeBT and @elenapapaleo, if you have improvements to suggest, or if I missed something in my examples, feel free to comment in this thread.

If you need any further information, @megha20, feel free to write back. Best, Antonio.