BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Survival analyses for specific set of genes in specific samples #218

Open abhisheksinghnl opened 6 years ago

abhisheksinghnl commented 6 years ago

Hi,

I am trying to compute survival plots for a specific set of genes expressed in a specific set of TCGA samples.

Here is what I am doing:

library(TCGAbiolinks)
library(SummarizedExperiment)  # colData(), assays()
library(DT)                    # datatable()

# Barcodes of the samples of interest
listSamples <- c("TCGA-DI-A1BU-01A", "TCGA-D1-A0ZZ-01A", "TCGA-2E-A9G8-01A", "TCGA-AJ-A3EJ-01A",
                 "TCGA-AP-A0LH-01A", "TCGA-D1-A1NW-01A", "TCGA-EY-A1GM-01A", "TCGA-A5-A7WK-01A",
                 "TCGA-QF-A5YS-01A", "TCGA-AP-A5FX-01A", "TCGA-AX-A1CR-01A", "TCGA-FI-A2D2-01A",
                 "TCGA-EO-A3AZ-01", "TCGA-BK-A139-02A", "TCGA-EY-A1GV-01A", "TCGA-D1-A2G0-01A",
                 "TCGA-A5-A1OF-01A", "TCGA-EY-A2OO-01A", "TCGA-AJ-A3NG-01", "TCGA-FI-A2CY-01A",
                 "TCGA-E6-A2P8-01A", "TCGA-BK-A0CA-01A", "TCGA-AJ-A3BG-01A", "TCGA-EY-A210-01A")

# Query normalized RNA-Seq expression for those samples only
query <- GDCquery(project = "TCGA-UCEC",
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "normalized_results",
                  experimental.strategy = "RNA-Seq",
                  barcode = listSamples,
                  legacy = TRUE)

GDCdownload(query, directory = "GDCdata", method = "client")

data <- GDCprepare(query)

# Browse the sample annotation attached to the expression data
datatable(as.data.frame(colData(data)),
          options = list(scrollX = TRUE, keys = TRUE, pageLength = 5),
          rownames = FALSE)

# Expression matrix (genes x samples); assayNames(data) lists the available assays
UCECMatrix <- assays(data)[["raw_count"]]
data_CorOutliers <- TCGAanalyze_Preprocessing(data)

The survival analysis example in the documentation does not show how to run the analysis for a chosen set of genes, say TP53, RUNX1, TGM2 and LGR5, in a specific set of samples. I am sure other people have run into this situation. Could you please let me know how to tackle it?

Many thanks in advance.

tiagochst commented 6 years ago

You have to stratify the samples into groups (for example, high vs. low expression of gene A, or samples with gene A mutated vs. non-mutated), then run the survival analysis on those groups.
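To make the stratification idea concrete, here is a minimal sketch for a single gene. It runs on simulated values rather than TCGA data, and the median cutoff, the column names (`expr`, `time`, `status`) and the use of the `survival` package are assumptions chosen for illustration, not something TCGAbiolinks prescribes.

```r
library(survival)

set.seed(1)  # synthetic data only, for illustration

# Toy cohort: one row per sample (all values are simulated, not TCGA data)
n <- 40
toy <- data.frame(
  expr   = rexp(n, rate = 1 / 100),   # expression of "gene A"
  time   = rexp(n, rate = 1 / 1000),  # follow-up time in days
  status = rbinom(n, 1, 0.6)          # 1 = death observed, 0 = censored
)

# Stratify samples at the median expression of the gene (assumed cutoff)
toy$group <- factor(
  ifelse(toy$expr > median(toy$expr), "high", "low"),
  levels = c("low", "high")
)

# Kaplan-Meier curves per group and a log-rank test between the groups
fit  <- survfit(Surv(time, status) ~ group, data = toy)
test <- survdiff(Surv(time, status) ~ group, data = toy)
p_value <- pchisq(test$chisq, df = length(test$n) - 1, lower.tail = FALSE)

plot(fit, col = c("blue", "red"), xlab = "Days", ylab = "Survival probability")
legend("topright", legend = levels(toy$group), col = c("blue", "red"), lty = 1)
```

Other cutoffs (tertiles, quartiles) or a mutation flag can replace the median split; only the definition of `group` changes.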

I have similar code here: https://github.com/tiagochst/ELMER/blob/master/R/TFsurvival.plot.R, but you will probably have to make some changes. @torongs82, you had a function that did that analysis, didn't you?
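For several genes at once, one option is to repeat the same split and test per gene. The sketch below is not the ELMER function linked above; it uses simulated data, a hypothetical helper `logrank_for_gene`, placeholder sample IDs, and assumes the clinical table has TCGA-style columns `vital_status`, `days_to_death` and `days_to_last_follow_up`.

```r
library(survival)

set.seed(2)  # synthetic data only, for illustration

genes   <- c("TP53", "RUNX1", "TGM2", "LGR5")
samples <- paste0("S", 1:30)  # placeholder sample IDs, not real barcodes

# Expression matrix: genes in rows, samples in columns (simulated values)
expr <- matrix(rexp(length(genes) * length(samples), rate = 1 / 100),
               nrow = length(genes),
               dimnames = list(genes, samples))

# Clinical table with assumed TCGA-style column names (simulated values)
dead <- rbinom(length(samples), 1, 0.5) == 1
clin <- data.frame(
  sample = samples,
  vital_status = ifelse(dead, "Dead", "Alive"),
  days_to_death = ifelse(dead, round(rexp(30, 1 / 800)) + 1, NA),
  days_to_last_follow_up = ifelse(dead, NA, round(rexp(30, 1 / 1500)) + 1)
)

# Overall survival: time to death if dead, otherwise time to last follow-up
clin$time   <- ifelse(dead, clin$days_to_death, clin$days_to_last_follow_up)
clin$status <- as.integer(clin$vital_status == "Dead")

# Hypothetical helper: median split on one gene, then a log-rank p-value
logrank_for_gene <- function(gene) {
  values <- expr[gene, clin$sample]  # align expression with the clinical rows
  group  <- ifelse(values > median(values), "high", "low")
  test   <- survdiff(Surv(clin$time, clin$status) ~ group)
  pchisq(test$chisq, df = 1, lower.tail = FALSE)
}

p_values <- vapply(genes, logrank_for_gene, numeric(1))
print(p_values)

# Adjust for testing several genes (Benjamini-Hochberg)
print(p.adjust(p_values, method = "BH"))
```

With real data, `expr` would be the matrix taken from the prepared object and `clin` its sample annotation, restricted to the samples of interest.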