BioinformaticsFMRP / TCGAWorkflow

TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

GAIA aberrant region description #6

Open PubuduSaneth opened 6 years ago

PubuduSaneth commented 6 years ago

I followed the TCGAWorkflow to run GAIA on TCGA malignant melanoma (SKCM) level 3 segment data. According to the GAIA load_cnv documentation, the estimated copy number for a segmented region (the kind of aberration) is coded as 0, 1, and 2 for losses, LOHs, and gains, respectively. However, in the TCGAWorkflow section "Identification of recurrent CNV in cancer", cnvMatrix contains only 0 for losses and 1 for gains:

# Add label (0 for loss, 1 for gain)
cnvMatrix <- cbind(cnvMatrix,Label=NA)
cnvMatrix[cnvMatrix[,"Segment_Mean"] < -0.3,"Label"] <- 0
cnvMatrix[cnvMatrix[,"Segment_Mean"] > 0.3,"Label"] <- 1
cnvMatrix <- cnvMatrix[!is.na(cnvMatrix$Label),]

It would be extremely helpful if you could clarify the reason for deviating from the GAIA documentation, or let me know whether I have misunderstood the TCGAWorkflow.

tiagochst commented 6 years ago

Sorry for the delay. This section was entirely written by Fulvio. Here is his answer:

“In the pipeline we described for the identification of recurrent CNV in cancer, we considered only two aberrations: gain, defined as log2(copy-number/2) > 0.3, and loss, defined as log2(copy-number/2) < -0.3. According to the GAIA load_cnv documentation, the value must be an integer in the range 0..(K-1), where K is the number of aberrations considered. With K = 2, Copy Number in the passed segmentation_matrix can therefore be 0 (loss) or 1 (gain).”
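
To make the rule above concrete, here is a minimal sketch in base R of the two-aberration labelling, run on made-up data. The `segments` data frame and the `label_aberrations` helper are hypothetical and are not part of gaia or TCGAWorkflow. Only the ±0.3 thresholds and the 0..(K-1) range come from this thread.

```r
# Hypothetical toy segments; the values are invented for illustration only.
segments <- data.frame(
  Sample       = c("S1", "S1", "S2", "S2", "S3"),
  Chromosome   = c(1, 1, 2, 2, 3),
  Start        = c(100, 5000, 200, 7000, 300),
  End          = c(4000, 9000, 6000, 9500, 8000),
  Segment_Mean = c(-0.8, 0.1, 0.5, -0.35, 0.9)
)

# Hypothetical helper (an assumption, not a gaia or TCGAWorkflow function):
# label losses as 0 and gains as 1, then drop segments between the cut-offs.
label_aberrations <- function(seg, loss_cut = -0.3, gain_cut = 0.3) {
  seg$Label <- NA_integer_
  seg$Label[seg$Segment_Mean < loss_cut] <- 0L  # loss
  seg$Label[seg$Segment_Mean > gain_cut] <- 1L  # gain
  seg[!is.na(seg$Label), , drop = FALSE]
}

labelled <- label_aberrations(segments)

# K is the number of aberration kinds considered (here: loss and gain).
K <- 2L

# Every label must be an integer in 0..(K-1), as quoted from the
# load_cnv documentation in the answer above.
stopifnot(all(labelled$Label %in% 0:(K - 1L)))

print(table(labelled$Label))
```

If a third aberration kind such as LOH were also considered, K would be 3 and the labels would span 0..2, which matches the 0/1/2 coding quoted in the question.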