NKI-CCB / DISCOVER

DISCOVER co-occurrence and mutual exclusivity analysis for cancer genomics data
Apache License 2.0

Generating mutation matrix #7

Closed lramsa1 closed 6 years ago

lramsa1 commented 6 years ago

This is not so much an issue as a request. I've been trying to understand your tool using the R introduction page (http://ccb.nki.nl/software/discover/doc/r/discover-intro.html), and it's not entirely clear to me how you generated the BRCA.mut matrix. I assume 1 denotes a mutated gene for each tumor, but how did you define mutated or not? (For example, would a gene with a synonymous mutation be labeled as mutated?) Are there any mutation matrices available for other TCGA cancer types? Thanks

scanisius commented 6 years ago

Indeed, in the BRCA.mut matrix, 1 means mutated and 0 means wildtype. For this example matrix, we did not perform any particular filtering, so a gene with only a synonymous mutation would also be counted as mutated. Of course, for many real analyses it may be useful to do some filtering upfront. We consider this outside the scope of the DISCOVER package, but once you have filtered the mutations, you can use any mutation matrix with our test.

The best places to look for TCGA mutation data are FireBrowse or GDC. From there, you can download mutations in MAF format, which you could turn into a mutation matrix as follows.

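maf <- read.delim("filename.maf", as.is=TRUE, comment.char="#")   # comment.char skips the "#" header lines that some MAF files start with
maf <- maf[maf$Variant_Classification != "Silent", ]   # This is where you might filter mutations
mut <- pmin(table(maf$Hugo_Symbol, maf$Tumor_Sample_Barcode), 1)   # Cap the per-gene, per-sample counts at 1 to get a 0/1 matrix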
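
To make the binarization step concrete, here is a minimal sketch on a made-up MAF-like data frame. The gene and sample values are invented for illustration and are not TCGA data. It also shows one possible way, not part of the snippet above, to keep samples whose mutations were all filtered out: fix the sample levels before filtering, so those samples appear as all-zero columns instead of disappearing from the matrix.

```r
# Toy MAF-like table. All values are hypothetical, for illustration only.
maf <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "PIK3CA", "GATA3", "TP53"),
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S3", "S3"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Silent", "Silent"),
  stringsAsFactors = FALSE
)

# Record all samples before filtering, so that samples left without any
# mutation are still represented as all-zero columns.
samples <- unique(maf$Tumor_Sample_Barcode)

# Drop silent (synonymous) mutations; adjust or skip this filter as needed.
kept <- maf[maf$Variant_Classification != "Silent", ]

# Count mutations per gene/sample pair, then reduce the counts to 0/1.
counts <- table(kept$Hugo_Symbol,
                factor(kept$Tumor_Sample_Barcode, levels = samples))
mut <- (unclass(counts) > 0) * 1L

print(mut)
```

In this toy example, TP53 has two non-silent mutations in S1 but is still recorded as a single 1, and S3 is kept as an all-zero column because both of its mutations were silent. Genes with only silent mutations (GATA3 here) drop out of the rows. Whether to keep such all-zero samples depends on the analysis, so treat this as an assumption rather than a recommendation from the package.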
scanisius commented 6 years ago

This issue has been inactive for a few months, so I am closing it.