NKI-CCB / DISCOVER

DISCOVER co-occurrence and mutual exclusivity analysis for cancer genomics data
Apache License 2.0

Can DISCOVER be used with data from targeted sequencing? #26

Open jud-b opened 4 months ago

jud-b commented 4 months ago

Hi,

I have successfully used DISCOVER with whole-exome sequencing data. I am wondering whether I can also use it with targeted sequencing data generated by MSK-IMPACT or DFCI OncoPanel. Are there enough mutation events in only 300-500 genes to estimate the background mutation rate? Are there specific assumptions that are not met when using targeted sequencing data? Your help would be much appreciated.

Thanks.

scanisius commented 3 months ago

DISCOVER works very well with gene panels of a few hundred genes. In the DISCOVER paper, we used whole-exome data on the assumption that estimating the background model benefits from having mutation data for as many genes as possible. Since then, we have also applied DISCOVER to gene panel data, and for panels of a few hundred genes the results are very similar to those obtained with whole-exome data. You should probably be more careful with very small gene panels, though.

To illustrate the concept, have a look at the R code below, which subsets the included breast cancer mutation data to the MSK-IMPACT panel genes and compares the results with those of the whole-exome analysis.

library(discover)

data(BRCA.mut)

# Download MSK-IMPACT panel genes
panel_info <- readLines(url("https://media.githubusercontent.com/media/cBioPortal/datahub/master/reference_data/gene_panels/data_gene_panel_impact505.txt"))
msk_impact_genes <- unlist(strsplit(unlist(strsplit(grep("^gene_list:", panel_info, value = TRUE), " "))[2], "\t"))

# Restrict the panel to genes present in the breast cancer data
msk_impact_genes <- intersect(rownames(BRCA.mut), msk_impact_genes)

# Fit the background model on all genes, then keep only the panel genes
events_all_genes <- discover.matrix(BRCA.mut)
events_all_genes <- events_all_genes[msk_impact_genes, ]

# Fit the background model on the panel genes only
events_msk_impact <- discover.matrix(BRCA.mut[msk_impact_genes, ])

# Perform DISCOVER test for genes with more than 25 mutations
subset <- rowSums(events_msk_impact$events) > 25

result_all_genes <- pairwise.discover.test(events_all_genes[subset, ])
result_msk_impact <- pairwise.discover.test(events_msk_impact[subset, ])

# Compare the resulting P values
mask <- lower.tri(result_all_genes$p.values)
p_all_genes <- result_all_genes$p.values[mask]
p_msk_impact <- result_msk_impact$p.values[mask]

plot(-log10(p_all_genes), -log10(p_msk_impact))

[Image: scatter plot of -log10(p_all_genes) on the x-axis against -log10(p_msk_impact) on the y-axis]
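If you would like a number to go with the scatter plot, a rough sketch along the following lines could summarise the agreement between the two sets of P values. Note that `compare_p_values` is a hypothetical helper written for this illustration, not a function from the discover package, and that the 5% FDR cut-off and the Benjamini-Hochberg adjustment are assumptions made for the example. The P values in the usage lines are made up purely to show the call.

```r
# Hypothetical helper (not part of the discover package): summarise how well
# two vectors of P values for the same gene pairs agree.
compare_p_values <- function(p_a, p_b, fdr = 0.05) {
  stopifnot(length(p_a) == length(p_b))

  # Drop pairs for which either analysis has no P value
  keep <- !is.na(p_a) & !is.na(p_b)
  p_a <- p_a[keep]
  p_b <- p_b[keep]

  # Benjamini-Hochberg adjustment, applied to each analysis separately
  # (an assumption for this sketch)
  sig_a <- p.adjust(p_a, method = "BH") < fdr
  sig_b <- p.adjust(p_b, method = "BH") < fdr

  list(
    n_pairs = length(p_a),
    spearman = cor(p_a, p_b, method = "spearman"),
    significant_in_both = sum(sig_a & sig_b),
    significant_in_one_only = sum(xor(sig_a, sig_b))
  )
}

# Made-up P values, only to show the call; with the analysis above you would
# pass p_all_genes and p_msk_impact instead.
p_example_a <- c(0.001, 0.20, 0.03, 0.80, 0.50)
p_example_b <- c(0.002, 0.25, 0.04, 0.70, 0.60)
agreement <- compare_p_values(p_example_a, p_example_b)
print(agreement)
```

A high rank correlation together with few pairs that are significant in only one analysis would suggest that the panel-based background model leads to much the same conclusions as the whole-exome one. Running the same comparison on smaller gene subsets may be one way to see where the results start to diverge.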