PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License
450 stars 222 forks source link

Oncoplot MutSig error: object 'ms.smg' not found #292

Closed jeff-meng closed 5 years ago

jeff-meng commented 5 years ago

Hi! I tried to load the "sig_genes.txt" file from MutSig with mutsig <- system.file("extdata", "../MutSig2CV/sig_genes.txt", package = "maftools") but then mutsig returns "":

> mutsig
[1] ""

and oncoplot(maf = comb_maf, colors = col, mutsig = mutsig, mutsigQval = 0.1, ... returns

Following genes from provided gene list are missing from MAF: [1] "IDH1" Ignoring them. Error in oncoplot(maf = comb_maf, colors = col, mutsig = mutsig, mutsigQval = 0.1, : object 'ms.smg' not found

I tried to debug a bit by running ms = data.table::fread(input = mutsig, sep = '\t', stringsAsFactors = FALSE, header = TRUE) but I got

Error in data.table::fread(input = mutsig, sep = "\t", stringsAsFactors = FALSE, : Input is empty or only contains BOM or terminal control characters

Then I tried to load the "sig_genes.txt" file directly into Oncoplot using oncoplot(maf = comb_maf, colors = col, mutsig = "../MutSig2CV/sig_genes.txt", mutsigQval = 0.1,... but I got the same error:

Following genes from provided gene list are missing from MAF: [1] "IDH1" Ignoring them. Error in oncoplot(maf = comb_maf, colors = col, mutsig = "../MutSig2CV/sig_genes.txt", : object 'ms.smg' not found

Any idea what's wrong and how to resolve this? Thank you!

Here's my sessionInfo:

R version 3.5.2 (2018-12-20) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.6

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] maftools_1.8.10 usethis_1.5.0 devtools_2.0.2 RColorBrewer_1.1-2 survminer_0.4.3
[6] ggpubr_0.2 magrittr_1.5 ggplot2_3.1.1 survival_2.44-1.1 Biobase_2.42.0
[11] BiocGenerics_0.28.0

loaded via a namespace (and not attached): [1] nlme_3.1-139 bitops_1.0-6 matrixStats_0.54.0
[4] fs_1.2.7 cmprsk_2.2-7 doParallel_1.0.14
[7] rprojroot_1.3-2 GenomeInfoDb_1.18.2 tools_3.5.2
[10] backports_1.1.4 R6_2.4.0 lazyeval_0.2.2
[13] colorspace_1.4-1 GetoptLong_0.1.7 withr_2.1.2
[16] prettyunits_1.0.2 processx_3.3.0 tidyselect_0.2.5
[19] gridExtra_2.3 curl_3.3 compiler_3.5.2
[22] cli_1.1.0 desc_1.2.0 DelayedArray_0.8.0
[25] pkgmaker_0.27 rtracklayer_1.42.2 labeling_0.3
[28] scales_1.0.0 survMisc_0.5.5 callr_3.2.0
[31] NMF_0.21.0 stringr_1.4.0 digest_0.6.18
[34] Rsamtools_1.34.1 XVector_0.22.0 pkgconfig_2.0.2
[37] sessioninfo_1.1.1 bibtex_0.4.2 BSgenome_1.50.0
[40] rlang_0.3.4 GlobalOptions_0.1.0 rstudioapi_0.10
[43] shape_1.4.4 generics_0.0.2 zoo_1.8-5
[46] mclust_5.4.3 BiocParallel_1.16.6 dplyr_0.8.0.1
[49] RCurl_1.95-4.12 GenomeInfoDbData_1.2.0 wordcloud_2.6
[52] Matrix_1.2-17 Rcpp_1.0.1 munsell_0.5.0
[55] S4Vectors_0.20.1 stringi_1.4.3 SummarizedExperiment_1.12.0 [58] zlibbioc_1.28.0 pkgbuild_1.0.3 plyr_1.8.4
[61] grid_3.5.2 ggrepel_0.8.0 crayon_1.3.4
[64] lattice_0.20-38 Biostrings_2.50.2 cowplot_0.9.4
[67] splines_3.5.2 circlize_0.4.6 ps_1.3.0
[70] knitr_1.22 ComplexHeatmap_1.20.0 pillar_1.3.1
[73] GenomicRanges_1.34.0 rjson_0.2.20 rngtools_1.3.1
[76] pkgload_1.0.2 reshape2_1.4.3 codetools_0.2-16
[79] stats4_3.5.2 XML_3.98-1.19 glue_1.3.1
[82] BiocManager_1.30.4 remotes_2.0.4 data.table_1.12.2
[85] foreach_1.4.4 gtable_0.3.0 purrr_0.3.2
[88] tidyr_0.8.3 km.ci_0.5-2 assertthat_0.2.1
[91] xfun_0.6 gridBase_0.4-7 xtable_1.8-3
[94] broom_0.5.2 tibble_2.1.1 iterators_1.0.10
[97] memoise_1.1.0 GenomicAlignments_1.18.1 registry_0.5-1
[100] IRanges_2.16.0 KMsurv_0.1-5 cluster_2.0.8

ShixiangWang commented 5 years ago

mutsig <- system.file("extdata", "../MutSig2CV/sig_genes.txt", package = "maftools")

This code did not find your sig_gene.txt, where is it?

jeff-meng commented 5 years ago

@ShixiangWang, thanks for your reply. I'm 100% certain that the filepath "../MutSig2CV/sig_genes.txt" is correct.

I later copied the file to the working directory and used mutsig <- system.file("extdata", "sig_genes.txt", package = "maftools") and I got the same result.

mutsig [1] ""

ShixiangWang commented 5 years ago

Could you use

mutsig <- "../MutSig2CV/sig_genes.txt" 

There is not sig_genes.txt in maftools.

jeff-meng commented 5 years ago

@ShixiangWang When I use mutsig <- "../MutSig2CV/sig_genes.txt", ms = data.table::fread(input = mutsig, sep = '\t', stringsAsFactors = FALSE, header = TRUE)works and I get

ms rank gene longname codelen nnei nncd nsil 1: 1 TP53 tumor protein p53 1890 12 0 0 2: 2 BRAF v-raf murine sarcoma viral oncogene homolog B1 2371 5 0 3 3: 3 KCNQ2 potassium voltage-gated channel, KQT-like subfamily, member 2 2747 39 0 0 4: 4 NRAS neuroblastoma RAS viral (v-ras) oncogene homolog 941 4 2 0 5: 5 ZKSCAN1 zinc finger with KRAB and SCAN domains 1 1712 41 0 1

   nmis nstp nspl nind nnon npat nsite          pCV   pCL     pFN            p            q
1:   14    3    1    1   19   17    16 7.507203e-13 4e-05 0.02799 3.330669e-16 6.093792e-12
2:   42    0    0    1   43   35     9 1.130012e-04 1e-05 0.00022 2.440942e-08 2.232974e-04
3:    7    0    2    0    9    9     9 2.956467e-08 1e+00 0.75000 5.421180e-07 3.306197e-03
4:   10    0    0    0   10    9     3 5.656094e-03 1e-05 0.17022 1.000447e-06 4.576044e-03
5:    1    0    0    3    4    4     2 5.679787e-03 1e-04 0.98069 8.736183e-06 3.196744e-02

However, oncoplot(maf = comb_maf, colors = col, mutsig = mutsig, mutsigQval = 0.1, ... still gives the same error:

Following genes from provided gene list are missing from MAF: [1] "IDH1" Ignoring them. Error in oncoplot(maf = comb_maf, colors = col, mutsig = mutsig, mutsigQval = 0.1, : object 'ms.smg' not found

ShixiangWang commented 5 years ago

Please check the following code.

mutsigQval = 0.1
ms = data.table::fread(input = mutsig, sep = '\t', stringsAsFactors = FALSE, header = TRUE)

ms$q = as.numeric(gsub(pattern = "^<", replacement = "", x = as.character(ms$q)))
mach.epsi = .Machine$double.eps
ms$q = ifelse(test = ms$q == 0, yes = mach.epsi, no = ms$q)
ms[,FDR := -log10(as.numeric(as.character(q)))]
ms.smg = ms[q < as.numeric(as.character(mutsigQval))]
jeff-meng commented 5 years ago

@ShixiangWang All of the code you gave me worked, but the oncoplot function still gave the same error. Then I quit RStudio and started it back up again. And oncoplot produced a plot. However, the q-value bars are thicker than the bars for mutations and misaligned with the rest of the plot.

Here's the right half of the resulting oncoplot:

image

ShixiangWang commented 5 years ago

Could you reproduce this plot problem with sample data provided by maftools?

library(maftools)
#path to TCGA LAML MAF file
laml.maf = system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools')
#clinical information containing survival information and histology. This is optional
laml.clin = system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools')

laml = read.maf(maf = laml.maf,
                clinicalData = laml.clin,
                verbose = FALSE)

#MutSig results
laml.mutsig = system.file("extdata", "LAML_sig_genes.txt.gz", package = "maftools")

oncoplot(
  maf = laml,
  mutsig = laml.mutsig,
  mutsigQval = 0.01,
)

All code come from https://bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/oncoplots.html.

PoisonAlien commented 5 years ago

Thank you @ShixiangWang for debugging. @jeff-meng could you please install the tool from GitHub and try again ?

jeff-meng commented 5 years ago

Thanks @ShixiangWang and @PoisonAlien! The sample code involving LAML works perfectly, but my code still makes MutSig q-value bars that are too wide. I tried commenting out the oncoplot "gene" argument that specifies my selection of genes to display, and the resulting plot looked fine. So I think the problem is that the q-value plotting is not compatible with gene selection. I would really appreciate it if you can fix this!

oncoplot(maf = comb_maf, colors = col, mutsig = mutsig, mutsigQval = 0.1, 
         #genes=cm_genes,
         keepGeneOrder = T, GeneOrderSort = F, #don't sort samples by gene mut
         removeNonMutated=F,
         clinicalFeatures = c("Response", "Sample_source"), sortByAnnotation = TRUE, annotationOrder=c("CR","PR","SD","PD","SPP"),
         annotationColor = annot_colors)

image

ShixiangWang commented 5 years ago

@jeff-meng May you provide regarding data files and send to @PoisonAlien so he can fix this bug?

jeff-meng commented 5 years ago

Thanks, @ShixiangWang and @PoisonAlien. I would provide my data, but more conveniently, I reproduced the problem with the demo LAML data:

laml.maf = system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools')
laml.clin = system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools')
laml = read.maf(maf = laml.maf,
                clinicalData = laml.clin,
                verbose = FALSE)
laml.mutsig = system.file("extdata", "LAML_sig_genes.txt.gz", package = "maftools")
oncoplot(
  maf = laml,
  mutsig = laml.mutsig,
  mutsigQval = 0.01,
  genes=c("BRAF", "NRAS", "HRAS", "KRAS","NF1", "TP53", "CDKN2A")
) 

and I get: image

PoisonAlien commented 5 years ago

Hi, Thing is if you're passing significant gene list, it will plot all the genes below threshold - hence genes or top arguments wont work. If you really want to restrict then you could use exprsTbl argument. Something like below,

> laml.mutsig = system.file("extdata", "LAML_sig_genes.txt.gz", package = "maftools")
> ms_res = data.table::fread(input = laml.mutsig)
> ms_res = ms_res[,.(gene, q)]
> ms_res[, q := -log10(q)]
> head(ms_res)
gene        q
1:   FLT3 12.64176
2: DNMT3A 12.64176
3:   NPM1 12.64176
4:   IDH2 12.64176
5:   IDH1 12.64176
6:   TET2 12.64176
> oncoplot(maf = laml, exprsTbl = ms_res)
#You can include `top` or `genes` argument

Let me know if this helps. You may have to install from GiHub again for the fixes.

jeff-meng commented 5 years ago

Thanks, @PoisonAlien !

Now I see that it was silly of my to provide a gene list of interested that contained gene with MutSig q > 0.1 and at the same time specify a mutsigQval cutoff of 0.1. I downloaded the latest Maftools and the trick with exprsTbl works! I just needed to add a line to your code above to rename "gene" as "genes": colnames(ms_res)[1] = "genes"

Still, I wonder if you could make the oncoplot argument "mutsig" work with gene selection by "genes". With the "genes" argument present, even when I set the mutsigQval threshold to 1, oncoplot still displays only genes with q < 0.1

PoisonAlien commented 5 years ago

Yes. That was the point of using mutsig option to pre-select the genes based on q-value threshold. But if you still think it will be an additional value, then I will push the changes.