PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

subsetMaf (and depending functions?) not working on data imported with annovarToMaf #11

Closed sribi closed 7 years ago

sribi commented 7 years ago

Description

subsetMaf() behaves oddly or throws an error when applied to a MAF object created with annovarToMaf(). Other functions such as oncoplot() and rainfallPlot() misbehave too, probably because the Tumor_Sample_Barcode column is populated with gene names.

Example

Generate a MAF object from the ANNOVAR file provided with the package:

var.annovar <- system.file("extdata", "variants.hg19_multianno.txt", package = "maftools")
var.annovar.maf <- annovarToMaf(annovar = var.annovar, Center = 'CSI-NUS', refBuild = 'hg19', tsbCol = 'Tumor_Sample_Barcode', table = 'ensGene', header = TRUE)

Subsetting puts gene names in the Tumor_Sample_Barcode column, and throws an error if mafObj=TRUE:

subsetMaf(maf = var.annovar.maf, query = "Variant_Classification == 'Missense_Mutation'")
#the above works, but gene names appear in Tumor_Sample_Barcode column

subsetMaf(maf = var.annovar.maf, query = "Variant_Classification == 'Missense_Mutation'", mafObj=TRUE)

Creating oncomatrix (this might take a while)..
Error in createOncoMatrix(maf.dat) : object 'gene' not found
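A quick way to confirm the symptom is to compare the barcode column against the gene column. The following is only a minimal sketch on a made-up toy table, not on the actual maftools object; the helper name `barcode_gene_overlap` is hypothetical, and the column names are assumed to follow the usual MAF naming (Hugo_Symbol, Tumor_Sample_Barcode).

```r
# Toy table standing in for the variant table of a MAF object (values are made up).
maf_tbl <- data.frame(
  Hugo_Symbol = c("TP53", "KRAS", "TP53"),
  Tumor_Sample_Barcode = c("TP53", "KRAS", "TP53"),  # symptom: gene names here
  stringsAsFactors = FALSE
)

# Hypothetical helper: fraction of rows whose barcode equals the gene symbol.
barcode_gene_overlap <- function(tbl) {
  mean(tbl$Tumor_Sample_Barcode == tbl$Hugo_Symbol)
}

overlap <- barcode_gene_overlap(maf_tbl)
if (overlap > 0) {
  message("Tumor_Sample_Barcode matches Hugo_Symbol in ",
          round(100 * overlap), "% of rows")
}
```

Any overlap above zero on real data would suggest that sample barcodes were overwritten during conversion.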

When the MAF object is generated from a MAF file, everything seems fine:

laml.input <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.input, useAll = FALSE)

Subsetting works in this case:

subsetMaf(maf = laml, query = "Variant_Classification == 'Missense_Mutation'")
subsetMaf(maf = laml, query = "Variant_Classification == 'Missense_Mutation'", mafObj=TRUE)

Further Info:

maftools was installed from git.

sessionInfo()
R Under development (unstable) (2016-04-26 r70550)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] RColorBrewer_1.1-2   maftools_0.99.45     devtools_1.12.0     
 [4] BiocInstaller_1.23.9 NMF_0.20.6           cluster_2.0.4       
 [7] rngtools_1.2.4       pkgmaker_0.22        registry_0.3        
[10] Biobase_2.33.3       BiocGenerics_0.19.2 

loaded via a namespace (and not attached):
 [1] nlme_3.1-128                bitops_1.0-6               
 [3] httr_1.2.1                  doParallel_1.0.10          
 [5] prabclus_2.2-6              GenomeInfoDb_1.9.8         
 [7] tools_3.4.0                 R6_2.1.3                   
 [9] KernSmooth_2.23-15          DBI_0.5                    
[11] colorspace_1.2-6            trimcluster_0.1-2          
[13] nnet_7.3-12                 GetoptLong_0.1.4           
[15] withr_1.0.2                 curl_1.2                   
[17] git2r_0.15.0                chron_2.3-47
[19] rtracklayer_1.33.12         labeling_0.3               
[21] slam_0.1-38                 diptest_0.75-7             
[23] caTools_1.17.1              scales_0.4.0               
[25] DEoptimR_1.0-6              mvtnorm_1.0-5              
[27] robustbase_0.92-6           stringr_1.1.0              
[29] digest_0.6.10               Rsamtools_1.25.1           
[31] cometExactTest_0.1.3        XVector_0.13.7             
[33] changepoint_2.2.1           BSgenome_1.41.2            
[35] GlobalOptions_0.0.10        RSQLite_1.0.0              
[37] shape_1.4.2                 zoo_1.7-13                 
[39] mclust_5.2                  BiocParallel_1.7.8         
[41] DPpackage_1.1-6             gtools_3.5.0               
[43] dendextend_1.3.0            dplyr_0.5.0                
[45] VariantAnnotation_1.19.10   RCurl_1.95-4.8             
[47] magrittr_1.5                modeltools_0.2-21          
[49] wordcloud_2.5               Matrix_1.2-7.1             
[51] Rcpp_0.12.7                 munsell_0.4.3              
[53] S4Vectors_0.11.13           stringi_1.1.1              
[55] whisker_0.3-2               MASS_7.3-45                
[57] SummarizedExperiment_1.3.82 zlibbioc_1.19.0            
[59] flexmix_2.3-13              gplots_3.0.1               
[61] plyr_1.8.4                  grid_3.4.0                 
[63] gdata_2.17.0                ggrepel_0.5                
[65] lattice_0.20-33             Biostrings_2.41.4          
[67] cowplot_0.6.2               splines_3.4.0              
[69] GenomicFeatures_1.25.16     circlize_0.3.8             
[71] ComplexHeatmap_1.11.6       GenomicRanges_1.25.93      
[73] rjson_0.2.15                fpc_2.1-10                 
[75] reshape2_1.4.1              codetools_0.2-14           
[77] biomaRt_2.29.2              stats4_3.4.0               
[79] XML_3.98-1.4                data.table_1.9.6           
[81] foreach_1.4.3               gtable_0.2.0               
[83] kernlab_0.9-24              assertthat_0.1             
[85] ggplot2_2.1.0               gridBase_0.4-7             
[87] xtable_1.8-2                class_7.3-14               
[89] survival_2.39-5             tibble_1.2                 
[91] iterators_1.0.8             memoise_1.0.0              
[93] GenomicAlignments_1.9.6     AnnotationDbi_1.35.4       
[95] IRanges_2.7.15

Thanks for looking into it!

PoisonAlien commented 7 years ago

Hi,

Thanks for reporting this! This was a mistake on my side; I should have been more careful. You were right that hugo_symbol was being added to the barcodes. I have fixed it. Thanks again.

P.S. Note that after subsetting in the above example you may not be able to draw heat maps or rainfall plots, since you're left with only one sample.

sribi commented 7 years ago

Hi, Thanks a lot for fixing this so promptly! subsetMaf() and oncoplot() now work as expected. However, I am still having issues with 'rainfallPlot()':

#import
> var.annovar.maf <- annovarToMaf(annovar = var.annovar, Center = 'CSI-NUS', refBuild = 'hg19', tsbCol = 'Tumor_Sample_Barcode', table = 'ensGene', header = TRUE, MAFobj=TRUE)

#plot
> rainfallPlot(maf=var.annovar.maf, detectChangePoints = TRUE, fontSize = 12, pointSize = 0.6)
Error in seg.spl[[1]] : subscript out of bounds
In addition: Warning message:
In is.factor(x) : NAs introduced by coercion

#check if more than one sample 
> getSampleSummary(var.annovar.maf)
   Tumor_Sample_Barcode Frame_Shift_Del Frame_Shift_Ins Missense_Mutation total
1:               fake_4               1               0                 1     2
2:               fake_5               0               1                 1     2
3:               fake_7               0               1                 1     2
4:               fake_1               0               0                 1     1
5:               fake_2               0               0                 1     1
6:               fake_3               0               0                 1     1
7:               fake_6               0               0                 1     1

I was assuming the issue might be due to the low number of mutations in the test dataset, but I'm running into the same issue with my own, larger dataset with plenty of mutations...

Thanks again, sribi

PoisonAlien commented 7 years ago

Hi, this was an issue with empty chromosomes, i.e., chromosomes with no variants resulted in an error. I have fixed it. I would really appreciate it if you could try installing it again and let me know if it works.

Target the devel branch, since I haven't merged it with master yet.

devtools::install_github(repo = "PoisonAlien/maftools", ref = 'devel')

Thanks again. This was helpful.
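The empty-chromosome failure mode mentioned above can be reproduced in miniature with base R. This is a sketch under assumptions, not the actual fix applied in maftools: the toy table, its column names, and the per-chromosome step (distances between consecutive variants) are illustrative only.

```r
# Toy variant table (made-up positions); chr3 is a factor level with no variants.
vars <- data.frame(
  Chromosome = factor(c("chr1", "chr1", "chr2"),
                      levels = c("chr1", "chr2", "chr3")),
  Start_Position = c(100, 250, 400)
)

# split() keeps unused factor levels, so chr3 comes back as a zero-row data frame.
by_chr <- split(vars, vars$Chromosome)

# Drop empty chromosomes before any per-chromosome computation.
non_empty <- Filter(function(d) nrow(d) > 0, by_chr)

# Example per-chromosome step: distances between consecutive variants.
inter_dist <- lapply(non_empty, function(d) diff(sort(d$Start_Position)))
```

Code that indexes into the per-chromosome results without such a filter can hit a "subscript out of bounds" error when a chromosome contributes nothing.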

sribi commented 7 years ago

Hi,

The bug in rainfallPlot(), as described above, seems to be fixed in your latest push to devel!

Thanks for fixing it so quickly.