PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License
448 stars 222 forks source link

annovarToMaf detecting only 1 gene? #683

Closed ellchua closed 3 years ago

ellchua commented 3 years ago

I'm trying to create a .maf object by using annovarToMaf. This is the code I'm using:

files = list.files("results/ANNOVAR", full.names = TRUE)

annovar_maf = annovarToMaf(annovar = files, Center = NULL, refBuild = "hg38", tsbCol = NULL, table = "ensGene", ens2hugo = TRUE, MAFobj = TRUE, basename = "annovar")

This is the output for files:

> files
 [1] "results/ANNOVAR/pass_0802213A.indels.hg38_multianno.txt"   "results/ANNOVAR/pass_0802213A.snvs.hg38_multianno.txt"    
 [3] "results/ANNOVAR/pass_CX_1200134.indels.hg38_multianno.txt" "results/ANNOVAR/pass_CX_1200134.snvs.hg38_multianno.txt"  
 [5] "results/ANNOVAR/pass_FR_12_20.indels.hg38_multianno.txt"   "results/ANNOVAR/pass_FR_12_20.snvs.hg38_multianno.txt"    
 [7] "results/ANNOVAR/pass_FR_13_002.indels.hg38_multianno.txt"  "results/ANNOVAR/pass_FR_13_002.snvs.hg38_multianno.txt"   
 [9] "results/ANNOVAR/pass_FR_13_030.indels.hg38_multianno.txt"  "results/ANNOVAR/pass_FR_13_030.snvs.hg38_multianno.txt"   
[11] "results/ANNOVAR/pass_FR_13_070.indels.hg38_multianno.txt"  "results/ANNOVAR/pass_FR_13_070.snvs.hg38_multianno.txt"   
[13] "results/ANNOVAR/pass_FR_14_010.indels.hg38_multianno.txt"  "results/ANNOVAR/pass_FR_14_010.snvs.hg38_multianno.txt"   
[15] "results/ANNOVAR/pass_FR_14_028.indels.hg38_multianno.txt"  "results/ANNOVAR/pass_FR_14_028.snvs.hg38_multianno.txt"   
[17] "results/ANNOVAR/pass_FR_14_035.indels.hg38_multianno.txt"  "results/ANNOVAR/pass_FR_14_035.snvs.hg38_multianno.txt"

Everything seems to run fine, but when I look at annovar_maf, it seems that there is only 1 gene?

> annovar_maf
An object of class  MAF 
                        ID summary   Mean Median
 1:             NCBI_Build      NA     NA     NA
 2:                 Center      NA     NA     NA
 3:                Samples       9     NA     NA
 4:                 nGenes       1     NA     NA
 5:        Frame_Shift_Ins       1  0.111      0
 6:      Missense_Mutation     258 28.667     25
 7:      Nonsense_Mutation      14  1.556      1
 8:            Splice_Site       5  0.556      0
 9: Translation_Start_Site       1  0.111      0
10:                  total     279 31.000     29

Running r oncoplot(maf = annovar_maf, top = 10) then gives me the error:

Error in oncoplot(maf = annovar_maf, top = 10) : Oncoplot requires at-least two genes for plottng.

Here's what head(annovar_maf@data) looks like:

> head(annovar_maf@data)
   Hugo_Symbol Tumor_Sample_Barcode Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2 Variant_Classification                tx  exon
1: UnknownGene       pass_FR_14_035       chrX       75071658     75071658                G                 T      Missense_Mutation ENST00000339447.8 exon8
2: UnknownGene       pass_FR_14_028      chr13       76955496     76955496                G                 A      Missense_Mutation ENST00000449753.2 exon3
3: UnknownGene       pass_FR_13_070      chr14       73575044     73575044                A                 G      Missense_Mutation ENST00000238651.9 exon3
4: UnknownGene        pass_0802213A       chr4        7872078      7872078                T                 C Translation_Start_Site ENST00000360265.8 exon1
5: UnknownGene       pass_FR_13_002       chr1       26345696     26345696                G                 A      Missense_Mutation ENST00000308182.9 exon2
6: UnknownGene        pass_FR_12_20      chr10        5213101      5213101                G                 A      Missense_Mutation ENST00000380448.5 exon9
   txChange aaChange Variant_Type Func.refGene Gene.refGene GeneDetail.refGene ExonicFunc.refGene
1:  c.C938A  p.A313E          SNP       exonic        ABCB7               <NA>  nonsynonymous SNV
2:  c.G442A  p.A148T          SNP       exonic        ACOD1               <NA>  nonsynonymous SNV
3:  c.A983G  p.H328R          SNP       exonic        ACOT2               <NA>  nonsynonymous SNV
4:    c.A1G    p.M1?          SNP       exonic        AFAP1               <NA>          startloss
5:  c.C962T  p.P321L          SNP       exonic        AIM1L               <NA>  nonsynonymous SNV
6:  c.G788A  p.R263H          SNP       exonic       AKR1C4               <NA>  nonsynonymous SNV
                                                                                                                                                          AAChange.refGene
1:                                                                            ABCB7:ENST00000339447.8:exon8:c.C938A:p.A313E,ABCB7:ENST00000373394.7:exon9:c.C1058A:p.A353E
2:                                                                             ACOD1:ENST00000449753.2:exon3:c.G442A:p.A148T,ACOD1:ENST00000377462.5:exon4:c.G442A:p.A148T
3:                               ACOT2:ENST00000238651.9:exon3:c.A983G:p.H328R,ACOT2:ENST00000613168.1:exon3:c.A923G:p.H308R,ACOT2:ENST00000622407.4:exon4:c.A797G:p.H266R
4: AFAP1:ENST00000360265.8:exon1:c.A1G:p.M1?,AFAP1:ENST00000382543.3:exon1:c.A1G:p.M1?,AFAP1:ENST00000358461.6:exon2:c.A1G:p.M1?,AFAP1:ENST00000420658.5:exon2:c.A1G:p.M1?
5:                                                                                                                           AIM1L:ENST00000308182.9:exon2:c.C962T:p.P321L
6:                                                                                                                          AKR1C4:ENST00000380448.5:exon9:c.G788A:p.R263H
   hgnc_symbol Entrez ens_id Entrez_Gene_Id
1:        <NA>     NA  ABCB7             NA
2:        <NA>     NA  ACOD1             NA
3:        <NA>     NA  ACOT2             NA
4:        <NA>     NA  AFAP1             NA
5:        <NA>     NA  AIM1L             NA
6:        <NA>     NA AKR1C4             NA

Is it something wrong with the format of my ANNOVAR .txt files?

Session info

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] R.utils_2.10.1              R.oo_1.24.0                 R.methodsS3_1.8.1           maftools_2.7.10             dplyr_1.0.3                
 [6] rsconnect_0.8.16            kableExtra_1.3.1            org.Dm.eg.db_3.11.4         stringr_1.4.0               enrichplot_1.8.1           
[11] ReactomePA_1.32.0           clusterProfiler_3.16.1      pheatmap_1.0.12             limma_3.44.3                geneplotter_1.66.0         
[16] annotate_1.66.0             XML_3.99-0.5                reshape_0.8.8               lattice_0.20-41             gridExtra_2.3              
[21] gplots_3.0.4                RColorBrewer_1.1-2          Biostrings_2.56.0           XVector_0.28.0              scales_1.1.1               
[26] reshape2_1.4.4              knitr_1.29                  biomaRt_2.44.1              GenomicFeatures_1.40.1      AnnotationDbi_1.50.3       
[31] genefilter_1.70.0           ggplot2_3.3.2               DESeq2_1.28.1               SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
[36] matrixStats_0.56.0          Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2         IRanges_2.22.2             
[41] S4Vectors_0.26.1            BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
  [1] backports_1.1.10         fastmatch_1.1-0          BiocFileCache_1.12.1     plyr_1.8.6               igraph_1.2.6             splines_4.0.2           
  [7] BiocParallel_1.22.0      inline_0.3.15            urltools_1.7.3           digest_0.6.25            htmltools_0.5.0          GOSemSim_2.14.2         
 [13] viridis_0.5.1            GO.db_3.11.4             gdata_2.18.0             fansi_0.4.1              checkmate_2.0.0          magrittr_2.0.1          
 [19] memoise_1.1.0            remotes_2.2.0            graphlayouts_0.7.1       RcppParallel_5.0.2       askpass_1.1              prettyunits_1.1.1       
 [25] colorspace_1.4-1         rvest_0.3.6              blob_1.2.1               rappdirs_0.3.1           ggrepel_0.8.2            xfun_0.16               
 [31] callr_3.4.4              crayon_1.3.4             RCurl_1.98-1.2           jsonlite_1.7.0           graph_1.66.0             scatterpie_0.1.5        
 [37] survival_3.1-12          glue_1.4.2               polyclip_1.10-0          gtable_0.3.0             zlibbioc_1.34.0          webshot_0.5.2           
 [43] V8_3.2.0                 graphite_1.34.0          pkgbuild_1.1.0           rstan_2.21.2             DOSE_3.14.0              DBI_1.1.0               
 [49] Rcpp_1.0.5               viridisLite_0.3.0        xtable_1.8-4             progress_1.2.2           gridGraphics_0.5-0       reactome.db_1.70.0      
 [55] bit_4.0.4                europepmc_0.4            StanHeaders_2.21.0-6     httr_1.4.2               fgsea_1.14.0             ellipsis_0.3.1          
 [61] pkgconfig_2.0.3          loo_2.3.1                farver_2.0.3             dbplyr_1.4.4             locfit_1.5-9.4           ggplotify_0.0.5         
 [67] tidyselect_1.1.0         labeling_0.3             rlang_0.4.10             munsell_0.5.0            tools_4.0.2              downloader_0.4          
 [73] cli_2.0.2                generics_0.0.2           RSQLite_2.2.0            ggridges_0.5.2           evaluate_0.14            yaml_2.2.1              
 [79] processx_3.4.4           bit64_4.0.2              tidygraph_1.2.0          caTools_1.18.0           purrr_0.3.4              ggraph_2.0.4            
 [85] xml2_1.3.2               DO.db_2.9                compiler_4.0.2           rstudioapi_0.11          curl_4.3                 tibble_3.0.3            
 [91] tweenr_1.0.1             stringi_1.4.6            highr_0.8                ps_1.3.4                 Matrix_1.2-18            vctrs_0.3.6             
 [97] pillar_1.4.6             lifecycle_0.2.0          BiocManager_1.30.10      triebeard_0.3.0          data.table_1.13.0        cowplot_1.0.0           
[103] bitops_1.0-6             rtracklayer_1.48.0       qvalue_2.20.0            R6_2.4.1                 KernSmooth_2.23-17       codetools_0.2-16        
[109] MASS_7.3-51.6            gtools_3.8.2             assertthat_0.2.1         rprojroot_1.3-2          openssl_1.4.2            withr_2.3.0             
[115] GenomicAlignments_1.24.0 Rsamtools_2.4.0          GenomeInfoDbData_1.2.3   hms_0.5.3                tidyr_1.1.2              rvcheck_0.1.8           
[121] rmarkdown_2.3            ggforce_0.3.2    
PoisonAlien commented 3 years ago

Hi,

How does it look when you run with ens2hugo = FALSE? I see that you already have gene symbols under ens_id, so you don't have to use ens2hugo=TRUE.

ellchua commented 3 years ago

Ah yes, that seems to work. It finds about 250 genes now. Is it the case that we have to explicitly state ens2hugo = FALSE? Because I didn't include it the first time round and got the same results as ens2hugo = TRUE. Looking at the help page it doesn't say that the default is true or false.

PoisonAlien commented 3 years ago

By default, it is set to FALSE. The earlier version of annovar had ensemble gene IDs and didn't have gene symbols. So I included this option. But I guess they have changed it now.