PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License
443 stars 218 forks source link

Maf file format created on vcf2maf show some problems on VAF plotting #163

Closed bioinfo-dirty-jobs closed 6 years ago

bioinfo-dirty-jobs commented 6 years ago

I have created ma file starting from mutect2 smatic vcf using vcf2maf. Seem works. I concatenate 6 maf on a single file and I import on maftools. When I try to do this plot

`plotVaf(maf = laml, vafCol = 'i_TumorVAF_WU')
Error in eval(jsub, SDenv, parent.frame()) : object 't_vaf' not found

`

I realize there is not VAF value. Is it normal? How to resolve?


reading maf..
NOTE: Removed 589 duplicated variants
NOTE: Found 1 variants with no Gene Symbols.
   Hugo_Symbol Chromosome Start_Position End_Position Reference_Allele
1:                                                                    
   Tumor_Seq_Allele2 Variant_Classification Variant_Type Tumor_Sample_Barcode
1:                                                                           
Annotating them as 'UnknownGene' for convenience
NOTE: Non MAF specific values in Variant_Classification column:
[1] ""                       "Variant_Classification"
NOTE: Non MAF specific values in Variant_Type column:
[1] ""             "Variant_Type"
silent variants: 23958
                        ID     N
 1:                Samples    12
 2:                     V1     1
 3:                3'Flank   690
 4:                  3'UTR   888
 5:                5'Flank   810
 6:                  5'UTR   490
 7:                    IGR   163
 8:                 Intron 11733
 9:                    RNA  2515
10:                 Silent  4533
11:          Splice_Region  2134
12: Variant_Classification     1
Summarizing..
                        ID summary   Mean Median
 1:             NCBI_Build  GRCh37     NA     NA
 2:                 Center       .     NA     NA
 3:                Samples      12     NA     NA
 4:                 nGenes    4286     NA     NA
 5:        Frame_Shift_Del    1010  101.0   93.0
 6:        Frame_Shift_Ins     623   62.3   56.0
 7:           In_Frame_Del    3886  388.6  337.5
 8:           In_Frame_Ins      90    9.0    8.0
 9:      Missense_Mutation    9268  926.8  888.0
10:      Nonsense_Mutation     363   36.3   34.5
11:       Nonstop_Mutation       9    0.9    0.5
12:            Splice_Site     261   26.1   25.0
13: Translation_Start_Site       8    0.8    1.0
14:                  total   15518 1551.8 1415.0
Gene Summary..
      Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins
   1:        MUC6              14              11            5            0
   2:      ZNF717               9               8            1            2
   3:       CDC27               4               6            0           13
   4:       NBPF1               0               0            1            0
   5:        MUC4               7              10           15            1
  ---                                                                      
4282:     ZSCAN5C               0               0            0            0
4283:      ZSWIM5               0               0            0            0
4284:      ZSWIM6               0               0            0            0
4285:       ZUFSP               0               0            0            0
4286:      ZYG11A               0               0            0            0
      Missense_Mutation Nonsense_Mutation Nonstop_Mutation Splice_Site
   1:               510                 2                0           0
   2:               214                11                1           0
   3:               167                15                0           1
   4:               176                 0                0          10
   5:               122                 0                0           0
  ---                                                                 
4282:                 1                 0                0           0
4283:                 1                 0                0           0
4284:                 1                 0                0           0
4285:                 1                 0                0           0
4286:                 1                 0                0           0
      Translation_Start_Site total MutatedSamples AlteredSamples
   1:                      0   542             10             10
   2:                      0   246             10             10
   3:                      0   206             10             10
   4:                      0   187             10             10
   5:                      0   155             10             10
  ---                                                           
4282:                      0     1              1              1
4283:                      0     1              1              1
4284:                      0     1              1              1
4285:                      0     1              1              1
4286:                      0     1              1              1
Checking clinical data..
NOTE: Missing clinical data! It is strongly recommended to provide clinical data associated with samples if available.
Annotation missing for below samples in MAF
[1] ""
Done !
> 
PoisonAlien commented 6 years ago

You should provide column name containing vaf values as an argument to vafCol. I think there is no column by name i_TumorVAF_WU in your merged maf file? Use getFields to see available column names in maf object.

bioinfo-dirty-jobs commented 6 years ago

@PoisonAlien thanks so mcuh for the hel Here you have the columns:


getFields(sar2)
  [1] "Hugo_Symbol"                   "Entrez_Gene_Id"               
  [3] "Center"                        "NCBI_Build"                   
  [5] "Chromosome"                    "Start_Position"               
  [7] "End_Position"                  "Strand"                       
  [9] "Variant_Classification"        "Variant_Type"                 
 [11] "Reference_Allele"              "Tumor_Seq_Allele1"            
 [13] "Tumor_Seq_Allele2"             "dbSNP_RS"                     
 [15] "dbSNP_Val_Status"              "Tumor_Sample_Barcode"         
 [17] "Matched_Norm_Sample_Barcode"   "Match_Norm_Seq_Allele1"       
 [19] "Match_Norm_Seq_Allele2"        "Tumor_Validation_Allele1"     
 [21] "Tumor_Validation_Allele2"      "Match_Norm_Validation_Allele1"
 [23] "Match_Norm_Validation_Allele2" "Verification_Status"          
 [25] "Validation_Status"             "Mutation_Status"              
 [27] "Sequencing_Phase"              "Sequence_Source"              
 [29] "Validation_Method"             "Score"                        
 [31] "BAM_File"                      "Sequencer"                    
 [33] "Tumor_Sample_UUID"             "Matched_Norm_Sample_UUID"     
 [35] "HGVSc"                         "HGVSp"                        
 [37] "HGVSp_Short"                   "Transcript_ID"                
 [39] "Exon_Number"                   "t_depth"                      
 [41] "t_ref_count"                   "t_alt_count"                  
 [43] "n_depth"                       "n_ref_count"                  
 [45] "n_alt_count"                   "all_effects"                  
 [47] "Allele"                        "Gene"                         
 [49] "Feature"                       "Feature_type"                 
 [51] "Consequence"                   "cDNA_position"                
 [53] "CDS_position"                  "Protein_position"             
 [55] "Amino_acids"                   "Codons"                       
 [57] "Existing_variation"            "ALLELE_NUM"                   
 [59] "DISTANCE"                      "STRAND_VEP"                   
 [61] "SYMBOL"                        "SYMBOL_SOURCE"                
 [63] "HGNC_ID"                       "BIOTYPE"                      
 [65] "CANONICAL"                     "CCDS"                         
 [67] "ENSP"                          "SWISSPROT"                    
 [69] "TREMBL"                        "UNIPARC"                      
 [71] "RefSeq"                        "SIFT"                         
 [73] "PolyPhen"                      "EXON"                         
 [75] "INTRON"                        "DOMAINS"                      
 [77] "GMAF"                          "AFR_MAF"                      
 [79] "AMR_MAF"                       "ASN_MAF"                      
 [81] "EAS_MAF"                       "EUR_MAF"                      
 [83] "SAS_MAF"                       "AA_MAF"                       
 [85] "EA_MAF"                        "CLIN_SIG"                     
 [87] "SOMATIC"                       "PUBMED"                       
 [89] "MOTIF_NAME"                    "MOTIF_POS"                    
 [91] "HIGH_INF_POS"                  "MOTIF_SCORE_CHANGE"           
 [93] "IMPACT"                        "PICK"                         
 [95] "VARIANT_CLASS"                 "TSL"                          
 [97] "HGVS_OFFSET"                   "PHENO"                        
 [99] "MINIMISED"                     "ExAC_AF"                      
[101] "ExAC_AF_AFR"                   "ExAC_AF_AMR"                  
[103] "ExAC_AF_EAS"                   "ExAC_AF_FIN"                  
[105] "ExAC_AF_NFE"                   "ExAC_AF_OTH"                  
[107] "ExAC_AF_SAS"                   "GENE_PHENO"                   
[109] "FILTER"                        "flanking_bps"                 
[111] "variant_id"                    "variant_qual"                 
[113] "ExAC_AF_Adj"                   "ExAC_AC_AN_Adj"               
[115] "ExAC_AC_AN"                    "ExAC_AC_AN_AFR"               
[117] "ExAC_AC_AN_AMR"                "ExAC_AC_AN_EAS"               
[119] "ExAC_AC_AN_FIN"                "ExAC_AC_AN_NFE"               
[121] "ExAC_AC_AN_OTH"                "ExAC_AC_AN_SAS"               
[123] "ExAC_FILTER"                 "MUTECT2"                         

```I try also to create t_vaf column t_alt_count'/t_ref_count' but however not work
PoisonAlien commented 6 years ago

I see that you dont have a t_vaf column, but plotVaf will crate one for you since you already have t_ref_count and t_alt_count.

Run your command again without vafCol

plotVaf(maf = laml, vafCol = NULL)
bioinfo-dirty-jobs commented 6 years ago

@PoisonAlien I have this error

Error in eval(.massagei(isub), x, parent.frame()) : 
  object 'value' not found

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] maftools_1.7.05     Biobase_2.40.0      BiocGenerics_0.26.0

loaded via a namespace (and not attached):
 [1] httr_1.3.1                  bit64_0.9-7                
 [3] splines_3.5.0               foreach_1.4.4              
 [5] assertthat_0.2.0            stats4_3.5.0               
 [7] blob_1.1.1                  BSgenome_1.48.0            
 [9] GenomeInfoDbData_1.1.0      Rsamtools_1.32.0           
[11] slam_0.1-43                 progress_1.2.0             
[13] ggrepel_0.8.0               pillar_1.2.3               
[15] RSQLite_2.1.1               lattice_0.20-35            
[17] digest_0.6.15               GenomicRanges_1.32.3       
[19] RColorBrewer_1.1-2          XVector_0.20.0             
[21] colorspace_1.3-2            cowplot_0.9.2              
[23] Matrix_1.2-14               plyr_1.8.4                 
[25] pkgconfig_2.0.1             XML_3.98-1.11              
[27] bibtex_0.4.2                GetoptLong_0.1.7           
[29] biomaRt_2.36.1              zlibbioc_1.26.0            
[31] xtable_1.8-2                scales_0.5.0               
[33] BiocParallel_1.14.1         tibble_1.4.2               
[35] pkgmaker_0.27               IRanges_2.14.10            
[37] ggplot2_2.2.1               withr_2.1.2                
[39] SummarizedExperiment_1.10.1 GenomicFeatures_1.32.0     
[41] lazyeval_0.2.1              mclust_5.4                 
[43] crayon_1.3.4                survival_2.42-3            
[45] magrittr_1.5                memoise_1.1.0              
[47] doParallel_1.0.11           changepoint_2.2.2          
[49] NMF_0.21.0                  prettyunits_1.0.2          
[51] tools_3.5.0                 registry_0.5               
[53] data.table_1.11.4           hms_0.4.2                  
[55] GlobalOptions_0.1.0         matrixStats_0.53.1         
[57] gridBase_0.4-7              ComplexHeatmap_1.18.1      
[59] stringr_1.3.1               S4Vectors_0.18.3           
[61] munsell_0.5.0               cluster_2.0.7-1            
[63] rngtools_1.3.1              DelayedArray_0.6.0         
[65] AnnotationDbi_1.42.1        Biostrings_2.48.0          
[67] compiler_3.5.0              GenomeInfoDb_1.16.0        
[69] rlang_0.2.1                 grid_3.5.0                 
[71] RCurl_1.95-4.10             iterators_1.0.9            
[73] VariantAnnotation_1.26.0    rjson_0.2.20               
[75] circlize_0.4.4              bitops_1.0-6               
[77] gtable_0.2.0                codetools_0.2-15           
[79] DBI_1.0.0                   R6_2.2.2                   
[81] reshape2_1.4.3              gridExtra_2.3              
[83] zoo_1.8-2                   GenomicAlignments_1.16.0   
[85] rtracklayer_1.40.3          bit_1.1-14                 
[87] shape_1.4.4                 stringi_1.2.3              
[89] Rcpp_0.12.17                wordcloud_2.5  
PoisonAlien commented 6 years ago

Okay, as a last try - can you read your maf file freshly again and run the above command. If you still get the error then do the following to manually add vaf column,

sar2 = data.table::fread("path/to/merged.maf")
sar2[,t_vaf := as.numeric(t_alt_count)/(sum(as.numeric(t_alt_count) + as.numeric(t_alt_count))]
sar2 = maftools::read.maf(sar2)

plotVaf(maf = sar2)
PoisonAlien commented 6 years ago

data.table is the package used by maftools for data processing. Sorry I just noticed I had made a typo in above command. I have corrected it. You can try and let me know.

bioinfo-dirty-jobs commented 6 years ago

@PoisonAlien Thanks so much!!!

 sar2<-data.table::read("IMAGE/MUTECT2.vaf.maf.tsv")
Error: 'read' is not an exported object from 'namespace:data.table'
PoisonAlien commented 6 years ago

check my comment again, I have made corrections to it. It should be fread not read

bioinfo-dirty-jobs commented 6 years ago

@PoisonAlien

> sar2<-data.table::fread("IMAGE/MUTECT2.vaf.maf.tsv")
> sar2[,t_vaf:=as.numeric(t_alt_count)/(sum(as.numeric(t_alt_count) + as.numeric(t_alt_count))]
Error: unexpected ']' in "sar2[,t_vaf:=as.numeric(t_alt_count)/(sum(as.numeric(t_alt_count) + as.numeric(t_alt_count))]"
> 
PoisonAlien commented 6 years ago

Hi Sorry for the delay. Were you able to fix it ? You can attach your maf file here - maybe top 100 lines or so, I might be able to help you.

bioinfo-dirty-jobs commented 6 years ago

@PoisonAlien Thanks s match for the help! https://www.dropbox.com/s/axd25paqejmhs7z/mitico.maf.tsv?dl=0

PoisonAlien commented 6 years ago

Hi, Please attach it as a text file. Its not possible for me to parse your text.

bioinfo-dirty-jobs commented 6 years ago

@PoisonAlien Sorry I inser a link (https://www.dropbox.com/s/axd25paqejmhs7z/mitico.maf.tsv?dl=0)

PoisonAlien commented 6 years ago

Hi, plotVaf wofks fine with you toy data.

> m = read.maf(maf = "~/Downloads/mitico.maf.tsv")
> plotVaf(maf = m)
t_vaf field is missing, but found t_ref_count & t_alt_count columns. Estimating vaf..

image

Could you try again ? Just read your maf and plot it, it should work.

bioinfo-dirty-jobs commented 6 years ago

@PoisonAlien I maybe found the problem.. What I send you it is only one art of the files. The problem are related on the varscan maf file. Inside the vaue onf t_alt_count are present % simbol. That is the problem


t_depth | t_ref_count | t_alt_count | n_depth | n_ref_count | n_alt_count
-- | -- | -- | -- | -- | --
22 | 2 | 20% | 15 | 0 | 0%
PoisonAlien commented 6 years ago

I see. Sorry, you will have to manually correct them as there is no way of guessing type of values in vaf_col :|