PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

Can't open MAF file #149

Closed kvn95ss closed 6 years ago

kvn95ss commented 6 years ago

I converted wAnnovar output into MAF using annovarToMaf, then wrote it to a file (output.maf). When I tried to open the file again using read.maf(), I got the following error:

maf=read.maf("output.maf")
reading maf..
NOTE: Removed 34 duplicated variants
silent variants: 245
        ID   N
1: Samples   6
2: 3'Flank   3
3:   3'UTR   3
4:     IGR  12
5:  Intron 204
6:     RNA  23
Summarizing..
Error in dcast.data.table(data = vc, formula = Tumor_Sample_Barcode ~ :
  Can not cast an empty data.table

Any idea what's going on? Why am I not able to import the file?

PoisonAlien commented 6 years ago

Hi, can you post your sessionInfo? How many samples do you have? If I had to guess, the error is due to a lack of non-synonymous mutations in your file.

kvn95ss commented 6 years ago

Thanks for the quick reply, much appreciated!

I have 6 samples in total.

Here's my sessionInfo:

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /home/user/anaconda3/lib/R/lib/libRblas.so
LAPACK: /home/user/anaconda3/lib/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] maftools_1.4.27 Biobase_2.38.0 BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] httr_1.3.1 bit64_0.9-5
[3] splines_3.4.1 foreach_1.4.4
[5] assertthat_0.2.0 stats4_3.4.1
[7] blob_1.1.1 BSgenome_1.46.0
[9] GenomeInfoDbData_1.0.0 Rsamtools_1.30.0
[11] slam_0.1-40 progress_1.1.2
[13] ggrepel_0.8.0 pillar_1.2.2
[15] RSQLite_2.0 lattice_0.20-34
[17] digest_0.6.15 GenomicRanges_1.30.3
[19] RColorBrewer_1.1-2 XVector_0.18.0
[21] colorspace_1.3-2 cowplot_0.9.2
[23] Matrix_1.2-14 plyr_1.8.4
[25] XML_3.98-1.6 GetoptLong_0.1.6
[27] biomaRt_2.34.2 zlibbioc_1.24.0
[29] xtable_1.8-2 scales_0.5.0
[31] BiocParallel_1.12.0 tibble_1.4.2
[33] pkgmaker_0.22 IRanges_2.12.0
[35] ggplot2_2.2.1 SummarizedExperiment_1.8.0
[37] GenomicFeatures_1.28.5 lazyeval_0.2.1
[39] mclust_5.4 survival_2.40-1
[41] magrittr_1.5 memoise_1.1.0
[43] doParallel_1.0.11 changepoint_2.2.2
[45] NMF_0.21.0 tools_3.4.1
[47] registry_0.5 data.table_1.10.4
[49] prettyunits_1.0.2 GlobalOptions_0.0.12
[51] matrixStats_0.53.1 gridBase_0.4-7
[53] ComplexHeatmap_1.17.1 stringr_1.3.0
[55] S4Vectors_0.16.0 munsell_0.4.3
[57] cluster_2.0.6 rngtools_1.2.4
[59] DelayedArray_0.4.1 AnnotationDbi_1.40.0
[61] Biostrings_2.46.0 compiler_3.4.1
[63] GenomeInfoDb_1.14.0 rlang_0.2.0
[65] grid_3.4.1 RCurl_1.95-4.8
[67] iterators_1.0.9 VariantAnnotation_1.24.1
[69] rjson_0.2.15 circlize_0.4.3
[71] bitops_1.0-6 gtable_0.2.0
[73] codetools_0.2-15 DBI_1.0.0
[75] reshape2_1.4.3 R6_2.2.2
[77] gridExtra_2.3 zoo_1.8-1
[79] GenomicAlignments_1.14.1 rtracklayer_1.38.3
[81] bit_1.1-12 shape_1.4.3
[83] stringi_1.1.7 Rcpp_0.12.15
[85] wordcloud_2.5

ShixiangWang commented 6 years ago

@kvn95ss you can use the fread function from the data.table package to read the MAF file and check your data.

maf <- data.table::fread("output.maf")

then check

table(maf$Variant_Type)

or something else as @PoisonAlien mentioned.

R version 3.4.1 (2017-06-30)

The R version seems a little old; updating to 3.5.0 would be a good option.

kvn95ss commented 6 years ago

@ShixiangWang

Your code does import the file into R, and using the second command gives this result

table(maf$Variant_Type)
DEL INS SNP
 38  24 217

However, when I try to get a summary of it, I get this error

plotmafSummary(maf, rmOutlier=TRUE, addStat='median', dashboard=TRUE, titvRaw=FALSE)
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘getSampleSummary’ for signature ‘"data.table"’

PoisonAlien commented 6 years ago

Hi, a small modification to @ShixiangWang's suggestion: check

table(maf$Variant_Classification)

Also, I see that you're using an older version of maftools. The current version is 1.6.07, which requires R 3.5, and unfortunately 3.5 is not available for Ubuntu yet.

You will have to install maftools from GitHub:

library("devtools")
install_github(repo = "PoisonAlien/maftools")

Let me know..

kvn95ss commented 6 years ago

@PoisonAlien

Your modified command yields:

table(maf$Variant_Classification)
3'Flank   3'UTR     IGR  Intron     RNA
      3       3      12     236      25

I've installed maftools via Conda. Are you suggesting that I manually install R 3.5, then install maftools from GitHub?

ShixiangWang commented 6 years ago

@kvn95ss you cannot use maftools functions on a MAF file that you loaded via fread. Functions from maftools only work on a MAF object or on other objects built by this package.

The maf variable is now a data.table (which is also a data.frame), so you can manipulate it with many basic R functions.
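
To illustrate why the earlier call failed with "unable to find an inherited method", here is a minimal sketch of S4 dispatch using base R only. The class `ToyMAF` and the generic `toySampleSummary` are hypothetical stand-ins invented for this example; they are not the real maftools definitions.

```r
library(methods)

# Hypothetical stand-in for a package-defined S4 class (not the real MAF class).
setClass("ToyMAF", representation(data = "data.frame"))

# Hypothetical generic with a method registered only for ToyMAF objects.
setGeneric("toySampleSummary", function(x) standardGeneric("toySampleSummary"))
setMethod("toySampleSummary", "ToyMAF", function(x) {
  # Count variants per sample from the wrapped table.
  as.data.frame(table(Tumor_Sample_Barcode = x@data$Tumor_Sample_Barcode),
                responseName = "Variants")
})

# Toy data with made-up sample names.
raw <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Variant_Classification = c("Intron", "RNA", "Intron")
)

# A plain data.frame has no registered method, so dispatch fails.
plain_result <- tryCatch(toySampleSummary(raw),
                         error = function(e) conditionMessage(e))

# Wrapping the same table in the class lets dispatch succeed.
wrapped_result <- toySampleSummary(new("ToyMAF", data = raw))
```

The same table works or fails depending only on the class of the object holding it, which is why the file has to go through read.maf rather than fread before maftools functions are called.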

You cannot install R 3.5.0 via conda (I tried; I think the newest version on the R channel is 3.4.3).

Just install the newest version of maftools from your R console:

# Install maftools from the GitHub repository.
library("devtools")
install_github(repo = "PoisonAlien/maftools")

PoisonAlien commented 6 years ago

Yes, no need to install 3.5; it's painful to install from source. Just run the above command from your current R console, and it will install the newer version. Also, your variant classifications show there are no non-synonymous mutations, such as missense, nonsense, or frameshift indels. This will prompt maftools to complain. Is this data from WGS or WXS?
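
The diagnosis can be checked by hand with a small base-R sketch. The counts are copied from the table(maf$Variant_Classification) output posted above, and the list of default non-synonymous classes is copied from the read.maf help text quoted later in this thread; the variable names are made up for this example.

```r
# Default non-synonymous classes, as listed in the read.maf help text
# quoted in this thread.
default_nonsyn <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)

# Classification counts reported by table(maf$Variant_Classification).
observed <- c("3'Flank" = 3, "3'UTR" = 3, "IGR" = 12, "Intron" = 236, "RNA" = 25)

# Which observed classes count as non-synonymous by default, and how many variants?
nonsyn_present <- intersect(names(observed), default_nonsyn)
n_nonsyn <- sum(observed[nonsyn_present])

if (n_nonsyn == 0) {
  message("No variants fall into the default non-synonymous classes.")
}
```

None of the five observed classes appears in the default list, so every variant in this file is treated as silent.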

kvn95ss commented 6 years ago

@PoisonAlien

The data is from Targeted amplicon sequencing.

I'm going to upgrade maftools and see how it goes.

kvn95ss commented 6 years ago

I updated maftools. The last two lines of my error have been replaced by:

Error in read.maf("SNP_in_cn.maf") :
  No non-synonymous mutations found
Check `vc_nonSyn` argumet in `read.maf` for details

Any way I can still process the file? Do I have to change the parameters of vc_nonSyn?

Any help appreciated!

ShixiangWang commented 6 years ago

@kvn95ss You can run

?read.maf

to see the vc_nonSyn argument.

It says

NULL. Provide manual list of variant classifications to be considered as non-synonymous. Rest will be considered as silent variants. Default uses Variant Classifications with High/Moderate variant consequences. http://asia.ensembl.org/Help/Glossary?id=535: "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site","Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del","In_Frame_Ins", "Missense_Mutation"

Could you try

read.maf("SNP_in_cn.maf", vc_nonSyn=c("3'Flank", "3'UTR", "IGR", "Intron", "RNA"))
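
As a rough sketch of what a custom vc_nonSyn list does, the base-R snippet below splits the classifications from this file into the two groups. `split_by_nonsyn` is a hypothetical helper, not a maftools function, and it assumes classification names are compared as exact strings; under that assumption, an entry with stray whitespace would silently fail to match, so trimming the list first is a cheap safeguard.

```r
# Hypothetical helper (not part of maftools): split classifications into
# non-synonymous and silent groups using exact string matching.
split_by_nonsyn <- function(classification, vc_nonSyn) {
  is_nonsyn <- classification %in% vc_nonSyn
  list(nonsyn = classification[is_nonsyn], silent = classification[!is_nonsyn])
}

# Rebuild the classification column from the counts posted in this thread.
vc <- rep(c("3'Flank", "3'UTR", "IGR", "Intron", "RNA"),
          times = c(3, 3, 12, 236, 25))

# A custom list covering every class present in the file.
custom <- c("3'Flank", "3'UTR", "IGR", "Intron", "RNA")
custom_split <- split_by_nonsyn(vc, custom)

# The same list with one entry padded by a space: that class no longer matches.
padded <- c("3'Flank", "3'UTR", "IGR", " Intron", "RNA")
padded_split <- split_by_nonsyn(vc, padded)

# trimws() removes the padding before matching.
trimmed_split <- split_by_nonsyn(vc, trimws(padded))
```

Note that relabeling these classes as non-synonymous only changes how the variants are grouped for summaries and plots; it does not change what the variants are.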

kvn95ss commented 6 years ago

@PoisonAlien

read.maf("SNP_in_cn.maf", vc_nonSyn=c("3'Flank", "3'UTR", "IGR", "Intron", "RNA"))

This worked! I can now import the file as a MAF. Functions like plotmafSummary are also working.

How can I get the information on per-sample basis?

PoisonAlien commented 6 years ago

Hi,

Please refer to the vignette. You can use getSampleSummary and getGeneSummary to access this information.

ShixiangWang commented 6 years ago

getSampleSummary(your_maf)

will get you what you want.

For more info, please run the following code in your console:

browseVignettes("maftools")
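
For a feel of what a per-sample tally looks like, here is a base-R sketch on a toy data frame. The sample names and rows are made up, and the exact columns returned by getSampleSummary on a real MAF object may differ; this only shows the kind of cross-tabulation involved.

```r
# Toy MAF-like table with made-up sample names (not data from this thread).
maf_table <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3", "S3"),
  Variant_Classification = c("Intron", "RNA", "Intron", "IGR", "Intron", "Intron"),
  stringsAsFactors = FALSE
)

# Cross-tabulate samples against variant classifications.
per_sample <- table(maf_table$Tumor_Sample_Barcode,
                    maf_table$Variant_Classification)

# Total number of variants per sample.
per_sample_total <- rowSums(per_sample)

print(per_sample)
print(per_sample_total)
```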

kvn95ss commented 6 years ago

Thanks for the help, much appreciated!

If you don't mind, I'm closing the thread!