UMCUGenetics / MutationalPatterns

R package for extracting and visualizing mutational patterns in base substitution catalogues
MIT License

mut_mat error #44

Closed: twoshoes closed this issue 5 years ago

twoshoes commented 5 years ago

Hi,

I'm trying to run mut_matrix and am getting the same error as reported here.

Error in if (type < 4) context = which(C_TRIPLETS == type_context[[2]][i]) else context = which(T_TRIPLETS == : argument is of length zero

I'm running this on a GRangesList of ~350 GRanges objects created from a MAF file, which closely match the format of the GRanges objects in the list used in the package example. I've attached an example of the structure of one of these below, and my R session info is at the bottom. The structure I have can be replicated by reading the attached file in as tmp and then running:

# build a GRangesList from the MAF table that was read in as tmp
maf.gr = GRangesList(GRanges(seqnames=tmp$seqnames, ranges=IRanges(start=tmp$start, end=tmp$end), REF=DNAStringSet(tmp$REF), ALT=DNAStringSet(tmp$ALT), Tumor_Sample_Barcode=tmp$Tumor_Sample_Barcode))

mut_matrix(vcf_list = maf.gr, ref_genome=hg38)

I'd really appreciate any help you can give me with this.

John

maf_test.txt

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.1.1 maftools_2.0.16 TCGAbiolinks_2.12.6
[4] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome.Hsapiens.UCSC.hg38_1.4.1 BSgenome_1.52.0
[7] rtracklayer_1.44.0 Biostrings_2.52.0 XVector_0.24.0
[10] SomaticCancerAlterations_1.20.0 MutationalPatterns_1.10.0 NMF_0.21.0
[13] Biobase_2.44.0 cluster_2.1.0 rngtools_1.4
[16] pkgmaker_0.27 registry_0.5-1 GenomicRanges_1.36.0
[19] GenomeInfoDb_1.20.0 IRanges_2.18.0 S4Vectors_0.22.0
[22] BiocGenerics_0.30.0

loaded via a namespace (and not attached):
[1] backports_1.1.4 circlize_0.4.8 aroma.light_3.14.0 plyr_1.8.4
[5] selectr_0.4-1 ConsensusClusterPlus_1.48.0 lazyeval_0.2.2 splines_3.6.1
[9] BiocParallel_1.18.0 gridBase_0.4-7 sva_3.32.1 digest_0.6.20
[13] foreach_1.4.7 fansi_0.4.0 magrittr_1.5 memoise_1.1.0
[17] doParallel_1.0.15 limma_3.40.6 ComplexHeatmap_2.0.0 readr_1.3.1
[21] annotate_1.62.0 wordcloud_2.6 matrixStats_0.54.0 R.utils_2.9.0
[25] prettyunits_1.0.2 colorspace_1.4-1 blob_1.1.1 rvest_0.3.4
[29] ggrepel_0.7.1 xfun_0.7 dplyr_0.8.3 crayon_1.3.4
[33] RCurl_1.95-4.12 jsonlite_1.6 genefilter_1.66.0 zeallot_0.1.0
[37] zoo_1.8-6 survival_2.44-1.1 VariantAnnotation_1.30.1 iterators_1.0.12
[41] glue_1.3.1 survminer_0.4.6 gtable_0.3.0 zlibbioc_1.30.0
[45] GetoptLong_0.1.7 DelayedArray_0.10.0 shape_1.4.4 scales_1.0.0
[49] DESeq_1.36.0 DBI_1.0.0 edgeR_3.26.5 bibtex_0.4.2
[53] ggthemes_4.2.0 Rcpp_1.0.2 xtable_1.8-4 progress_1.2.2
[57] clue_0.3-57 bit_1.1-14 matlab_1.0.2 km.ci_0.5-2
[61] httr_1.4.1 RColorBrewer_1.1-2 pkgconfig_2.0.2 XML_3.98-1.19
[65] R.methodsS3_1.7.1 utf8_1.1.4 locfit_1.5-9.1 labeling_0.3
[69] tidyselect_0.2.5 rlang_0.4.0 reshape2_1.4.3 AnnotationDbi_1.46.0
[73] munsell_0.5.0 tools_3.6.1 cli_1.1.0 downloader_0.4
[77] generics_0.0.2 RSQLite_2.1.1 broom_0.5.2 stringr_1.4.0
[81] ggdendro_0.1-20 knitr_1.23 bit64_0.9-7 survMisc_0.5.5
[85] purrr_0.3.2 EDASeq_2.18.0 nlme_3.1-140 R.oo_1.22.0
[89] pracma_2.2.5 xml2_1.2.2 biomaRt_2.40.0 compiler_3.6.1
[93] rstudioapi_0.10 curl_4.0 png_0.1-7 ggsignif_0.5.0
[97] tibble_2.1.3 geneplotter_1.62.0 stringi_1.4.3 GenomicFeatures_1.36.1
[101] exomeCopy_1.30.0 lattice_0.20-38 Matrix_1.2-17 vctrs_0.2.0
[105] KMsurv_0.1-5 pillar_1.4.2 BiocManager_1.30.4 GlobalOptions_0.1.0
[109] data.table_1.12.2 cowplot_0.9.4 bitops_1.0-6 R6_2.4.0
[113] latticeExtra_0.6-28 hwriter_1.3.2 ShortRead_1.42.0 gridExtra_2.3
[117] codetools_0.2-16 MASS_7.3-51.4 assertthat_0.2.1 SummarizedExperiment_1.14.0
[121] rjson_0.2.20 withr_2.1.2 GenomicAlignments_1.20.0 Rsamtools_2.0.0
[125] GenomeInfoDbData_1.2.1 mgcv_1.8-28 hms_0.4.2 grid_3.6.1
[129] tidyr_0.8.3 ggpubr_0.2.2

roelj commented 5 years ago

Hi @twoshoes

Thanks for reporting this. I think the problem is that MutationalPatterns does not expect an asterisk in the REF column. Should this be treated as an N (no known nucleotide), or as "there are multiple options, because there are multiple reference alleles"?

I suspect the asterisk indicates multiple reference alleles, and I don't know what the proper way of dealing with that would be. What would you expect your signature to look like? Should it count each reference allele as a separate X>Y mutation, or only the first reference allele?

twoshoes commented 5 years ago

Thanks for getting back to me so quickly. I don't think there are asterisks in the REF column; those are in the strand column. The column names don't line up with the columns when you open the .txt file, but the 6th column is the REF column, matching the 6th column name.

FrancisBlokzijl commented 5 years ago

Hi!

I think the problem is the "-" entries in the ALT column. Those rows are deletions, not single base substitutions (SNVs), which are what mut_matrix counts. Filter them out first and try again.

Let me know if it works! Francis
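A minimal base-R sketch of that filtering step, assuming the MAF has been read into a data frame with REF and ALT columns as in the attached file. The is_snv helper and the toy rows below are hypothetical illustrations, not part of MutationalPatterns:

```r
# Hypothetical helper (not a MutationalPatterns function): TRUE for rows that
# are single base substitutions, i.e. REF and ALT are each one of A/C/G/T and
# differ. Indels, written with "-" in REF or ALT in MAF files, return FALSE,
# as do multi-base entries and missing values.
is_snv <- function(ref, alt) {
  bases <- c("A", "C", "G", "T")
  ref <- toupper(as.character(ref))
  alt <- toupper(as.character(alt))
  ref %in% bases & alt %in% bases & ref != alt
}

# Toy MAF-like table with made-up rows (not taken from maf_test.txt)
maf <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr3"),
  start    = c(100, 200, 300, 400),
  end      = c(100, 200, 301, 400),
  REF      = c("C", "A", "AG", "-"),
  ALT      = c("T", "-", "-", "G"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2"),
  stringsAsFactors = FALSE
)

# Keep only the SNV rows before building GRanges objects
snvs <- maf[is_snv(maf$REF, maf$ALT), ]
nrow(snvs)  # 1: only the C>T row is kept
```

The filtered table can then be converted to GRanges as in the original post. If the table holds many samples, split(gr, gr$Tumor_Sample_Barcode) is one way to get a GRangesList with one element per sample. The check also drops insertions ("-" in REF) and multi-base substitutions, since neither fits the 96 single base substitution contexts.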

twoshoes commented 5 years ago

Yes! Thank you! Sorry to bother you with something so trivial.

FrancisBlokzijl commented 5 years ago

No problem, happy plotting!