UMCUGenetics / MutationalPatterns

R package for extracting and visualizing mutational patterns in base substitution catalogues
MIT License

read_vcfs_as_granges() does not work #67

Closed: CPTPaso closed this issue 3 years ago

CPTPaso commented 3 years ago

Hi,

I am trying to run the code from the R vignette. However, the call

grl <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)

throws the error "Some of your data is missing a ALT column." I ran the code exactly as given in the vignette and used the VCF files that ship with the package. I also checked these files, and they definitely contain an ALT column.
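One quick way to confirm this without going through Bioconductor is to read the `#CHROM` header line of the VCF directly. The sketch below uses only base R; `vcf_header_columns()` is a hypothetical helper, not part of MutationalPatterns or VariantAnnotation, and it assumes an uncompressed VCF with a standard `#CHROM` header line.

```r
# Hypothetical helper (not part of MutationalPatterns or VariantAnnotation):
# return the column names listed on the "#CHROM" header line of a VCF file.
# Assumes an uncompressed, plain-text VCF.
vcf_header_columns <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con))
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) {
      stop("no #CHROM header line found in ", path)
    }
    if (startsWith(line, "#CHROM")) {
      # Drop the leading "#" and split the tab-separated column names
      return(strsplit(sub("^#", "", line), "\t", fixed = TRUE)[[1]])
    }
  }
}

# Usage: TRUE if the file declares an ALT column
# "ALT" %in% vcf_header_columns(vcf_files[1])
```

If this returns TRUE while read_vcfs_as_granges() still complains, the problem lies in how the file is parsed rather than in the file itself.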

How can I fix this?

FreekManders commented 3 years ago

Hi,

This should work. Can you share your session info and the code you have run?

CPTPaso commented 3 years ago

Sure!

Session info:
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.58.0
[3] rtracklayer_1.49.5 Biostrings_2.57.2
[5] XVector_0.29.3 MutationalPatterns_3.0.1
[7] NMF_0.23.0 Biobase_2.49.1
[9] cluster_2.1.0 rngtools_1.5
[11] pkgmaker_0.32.2 registry_0.5-1
[13] GenomicRanges_1.42.0 GenomeInfoDb_1.25.11
[15] IRanges_2.23.10 S4Vectors_0.27.12
[17] BiocGenerics_0.35.4 ggplot2_3.3.2

loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.57.0 bit64_4.0.5
[4] doParallel_1.0.15 RColorBrewer_1.1-2 progress_1.2.2
[7] httr_1.4.2 tools_4.0.2 R6_2.4.1
[10] DBI_1.1.0 colorspace_1.4-1 withr_2.2.0
[13] tidyselect_1.1.0 prettyunits_1.1.1 curl_4.3
[16] bit_4.0.4 compiler_4.0.2 xml2_1.3.2
[19] DelayedArray_0.15.7 scales_1.1.1 askpass_1.1
[22] rappdirs_0.3.1 stringr_1.4.0 digest_0.6.25
[25] Rsamtools_2.5.3 rmarkdown_2.3 pkgconfig_2.0.3
[28] htmltools_0.5.0 MatrixGenerics_1.2.0 dbplyr_1.4.4
[31] rlang_0.4.7 rstudioapi_0.11 RSQLite_2.2.0
[34] generics_0.0.2 BiocParallel_1.23.2 dplyr_1.0.2
[37] VariantAnnotation_1.36.0 RCurl_1.98-1.2 magrittr_1.5
[40] GenomeInfoDbData_1.2.3 Matrix_1.2-18 Rcpp_1.0.5
[43] munsell_0.5.0 lifecycle_0.2.0 stringi_1.4.6
[46] yaml_2.2.1 ggalluvial_0.12.3 SummarizedExperiment_1.19.6
[49] zlibbioc_1.34.0 plyr_1.8.6 BiocFileCache_1.13.1
[52] grid_4.0.2 blob_1.2.1 crayon_1.3.4
[55] lattice_0.20-41 GenomicFeatures_1.40.1 hms_0.5.3
[58] knitr_1.29 pillar_1.4.6 reshape2_1.4.4
[61] codetools_0.2-16 biomaRt_2.46.0 XML_3.99-0.5
[64] glue_1.4.1 evaluate_0.14 BiocManager_1.30.10
[67] vctrs_0.3.2 foreach_1.5.0 gtable_0.3.0
[70] openssl_1.4.2 purrr_0.3.4 assertthat_0.2.1
[73] xfun_0.16 gridBase_0.4-7 xtable_1.8-4
[76] pracma_2.3.3 tibble_3.0.3 iterators_1.0.12
[79] GenomicAlignments_1.25.3 AnnotationDbi_1.51.3 memoise_1.1.0
[82] ellipsis_0.3.1

Code:

## ----setup, include=FALSE-----------------------------------------------------
knitr::opts_chunk$set(echo = TRUE)

## ---- echo=FALSE------------------------------------------------------------------------------
options(width = 96)
library(ggplot2)
library(BiocStyle)

## ----install_package, eval=FALSE--------------------------------------------------------------
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("MutationalPatterns")

## ----Load package, message=FALSE--------------------------------------------------------------
library(MutationalPatterns)

## ---- message=FALSE---------------------------------------------------------------------------
library(BSgenome)
head(available.genomes())

## ---------------------------------------------------------------------------------------------
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library(ref_genome, character.only = TRUE)

## ----locate_vcfs------------------------------------------------------------------------------
vcf_files <- list.files(system.file("extdata", package = "MutationalPatterns"),
                        pattern = "sample.vcf", full.names = TRUE)
vcf_files <- vcf_files[2]

## ----set_sample_names-------------------------------------------------------------------------
sample_names <- "colon2"

## ----read_vcfs_as_granges, message=FALSE------------------------------------------------------
grl <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)

sessionInfo()

FreekManders commented 3 years ago

Running the same code on my machine doesn't generate an error. What happens if you run GenomicRanges::granges(VariantAnnotation::readVcf(vcf_files))? Does this produce a GRanges object with an ALT column?

CPTPaso commented 3 years ago

When I run GenomicRanges::granges(VariantAnnotation::readVcf(vcf_files)), I still get a GRanges object with only seqnames, start, end, width, strand and paramRangeID, plus IDs such as 1:105605108_T/A as row names. There is no REF or ALT column.

As a workaround, I am now trying to convert that GRanges object into a data frame, split the row names to extract REF and ALT, build a new data frame that includes REF and ALT, and pass it to makeGRangesFromDataFrame() to create a new GRanges object. That could do the trick, but it is cumbersome.
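The row-name parsing step of that workaround could look roughly like the base R sketch below. `parse_variant_ids()` is a hypothetical helper; it assumes every ID has the form `CHROM:POS_REF/ALT`, as in `1:105605108_T/A`, and sets `end` to `start + nchar(REF) - 1` so that each range spans the reference allele.

```r
# Hypothetical helper: split variant IDs of the form "CHROM:POS_REF/ALT"
# (e.g. "1:105605108_T/A") into separate columns.
parse_variant_ids <- function(ids) {
  # Capture groups: 1 = chromosome, 2 = position, 3 = REF, 4 = ALT
  m <- regmatches(ids, regexec("^(.+):([0-9]+)_([^/]+)/(.+)$", ids))
  ok <- lengths(m) == 5
  if (!all(ok)) {
    stop("unexpected ID format: ", paste(ids[!ok], collapse = ", "))
  }
  parts <- do.call(rbind, m)  # one row per ID; column 1 is the full match
  start <- as.integer(parts[, 3])
  ref <- parts[, 4]
  data.frame(
    seqnames = parts[, 2],
    start = start,
    end = start + nchar(ref) - 1L,  # range covers the whole REF allele
    REF = ref,
    ALT = parts[, 5],
    stringsAsFactors = FALSE
  )
}
```

Applied to `names(gr)` of the GRanges returned above, the resulting data frame could then be passed to `GenomicRanges::makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)`, which keeps REF and ALT as metadata columns. This only rebuilds the missing columns; it does not explain why they were dropped.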

Do you have any idea why granges() does not pick up the REF and ALT columns from the VCF files?

CPTPaso commented 3 years ago

The problem was on my end. I ran MutationalPatterns on another PC, where it worked without any error, so something must be wrong with my R environment. That is something I can figure out myself. Thanks anyway for your help!