Closed apastore closed 5 years ago
I have the same issue. The order of the headerlines is wrong and the ALT line doesn't contain an id. It becomes:
##ALT=Represents any possible alternative allele at this location
instead of
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
I think the problems might be with the makeVcfHeader function, which seems to contain some broken code.
@apastore @FreekManders I believe this was fixed with these two commits.
commit caaaf75eb2004e04ad92eeadfb568d8992a8ad2a
Author: vobencha <valerie.obenchain@roswellpark.org>
Date: Tue Nov 27 12:21:54 2018 -0800
Fix bug in writeVcf() when 'fileformat' is missing
commit 21c508e83c013b1655789cb832217609d2602e4b
Author: vobencha <valerie.obenchain@roswellpark.org>
Date: Mon Nov 19 12:33:03 2018 -0800
writeVcf() now writes 'fileFormat' as the first header line
Please reopen the issue if you're still having problems.
Hi @vobencha , I installed the latest version of the package from github. However, the ALT header line is still incorrect. I added my session info below.
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 VariantAnnotation_1.29.6 usethis_1.4.0 devtools_2.0.1 Rsamtools_1.34.0 Biostrings_2.50.1
[7] XVector_0.22.0 SummarizedExperiment_1.12.0 DelayedArray_0.8.0 BiocParallel_1.16.2 matrixStats_0.54.0 Biobase_2.42.0
[13] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0 forcats_0.3.0
[19] stringr_1.3.1 dplyr_0.7.8 purrr_0.2.5 readr_1.3.0 tidyr_0.8.2 tibble_1.4.2
[25] ggplot2_3.1.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] nlme_3.1-137 fs_1.2.6 bitops_1.0-6 lubridate_1.7.4 bit64_0.9-7 progress_1.2.0 httr_1.4.0
[8] rprojroot_1.3-2 tools_3.5.1 backports_1.1.3 R6_2.3.0 DBI_1.0.0 lazyeval_0.2.1 colorspace_1.3-2
[15] withr_2.1.2 tidyselect_0.2.5 prettyunits_1.0.2 processx_3.2.1 curl_3.2 bit_1.1-14 compiler_3.5.1
[22] cli_1.0.1 rvest_0.3.2 xml2_1.2.0 desc_1.2.0 rtracklayer_1.42.1 scales_1.0.0 callr_3.1.0
[29] digest_0.6.18 pkgconfig_2.0.2 sessioninfo_1.1.1 BSgenome_1.50.0 rlang_0.3.0.1 readxl_1.2.0 rstudioapi_0.8
[36] RSQLite_2.1.1 bindr_0.1.1 generics_0.0.2 jsonlite_1.6 RCurl_1.95-4.11 magrittr_1.5 GenomeInfoDbData_1.2.0
[43] Matrix_1.2-14 Rcpp_1.0.0 munsell_0.5.0 stringi_1.2.4 yaml_2.2.0 zlibbioc_1.28.0 pkgbuild_1.0.2
[50] plyr_1.8.4 grid_3.5.1 blob_1.1.1 crayon_1.3.4 lattice_0.20-35 haven_2.0.0 GenomicFeatures_1.34.1
[57] hms_0.4.2 ps_1.2.1 pillar_1.3.1 pkgload_1.0.2 biomaRt_2.38.0 XML_3.98-1.16 glue_1.3.0
[64] remotes_2.0.2 modelr_0.1.2 cellranger_1.1.0 gtable_0.2.0 assertthat_0.2.0 broom_0.5.1 GenomicAlignments_1.18.0
[71] AnnotationDbi_1.44.0 memoise_1.1.0
Hi @FreekManders
Installing packages manually from github is generally not the best approach. We have both an active release (R 3.5.1; Bioconductor 3.8) and devel branch (R 3.6.*; Bioconductor 3.9). If you install manually from github you must be sure you're pulling for the appropriate branch for your version of R. Additionally, the github version often lags behind the version on the canonical git server, git.bioconductor.org.
From your sessionInfo() (thanks for posting) you now have the devel version of VariantAnnotation in your release version of R.
The best way to install Bioconductor packages is to instead use the BiocManager package (replaced the BiocInstaller package). The package will detect your version of R and install the appropriate Bioconductor package versions.
BiocManager::install("VariantAnnotation")
The release version of VariantAnnotation you should be using is 1.28.3 as seen on the release build report:
https://www.bioconductor.org/checkResults/release/bioc-LATEST/
I've tested with VariantAnnotation 1.29.9 (devel) and 1.28.3 (release) using the structural.vcf from the package. In both cases I see 'fileformat' at the top of the output file and ALT is properly written with the ID field.
> fl = system.file("extdata", "structural.vcf", package="VariantAnnotation")
> vcf = readVcf(fl)
> tmp = tempfile()
> writeVcf(vcf, tmp)
> tmp
[1] "/tmp/RtmpdHJDVS/file26b16202f358"
The vcf specs do not specify order in the header fields except to say that fileformat must be first. https://samtools.github.io/hts-specs/VCFv4.2.pdf
The output from the example above does list 'fileformat' first. All major categories are grouped together but header lines such as 'assembly' or 'reference' may not be in slightly different order as the original file. To my knowledge this is not a problem.
If you still see a problem with the ALT field, please provide a small sample file so I can reproduce the issue.
Thanks.
Hi @vobencha
Thank you for your helping me. I had switched to the github version of the package, because I thought that version might be newer and fix the issue I was having. I have now removed the package and re-installed it with the BiocManager package.
However, I'm still having the same issue. I attached a modified version of the structural.vcf from the package, which contains the ALT field, as it is present in my vcfs. I also attached the file I get when reading and writing out this example vcf with VariantAnnotation. (I changed the extensions to .txt so I could attach the files) test.txt test_out.txt
As you can see, the ALT field is altered in the output. This alteration results in an error, when trying to read in the file in IGV:
Unable to parse header with error: Invalid VCFSimpleHeaderLine: key=ALT name=null
p.s. I noticed that BiocManager installed version 1.28.5 of the package, even though I'm still on R version 3.5.1. However, I don't think this is causing my issue, because I've had the same issue with other versions of the package.
@FreekManders Thanks for the reproducible example and good catch! This is now fixed in 1.28.7 and 1.29.12. These versions should be available via BiocManager::install() ~noon EST tomorrow. Valerie
If I read a valid vcf with the last 3.8 Bioconductor version 1.28.1
and then I save it with
writeVcf(Tumor, paste0("./vcf/",samples_names[1], ".vcf"), index = F)
the header is out of order ##fileformat=VCFv4.2 is not on the first line and there is a Strange SAMPLE field
Thanks for you support!