I have been annotating VCF files with VEP.

utils::download.file("", "example_no_anno.vcf.gz")
utils::download.file("", "example_vep.vcf.gz")

The VEP command I ran on the command line:

vep -i example_no_anno.vcf.gz --vcf TRUE --output_file example_vep.vcf.gz --compress_output bgzip --minimal TRUE --allele_number TRUE --everything TRUE --assembly GRCh37 --db_version 94 --merged TRUE --user anonymous --port 3337 --host --cache TRUE --dir dir_cache/ensembl-vep/94/cachedir --sift s --polyphen s --total_length TRUE --numbers TRUE --symbol TRUE --hgvs TRUE --ccds TRUE --uniprot TRUE --xref_refseq TRUE --af TRUE --max_af TRUE --af_exac TRUE --af_gnomad TRUE --pubmed TRUE --canonical TRUE --biotype TRUE

However, after reading the annotated VCF file into R, some lines seem to be split at random, with the remainder parsed as a new record. In a minimal example with one variant, I end up with two entries in R, and the second one has half of the INFO column as its chromosome name. Could this be a bug?


library(VariantAnnotation)

# plain VCF file
vcf_plain <- readVcf("example_no_anno.vcf.gz")

# annotated with VEP
# contains a very long line but no format errors
vcf_vep <- readVcf("example_vep.vcf.gz")
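
One way to tell whether the extra entry comes from the file or from the parser is to count the data lines independently of readVcf. The following is only a rough sketch in base R: the helper name count_vcf_records and the inline example lines are made up for illustration and are not part of VariantAnnotation or of the attached files.

```r
# Hypothetical helper (not part of VariantAnnotation): summarise the data
# lines of a VCF given as a character vector, one element per line.
count_vcf_records <- function(lines) {
  # Drop empty lines and header lines, which start with "#".
  records <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # A VCF data line has at least 8 tab-separated fixed fields (CHROM..INFO).
  n_fields <- lengths(strsplit(records, "\t", fixed = TRUE))
  list(n_records = length(records), n_short = sum(n_fields < 8L))
}

# Made-up single-variant VCF, for illustration only.
example_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t123Sample",
  "chr1\t12345\t.\tA\tG\t50\tPASS\tCSQ=G|missense_variant\tGT\t0/1"
)
res <- count_vcf_records(example_lines)
print(res)

# For a real file, assuming gzfile() can read the bgzip-compressed file:
#   lines <- readLines(gzfile("example_vep.vcf.gz"))
# then compare count_vcf_records(lines)$n_records with
#   dim(readVcf("example_vep.vcf.gz"))[1]
```

If the file itself reports one complete record but readVcf returns two, the split is happening during parsing rather than in the VEP output.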


Please let me know if you need more input to replicate this error.

Best, Michaela Müller

vobencha commented 5 years ago

Hi Michaela (@octopusCat88 ),

I'm not able to reproduce this problem with VariantAnnotation 1.28.11, which is the most current release version.

There was a bug in parsing long records which was fixed on Jan 18th:

commit 90b9deae85acddcf8eb8a0c0c2041b51ae7cf1f1
Author: vobencha <>
Date:   Fri Jan 18 12:56:54 2019 -0800

     Fix bug in buffer reallocation when a record fills the buffer exactly

     See for

I see you're using version 1.28.7, which predates the fix, so it's possible you're hitting this bug. Please update VariantAnnotation and try again.
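
As a rough sketch of that check (not an official procedure), the installed version can be compared against one known to work. The helper needs_update is hypothetical, and 1.28.11 is simply the version tested in this comment; the first release containing the fix may be earlier.

```r
# Hypothetical helper: TRUE if a version string is older than a version
# known to work. "1.28.11" is the version tested in this thread, which is
# an assumption about a safe target, not the exact release of the fix.
needs_update <- function(installed, known_good = "1.28.11") {
  package_version(installed) < package_version(known_good)
}

needs_update("1.28.7")   # TRUE
needs_update("1.28.11")  # FALSE

# Check the installed package, if there is one.
if (requireNamespace("VariantAnnotation", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("VariantAnnotation"))
  if (needs_update(installed)) {
    message("VariantAnnotation ", installed, " is older than 1.28.11; ",
            "consider BiocManager::install(\"VariantAnnotation\")")
  }
}
```

Versions are compared component by component, so 1.28.7 sorts before 1.28.11 even though a plain string comparison would order them the other way.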

Thanks. Valerie

Testing with VariantAnnotation 1.28.11:

> vcf_no_anno <- readVcf("example_no_anno.vcf.gz")
> colData(vcf_no_anno)
DataFrame with 1 row and 1 column
123Sample         1
> dim(vcf_no_anno)
[1] 1 1
> str(seqlevels(vcf_no_anno))
 chr "chr1"

> vcf_vep_anno <- readVcf("example_vep_anno.vcf.gz")
> colData(vcf_vep_anno)
DataFrame with 1 row and 1 column
123Sample         1
> dim(vcf_vep_anno)
[1] 1 1
> str(seqlevels(vcf_vep_anno))
 chr "chr1"

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
other attached packages:
 [1] VariantAnnotation_1.28.11   Rsamtools_1.34.1
 [3] Biostrings_2.50.2           XVector_0.22.0
 [5] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
 [7] BiocParallel_1.16.6         matrixStats_0.54.0
 [9] Biobase_2.42.0              GenomicRanges_1.34.0
[11] GenomeInfoDb_1.18.2         IRanges_2.16.0
[13] S4Vectors_0.20.1            BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0               compiler_3.5.1           prettyunits_1.0.2
 [4] GenomicFeatures_1.34.3   bitops_1.0-6             tools_3.5.1
 [7] zlibbioc_1.28.0          progress_1.2.0           biomaRt_2.38.0
[10] digest_0.6.18            bit_1.1-14               BSgenome_1.50.0
[13] RSQLite_2.1.1            memoise_1.1.0            lattice_0.20-38
[16] pkgconfig_2.0.2          rlang_0.3.1              Matrix_1.2-14
[19] DBI_1.0.0                GenomeInfoDbData_1.2.0   rtracklayer_1.42.2
[22] httr_1.4.0               stringr_1.4.0            hms_0.4.2
[25] bit64_0.9-7              grid_3.5.1               R6_2.4.0
[28] AnnotationDbi_1.44.0     XML_3.98-1.18            magrittr_1.5
[31] blob_1.1.1               GenomicAlignments_1.18.1 assertthat_0.2.0
[34] stringi_1.3.1            RCurl_1.95-4.12          crayon_1.3.4

c-mertes commented 5 years ago

Dear @octopusCat88 and @vobencha,

Thanks for checking this and pointing us to the updated version. And yes, after updating the package the problem is solved, so I guess this bug is a duplicate of #19.

Since this was solved by updating the package, we can close this issue.

Thanks again, and sorry for not checking the new version first.

Best, Christian

vobencha commented 5 years ago

No problem. I'm glad the new version worked for you. Valerie