Closed by sigven 6 years ago
Would you please provide an example input that reproduces the issue?
Content of VCF file (example_fail.vcf):
1 1051653 . G A 0 PASS AF=0.237288;DP=59;NORMAL_AF=0;NORMAL_DP=54
> test <- VariantAnnotation::readVcfAsVRanges('example_fail.vcf', genome='hg19')
Error: is(values, "vector_OR_factor") is not TRUE
Content of VCF file (example_work.vcf):
1 1051653 . G A 0 PASS AF=0.237288;DP=59;NORMAL_AF=0;NORMAL_DP=54
> test <- VariantAnnotation::readVcfAsVRanges('example_work.vcf', genome='hg19')
>
Section 1.2.2 of the VCF specification defines A, R, G and . as special values of the Number field. It appears that readVcfAsVRanges is incorrectly forcing the values to be numeric.
When I get a chance I'll see what it actually is.
Actually, it seems that the VCF example I was providing is not strictly adhering to the VCF format. It seems that the 'DP' tag is a reserved element, and that it is required that this has Number=1.
I received this message when I ran the example VCF with vcf_validator:
Line 5: INFO DP metadata Number is not 1
When I renamed the DP tag to 'DP2', it could have Number=A, Number=. etc. So in summary, it seems readVcfAsVRanges works as it should, and that it was only my input VCF that was not properly encoded. You can safely remove the issue.
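For anyone hitting the same error, the check that vcf_validator reported can be approximated in base R before calling readVcfAsVRanges. This is only a minimal sketch: the header lines below are hypothetical (the headers of the example files are not shown in this thread), and it assumes each ##INFO line lists ID first and Number second.

```r
# Hypothetical header lines; the real headers of example_fail.vcf are not shown in this thread.
header <- c(
  '##fileformat=VCFv4.2',
  '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
  '##INFO=<ID=DP,Number=A,Type=Integer,Description="Read depth">'
)

# Keep only the INFO declarations.
info_lines <- grep("^##INFO=<", header, value = TRUE)

# Pull out the ID and Number values (assumes the ID,Number,... field order).
info_id <- sub("^##INFO=<ID=([^,]+),.*$", "\\1", info_lines)
info_number <- sub("^##INFO=<ID=[^,]+,Number=([^,>]+).*$", "\\1", info_lines)

# Flag a DP declaration whose Number is anything other than 1.
bad_dp <- info_id == "DP" & info_number != "1"
if (any(bad_dp)) {
  message("INFO DP is declared with Number=", info_number[bad_dp],
          "; vcf_validator expects Number=1")
}
```

To run this against a real file, the header could be read with readLines('example_fail.vcf') and filtered to lines starting with '##'.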
Thanks, I just came here to say that. Btw, if you had made a new reply instead of editing the previous one, I would have been notified by email.
Hi,
I have a VCF file with some INFO fields encoded with Number=A. VariantAnnotation::readVcf works fine, but VariantAnnotation::readVcfAsVRanges does not work (Error: is(values, "vector_OR_factor") is not TRUE). When I edit the VCF, setting the Number fields of the INFO tags to Number=1, readVcfAsVRanges works again. Is there a way to make readVcfAsVRanges work on VCF files with INFO tags encoded with such allele-specific numbering (Number=A)?
sessionInfo: ` R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] VariantAnnotation_1.24.2 Rsamtools_1.30.0
[3] SummarizedExperiment_1.8.1 DelayedArray_0.4.1
[5] matrixStats_0.52.2 Biobase_2.38.0
[7] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.46.0
[9] rtracklayer_1.38.2 Biostrings_2.46.0
[11] XVector_0.18.0 GenomicRanges_1.30.1
[13] GenomeInfoDb_1.14.0 IRanges_2.12.0
[15] S4Vectors_0.16.0 BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 compiler_3.4.3 prettyunits_1.0.2
[4] progress_1.1.2 GenomicFeatures_1.30.0 bitops_1.0-6
[7] tools_3.4.3 zlibbioc_1.24.0 biomaRt_2.34.1
[10] digest_0.6.13 bit_1.1-12 RSQLite_2.0
[13] memoise_1.1.0 tibble_1.3.4 lattice_0.20-35
[16] rlang_0.1.6 Matrix_1.2-11 DBI_0.7
[19] GenomeInfoDbData_1.0.0 httr_1.3.1 stringr_1.2.0
[22] bit64_0.9-7 grid_3.4.3 R6_2.2.2
[25] AnnotationDbi_1.40.0 XML_3.98-1.9 RMySQL_0.10.13
[28] BiocParallel_1.12.0 magrittr_1.5 blob_1.1.0
[31] GenomicAlignments_1.14.1 assertthat_0.2.0 stringi_1.1.6
[34] RCurl_1.95-4.8
`