Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation

readVcfAsVRanges (Number = A) #4

Closed sigven closed 6 years ago

sigven commented 6 years ago

Hi,

I have a VCF file with some INFO fields encoded with Number=A. VariantAnnotation::readVcf works fine, but VariantAnnotation::readVcfAsVRanges fails with `Error: is(values, "vector_OR_factor") is not TRUE`.

When I edit the VCF, setting the Number fields of the INFO tags to Number=1, readVcfAsVRanges works again. Is there a way to make readVcfAsVRanges work on VCF files with INFO tags encoded with such allele-specific numbering (Number=A)?

sessionInfo:

```
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] VariantAnnotation_1.24.2          Rsamtools_1.30.0
 [3] SummarizedExperiment_1.8.1        DelayedArray_0.4.1
 [5] matrixStats_0.52.2                Biobase_2.38.0
 [7] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.46.0
 [9] rtracklayer_1.38.2                Biostrings_2.46.0
[11] XVector_0.18.0                    GenomicRanges_1.30.1
[13] GenomeInfoDb_1.14.0               IRanges_2.12.0
[15] S4Vectors_0.16.0                  BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14             compiler_3.4.3           prettyunits_1.0.2
 [4] progress_1.1.2           GenomicFeatures_1.30.0   bitops_1.0-6
 [7] tools_3.4.3              zlibbioc_1.24.0          biomaRt_2.34.1
[10] digest_0.6.13            bit_1.1-12               RSQLite_2.0
[13] memoise_1.1.0            tibble_1.3.4             lattice_0.20-35
[16] rlang_0.1.6              Matrix_1.2-11            DBI_0.7
[19] GenomeInfoDbData_1.0.0   httr_1.3.1               stringr_1.2.0
[22] bit64_0.9-7              grid_3.4.3               R6_2.2.2
[25] AnnotationDbi_1.40.0     XML_3.98-1.9             RMySQL_0.10.13
[28] BiocParallel_1.12.0      magrittr_1.5             blob_1.1.0
[31] GenomicAlignments_1.14.1 assertthat_0.2.0         stringi_1.1.6
[34] RCurl_1.95-4.8
```

lawremi commented 6 years ago

Would you please provide an example input that reproduces the issue?

sigven commented 6 years ago

Content of VCF file (example_fail.vcf), in which the INFO fields are declared with Number=A:

```
##fileformat=VCFv4.2
##FILTER=
##INFO=
##INFO=
##INFO=
##INFO=
#CHROM POS ID REF ALT QUAL FILTER INFO
1 1051653 . G A 0 PASS AF=0.237288;DP=59;NORMAL_AF=0;NORMAL_DP=54
```

```
> test <- VariantAnnotation::readVcfAsVRanges('example_fail.vcf', genome='hg19')
Error: is(values, "vector_OR_factor") is not TRUE
```

Content of VCF file (example_work.vcf), identical except that the INFO fields are declared with Number=1:

```
##fileformat=VCFv4.2
##FILTER=
##INFO=
##INFO=
##INFO=
##INFO=
#CHROM POS ID REF ALT QUAL FILTER INFO
1 1051653 . G A 0 PASS AF=0.237288;DP=59;NORMAL_AF=0;NORMAL_DP=54
```

```
> test <- VariantAnnotation::readVcfAsVRanges('example_work.vcf', genome='hg19')
>
```
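In both files the data line is the same. With a single ALT allele (G>A), a Number=A field such as AF carries exactly one value, just as a Number=1 field would, so the two files differ only in their header declarations. A minimal base-R sketch makes this concrete by splitting the INFO column into named values; `parse_info` is a hypothetical helper for illustration, not part of VariantAnnotation, and it ignores the header entirely.

```r
# Hypothetical helper: split a VCF INFO column into a named list of
# character vectors. Flags (keys without "=") map to TRUE; multi-valued
# fields (e.g. Number=A with several ALT alleles) are split on commas.
parse_info <- function(info) {
  if (identical(info, ".")) return(list())  # "." means no INFO at all
  entries <- strsplit(info, ";", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", entries)
  vals <- lapply(entries, function(e) {
    if (!grepl("=", e, fixed = TRUE)) return(TRUE)
    strsplit(sub("^[^=]*=", "", e), ",", fixed = TRUE)[[1]]
  })
  names(vals) <- keys
  vals
}

x <- parse_info("AF=0.237288;DP=59;NORMAL_AF=0;NORMAL_DP=54")
lengths(x)  # every field holds exactly one value for this record
```

With two ALT alleles, a Number=A field such as `AF=0.1,0.2` would split into two values, which is why its count cannot be fixed in advance.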

DarioS commented 6 years ago

Section 1.2.2 of the VCF specification defines A, R, G and . as special values of the Number field. It appears that readVcfAsVRanges is incorrectly forcing the values to be numeric.
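A minimal sketch of what those special Number values imply for a single record, assuming a diploid sample by default: `A` means one value per ALT allele, `R` one per allele including REF, `G` one per possible genotype, and `.` a count not known in advance. The helper name `expected_count` is hypothetical; it only illustrates that a field with a non-fixed Number can hold several values per record.

```r
# Hypothetical helper: number of values a record should carry for an INFO
# field, given its declared Number, the record's ALT count and the sample
# ploidy (diploid assumed by default).
expected_count <- function(number, n_alt, ploidy = 2L) {
  switch(number,
         "A" = n_alt,                           # one per ALT allele
         "R" = n_alt + 1L,                      # one per allele, REF included
         "G" = choose(n_alt + ploidy, ploidy),  # one per possible genotype
         "." = NA_integer_,                     # count not known in advance
         as.integer(number))                    # a fixed integer such as "1"
}

# The example record above has one ALT allele (G>A):
sapply(c("A", "R", "G", "1"), expected_count, n_alt = 1L)
# A: 1, R: 2, G: 3, 1: 1
```

For a triallelic diploid site (`n_alt = 2`), `G` gives 6 genotypes, while `A` gives 2.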

lawremi commented 6 years ago

When I get a chance I'll see what it actually is.

sigven commented 6 years ago

Actually, the example VCF I provided does not strictly adhere to the VCF format. DP is a reserved INFO key, and the specification requires it to be declared with Number=1.

I received this message when I ran the example VCF through vcf_validator: `Line 5: INFO DP metadata Number is not 1`

When I renamed the DP tag to DP2, it could be declared with Number=A, Number=. and so on. So, in summary, readVcfAsVRanges works as it should; it was only my input VCF that was not properly encoded. You can safely remove the issue.
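The vcf_validator complaint can be approximated in base R by comparing each `##INFO` declaration against the Number the specification reserves for that key. This is a sketch, not vcf_validator's actual logic: `get_field` and `check_reserved_info` are hypothetical helpers, the header lines are made up for illustration, and the lookup table deliberately covers only DP, AN and END (each reserved with Number=1), whereas the specification lists more reserved keys.

```r
# Hypothetical, deliberately partial table of reserved INFO keys and the
# Number each must be declared with (see the VCF specification for the rest).
reserved_number <- c(DP = "1", AN = "1", END = "1")

# Return the value of `key` inside a "##INFO=<...>" line, or NA if absent.
get_field <- function(line, key) {
  m <- regmatches(line, regexpr(paste0("[<,]", key, "=[^,>]+"), line))
  if (length(m) == 0) return(NA_character_)
  sub(paste0("^[<,]", key, "="), "", m)
}

# Report INFO declarations whose Number differs from the reserved value.
check_reserved_info <- function(header_lines, reserved = reserved_number) {
  info <- grep("^##INFO=<", header_lines, value = TRUE)
  msgs <- character(0)
  for (line in info) {
    id  <- get_field(line, "ID")
    num <- get_field(line, "Number")
    if (!is.na(id) && id %in% names(reserved) &&
        !identical(num, reserved[[id]])) {
      msgs <- c(msgs, sprintf("INFO %s metadata Number is not %s",
                              id, reserved[[id]]))
    }
  }
  msgs
}

# Made-up header: reserved DP declared with Number=A, renamed DP2 likewise.
hdr <- c('##fileformat=VCFv4.2',
         '##INFO=<ID=DP,Number=A,Type=Integer,Description="Read depth">',
         '##INFO=<ID=DP2,Number=A,Type=Integer,Description="Read depth">')
print(check_reserved_info(hdr))
# [1] "INFO DP metadata Number is not 1"
```

Only DP is flagged; DP2 is not a reserved key, so any Number is accepted for it, matching what renaming the tag achieved.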

lawremi commented 6 years ago

Thanks, I just came here to say that. Btw, if you had made a new reply instead of editing the previous one, I would have been notified by email.