Closed LTLA closed 2 years ago
Restating the code chunk as there is a line missing above:
fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
tmp <- tempfile()
writeVcf(vcf, filename=tmp)
lines = readLines(tmp) # missing from chunk above
lines[grepl("ALT=", lines)]
Output for ALT is clearly problematic:
## [1] "##ALT=<ID=DEL,Description=Deletion>"
## [2] "##ALT=<ID=DEL:ME:ALU,Description=Deletion of ALU element>"
...
I see how to fix the code in writeVcf
to quote the description text. But I am not sure doing this affects the output in the desired way. Can you say a little more about the defect of the readVcf result? I am seeing ALT information in the readVcf even in the bugged version. What is meant by "discards the ALT information altogether"?
When you try to call readVcf
on the incorrect headers, it will not be able to parse the string properly.
# Assuming tmp was generated as above.
fixed(header(readVcf(tmp)))$ALT
## DataFrame with 10 rows and 0 columns
In contrast:
fixed(header(vcf))$ALT
## DataFrame with 10 rows and 1 column
## Description
## <character>
## DEL Deletion
## DEL:ME:ALU Deletion of ALU elem..
## DEL:ME:L1 Deletion of L1 element
## DUP Duplication
## DUP:TANDEM Tandem Duplication
## INS Insertion of novel s..
## INS:ME:ALU Insertion of ALU ele..
## INS:ME:L1 Insertion of L1 elem..
## INV Inversion
## CNV Copy number variable..
Should be fixed in devel branch on git, which has version 1.41.3
Fixed confirmed
Note the lack of quotes in the
Description
value. This means that a subsequentreadVcf
does not recognize this field as a string and discards theALT
information altogether. The cause is here:https://github.com/Bioconductor/VariantAnnotation/blob/54d2de6c7a1aa76f404076db79077cd05114edf2/R/methods-writeVcf.R#L181-L184
where there is a lack of special-casing for the
Description
column, which is present in the general case.