Bioconductor / VariantAnnotation

Annotation of Genetic Variants
Convert `Number=A` to `Number=1` when creating an `ExpandedVCF` #79

LTLA commented 6 months ago

It seems reasonable that per-allele ##INFO fields should be converted from Number=A to Number=1 when expand() is called, given that the 1:many relationship between rows and allelic values is now flattened into a 1:1 mapping.

This has practical consequences, too, as we can see:

library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")

out <- tempfile()
first <- readVcf(fl)
first <- expand(first)

writeVcf(first, out)
roundtrip <- readVcf(out, row.names=FALSE)
roundtrip <- expand(roundtrip)

all.equal(first, roundtrip)
## [1] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “GT”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [2] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “GQ”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [3] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “DP”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [4] "Attributes: < Component “assays”: Attributes: < Component “data”: Attributes: < Component “listData”: Component “HQ”: Attributes: < Length mismatch: comparison on first 1 components > > > >"
## [5] "Attributes: < Component “info”: Attributes: < Component “listData”: Component “AF”: Modes: numeric, S4 > >"
## [6] "Attributes: < Component “info”: Attributes: < Component “listData”: Component “AF”: Attributes: < target is NULL, current is list > > >"
## [7] "Attributes: < Component “info”: Attributes: < Component “listData”: Component “AF”: target is numeric, current is CompressedNumericList > >"
## [8] "Attributes: < Component “metadata”: Component “header”: Attributes: < Component “header”: Attributes: < Component “listData”: Component “fileDate”: Attributes: < Component “listData”: Component “Value”: 1 string mismatch > > > >"

We can ignore mismatches 1-4, as these are inconsequential to most end-users (albeit annoying to developers, see #78). We can also ignore mismatch 8, which is discussed in #78. The interesting discrepancies are 5-7, where AF becomes a NumericList after a roundtrip through the VCF file. This is because its ##INFO header line still registers Number=A, when it should really be Number=1 to reflect that AF has already been flattened by the expand() call that generated first.
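A quick way to see the mismatch between header and data is to compare what the header declares for AF with the class of the AF column itself, before and after expand(). This is only a minimal sketch: the variable names vcf and flat are made up for illustration, and the Number reported for the expanded object will depend on whether the installed VariantAnnotation includes the change discussed in this issue.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl)    # CollapsedVCF: one row per VCF record
flat <- expand(vcf)   # ExpandedVCF: one row per ALT allele

# Number declared for AF in the ##INFO header, before and after expand().
info(header(vcf))["AF", "Number"]
info(header(flat))["AF", "Number"]

# Class of the AF column itself, before and after expand().
class(info(vcf)$AF)
class(info(flat)$AF)
```

If the header of flat still reports "A" while the AF column is a plain numeric vector, writeVcf() has no way to tell the reader that the field was flattened, which would be consistent with the roundtrip result above.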

vjcitn commented 6 months ago

@hpages are you familiar with expand()?

vjcitn commented 6 months ago

I have some changes that address this, am testing them now.

vjcitn commented 6 months ago

Addressed in devel branch with e3103a

vjcitn commented 6 months ago

also in github ... presumably this change should be ported back to RELEASE_3_18? @mtmorgan ?

mtmorgan commented 6 months ago

personally I'd leave it in devel to provide a chance for any ramifications to materialize... I think this is the commit