Closed d-cameron closed 5 years ago
Related question: why does an ALT missing allele (.) return an empty string and not NA_character_ when the VCF file contains variants with symbolic alleles? The empty string looks very much like a 1bp deletion that doesn't quite conform to the VCF specification. I would have thought it makes more sense to return NA in this situation.
FYI: my current workaround is this:
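readVcf = function(file, ...) {
  raw_vcf = VariantAnnotation::readVcf(file=file, ...)
  # work-around for https://github.com/Bioconductor/VariantAnnotation/issues/8
  # re-read the ALT column as plain text so alleles such as ".ACT" are kept verbatim
  # (assumes the VCF has at least one sample column, since geno(raw_vcf)[[1]] is used to count them)
  col_names = c("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
                as.character(seq_len(ncol(VariantAnnotation::geno(raw_vcf)[[1]]))))
  alt = readr::read_tsv(file, comment="#", col_names=col_names,
                        col_types=readr::cols_only(ALT=readr::col_character()))$ALT
  # one ALT string per record, stored as a CharacterList
  VariantAnnotation::fixed(raw_vcf)$ALT = IRanges::CharacterList(as.list(alt))
  return(raw_vcf)
}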
Thanks @d-cameron for reporting this. The incorrect parsing of alleles starting with dot '.' was fixed in this commit.
The change is in devel for now and if all looks good I'll make the same change to release.
I've opened a new issue for the problem in https://github.com/Bioconductor/VariantAnnotation/issues/8#issuecomment-392411301.
@d-cameron have you had a chance to test this fix?
This change has been ported to release, package version 1.28.9.
Breakend parsing works if the ALT is something like ACT. (trailing dot) but fails when it points the other way and the ALT is something like .ACT (leading dot). I've tracked the issue down to the comparison at vcftype.c:224, which only checks that the first character matches . and doesn't also check that there are no other characters in the string. I'd put in a PR to replace it with a strcmp() check against "." or even a simple && '\0' != *(field + 1), but since I'm unfamiliar with the code base I'm concerned that doing so might have unintentional side effects.
Could you have a look into making the change when you have some time?
Thanks!
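To illustrate the difference described above, here is a minimal sketch in C. The function names are hypothetical and are not taken from vcftype.c; the first helper only mimics the behaviour as described in the report (a first-character test) and is not a copy of the actual code.

```c
#include <string.h>

/* Hypothetical: first-character-only test, as the report describes the
 * current behaviour. Any field beginning with '.' counts as missing,
 * so the breakend ".ACT" is wrongly treated like the missing value ".". */
static int is_missing_first_char(const char *field)
{
    return field[0] == '.';
}

/* Hypothetical: whole-string test using strcmp(). Only the exact
 * string "." counts as missing. */
static int is_missing_exact(const char *field)
{
    return strcmp(field, ".") == 0;
}

/* Hypothetical: the same whole-string test without strcmp(), checking
 * that the character after the dot is the terminator. */
static int is_missing_two_chars(const char *field)
{
    return field[0] == '.' && field[1] == '\0';
}
```

With these helpers, "ACT." is never reported as missing by any of the three, which matches the observation that trailing-dot breakends parse correctly. Only the first helper reports ".ACT" as missing. Whether the whole-string test is safe everywhere the real comparison is used would still need checking against the code base.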