Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation

Left-pointing breakend ALT variants get truncated because the allele starts with '.' #8

Closed: d-cameron closed this issue 5 years ago

d-cameron commented 6 years ago

Breakend parsing works when the ALT is something like ACT. (trailing dot), but fails when the breakend points the other way and the ALT is something like .ACT (leading dot). I've tracked the issue down to vcftype.c:224:

vcftype->u.character[idx] =
            ('.' == *field) ? vcftype->charDotAs : field;

This comparison only checks that the first character is '.'; it doesn't also check that there are no other characters in the string. I'd put in a PR to replace the above code with a strcmp() check against "." or even a simple && '\0' == *(field + 1), but since I'm unfamiliar with the code base I'm concerned that doing so might have unintended side effects.
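For illustration, the stricter check described above could look like the minimal sketch below. The helper names is_missing_field and resolve_field and the dot_as parameter are hypothetical stand-ins and not part of VariantAnnotation; the actual fix in vcftype.c may be written differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from VariantAnnotation): returns 1 only when the
 * field is exactly "." (the VCF missing-value marker), 0 otherwise.
 * Fields such as ".ACT" or "ACT." are ordinary allele strings. */
int is_missing_field(const char *field)
{
    return field[0] == '.' && field[1] == '\0';
}

/* Hypothetical stand-in for the assignment in vcftype.c: substitute dot_as
 * for a missing field, otherwise return the field unchanged. */
const char *resolve_field(const char *field, const char *dot_as)
{
    return is_missing_field(field) ? dot_as : field;
}
```

With this check, "." is still replaced by the configured substitute, while a left-pointing breakend such as .ACT passes through untouched. Using strcmp(field, ".") == 0 in is_missing_field would be equivalent.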

Could you have a look into making the change when you have some time?

Thanks!

d-cameron commented 6 years ago

Related question: why does a missing ALT allele (.) return an empty string and not NA_character_ when the VCF file contains variants with symbolic alleles? The empty string looks very much like a 1 bp deletion that doesn't quite conform to the VCF specification. I would have thought it makes more sense to return NA in this situation.

d-cameron commented 5 years ago

FYI: my current workaround is this:

readVcf = function(file, ...) {
  raw_vcf = VariantAnnotation::readVcf(file=file, ...)
  # work-around for https://github.com/Bioconductor/VariantAnnotation/issues/8:
  # re-read the ALT column as plain text with readr and overwrite the parsed ALT
  sample_cols = seq_len(ncol(VariantAnnotation::geno(raw_vcf)[[1]]))
  alt = readr::read_tsv(file, comment="#",
    col_names=c("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", sample_cols),
    col_types=readr::cols_only(ALT=readr::col_character()))$ALT
  # one ALT string per record; multi-allelic ALT values are not split here
  VariantAnnotation::fixed(raw_vcf)$ALT = IRanges::CharacterList(as.list(alt))
  return(raw_vcf)
}

vobencha commented 5 years ago

Thanks @d-cameron for reporting this. The incorrect parsing of alleles starting with a dot ('.') was fixed in this commit.

The change is in devel for now, and if all looks good I'll make the same change to release.

I've opened a new issue for the problem in https://github.com/Bioconductor/VariantAnnotation/issues/8#issuecomment-392411301.

vobencha commented 5 years ago

@d-cameron have you had a chance to test this fix?

vobencha commented 5 years ago

This change has been ported to release, package version 1.28.9.