On a few hundred variants, I saw a few dibase variants that affected two amino acids, one of which turned into a stop, but CONSEQUENCE classified them (incorrectly, I think) as nonsynonymous variants.
DBS that stayed within one codon, on the other hand, were (correctly) annotated as nonsense:
> coding[coding$CONSEQUENCE=='nonsense',][1,]
GRanges object with 1 range and 16 metadata columns:
seqnames ranges strand | REF ALT
<Rle> <IRanges> <Rle> | <DNAStringSet> <CharacterList>
[1] 3 164739107-164739108 - | AT TA
QUAL FILTER varAllele CDSLOC PROTEINLOC QUERYID
<numeric> <character> <DNAStringSet> <IRanges> <IntegerList> <integer>
[1] 40 PASS TA 3163-3164 1055 69
TXID CDSID GENEID CONSEQUENCE REFCODON
<character> <IntegerList> <character> <factor> <DNAStringSet>
[1] 16983 52835 6476 nonsense ATA
VARCODON REFAA VARAA
<DNAStringSet> <AAStringSet> <AAStringSet>
[1] TAA I *
-------
seqinfo: 93 sequences from an unspecified genome
It seems to check whether VARAA is identical to "*", when it really should check whether VARAA contains "*", since that alone is enough to truncate the protein (line 184 of methods-predictCoding.R).
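To make the difference between the two checks concrete, here is a minimal base-R sketch. The `varaa` values are made up for illustration and stand in for what `as.character()` would give for a VARAA column; this is not the package's actual code at that line.

```r
# Hypothetical VARAA values (assumption: plain character strings, e.g. obtained
# with as.character() from an AAStringSet). "K*" and "*L" mimic dibase
# substitutions spanning two codons; "*" mimics a single-codon change.
varaa <- c("*", "K*", "KR", "*L")

# Equality check: only a lone stop is recognised
is_stop_exact <- varaa == "*"

# Containment check: any stop in the translated variant is recognised
# (fixed = TRUE so that "*" is matched literally, not as a regex quantifier)
is_stop_contains <- grepl("*", varaa, fixed = TRUE)

print(data.frame(varaa, is_stop_exact, is_stop_contains))
```

With the equality check, the two-residue cases fall through to whatever label comes next, which would match the nonsynonymous calls described above.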
Hi! VariantAnnotation is a great package and I enjoy depending on it in my own package, thanks for that!
I'm looking at adding support for dibase substitutions (DBS), and it seems there might be a little bug in the annotation there.
Possibly the check should be something along the lines of testing whether VARAA contains "*".
I'm adding a quick fix for it in my code, so no hurry for me.
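For anyone hitting the same thing, a downstream workaround could look roughly like the sketch below. The helper name `relabel_stop_gained` and the toy vectors are hypothetical, and it assumes the consequence labels are "nonsense", "nonsynonymous" and "synonymous"; it only illustrates the idea of relabelling after the fact.

```r
# Hypothetical helper, not part of VariantAnnotation: relabel records whose
# variant amino acids contain a stop while the reference amino acids do not.
relabel_stop_gained <- function(consequence, refaa, varaa) {
  consequence <- as.character(consequence)
  gained <- grepl("*", varaa, fixed = TRUE) & !grepl("*", refaa, fixed = TRUE)
  # Only touch records that were labelled nonsynonymous
  consequence[gained & consequence == "nonsynonymous"] <- "nonsense"
  factor(consequence)
}

# Toy stand-ins for the CONSEQUENCE, REFAA and VARAA columns
consequence <- factor(c("nonsense", "nonsynonymous", "nonsynonymous", "synonymous"))
refaa <- c("I", "KQ", "KQ", "L")
varaa <- c("*", "K*", "KR", "L")

relabelled <- relabel_stop_gained(consequence, refaa, varaa)
print(as.character(relabelled))
```

On real `predictCoding()` output the inputs would presumably be `coding$CONSEQUENCE`, `as.character(coding$REFAA)` and `as.character(coding$VARAA)`, but I have not checked that against every kind of record.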