WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

How Annovar handles * in ALT fields #186

Open kvn95ss opened 2 years ago

kvn95ss (Karthik Nair) commented 2 years ago (Apr 8, 2022)

Hello,

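According to this GATK post (https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele-), a "*" in the ALT field can indicate that the site is spanned by an upstream deletion. When annotating such variants with ANNOVAR, I noticed that the "*" alleles were interpreted as 0.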

While annotating with the ClinVar database, ANNOVAR misidentified the input and matched it to a different variant at the same locus. Our variant: chr19 15547993 15547994 GA 0

The ClinVar variant it matched: rs71176433 (https://www.ncbi.nlm.nih.gov/snp/rs71176433)

Note that our reported variant is a GA deletion, whereas the ClinVar deletion spans multiple nucleotides: delAGGG(AG)3(TG)5.

What could cause this discrepancy, and how can I resolve it?

One idea is to replace the 0 in the avinput file with "-", which would bring it in line with the ANNOVAR convention. Would that be worth a shot?

Thanks for your time!

kaichop commented 2 years ago

If there is a GA deletion, then the alternative allele should be "-" in ANNOVAR input format. Alternatively, you can always use VCF files with table_annovar (-vcfinput), provided the VCF file lists the correct alternative allele. I am very much against the whole concept of using an rs identifier to represent a mutation (which is incorrect), using '*' to represent an unidentifiable deletion that may or may not be two bases, or putting multiallelic variants on the same line of a VCF file. If you already know what the ALT is, you can manually put it in the VCF file before annotation.
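The "-" convention described above can be illustrated with a short Python sketch that turns one biallelic VCF REF/ALT pair into avinput-style fields (chrom, start, end, ref, alt). This is not ANNOVAR's own conversion code (convert2annovar.pl handles many more cases); the function name vcf_to_avinput is hypothetical, and it assumes a single, already-split ALT allele. It deliberately refuses "*" rather than guessing a deletion length, since the asterisk carries no sequence.

```python
from typing import Tuple


def vcf_to_avinput(chrom: str, pos: int, ref: str, alt: str) -> Tuple[str, int, int, str, str]:
    """Convert one biallelic VCF REF/ALT pair into avinput-style fields.

    Hypothetical helper (not part of ANNOVAR): trims bases shared by REF and
    ALT and writes "-" for an empty allele.
    """
    if alt == "*" or "," in alt:
        # '*' has no sequence, and multiallelic ALTs must be split first.
        raise ValueError(f"unsupported ALT {alt!r} at {chrom}:{pos}; split/filter first")
    if ref == alt:
        raise ValueError(f"REF equals ALT at {chrom}:{pos}")

    # Trim shared trailing bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]

    # Trim shared leading bases (the VCF anchor base); each one shifts the start right.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    if not ref:
        # Insertion: start = end = the base just before the inserted sequence
        # (assumed avinput convention).
        return chrom, pos - 1, pos - 1, "-", alt
    # SNV, MNV, or deletion: span the remaining REF bases; an empty ALT becomes "-".
    return chrom, pos, pos + len(ref) - 1, ref, alt or "-"
```

For example, a two-base deletion written in VCF as POS 100, REF TGA, ALT T (hypothetical coordinates) becomes ("1", 101, 102, "GA", "-"): the same shape as the GA deletion in the original report, but with "-" instead of 0.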

nurmians (Anssi Nurminen) commented 1 year ago (Apr 14, 2023)

Is there ever a reason to annotate spanning or overlapping deletions (https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele-)? I would prefer that ANNOVAR ignore these "*" ALT values. I've run into an issue where the REF/ALT column values "T/C,*" and "T/*,C" are annotated differently.

kaichop commented 1 year ago

I discussed this issue at https://annovar.openbioinformatics.org/en/latest/articles/VCF/. It is simply how VCF is designed: it tries to serve as a variant list, a genotype list, and even a locus list, which leads to various identifiability issues. You can perform allelic splitting first, then annotate the VCF. Do not annotate a VCF without splitting, because all sorts of edge cases can occur.
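Allelic splitting, as recommended above, can be sketched in Python on the eight fixed VCF columns. The function split_alleles is hypothetical and deliberately minimal: it does not recode sample GT fields or per-allele INFO values (Number=A/R), which a real splitter such as bcftools norm -m -both must handle. It shows why splitting removes the ordering problem: "C,*" and "*,C" yield the same record once the "*" allele is dropped.

```python
from typing import List


def split_alleles(vcf_line: str, drop_star: bool = True) -> List[str]:
    """Split one VCF data line into one line per ALT allele.

    Hypothetical sketch: only the eight fixed columns are kept. Sample columns
    (GT indices) and per-allele INFO values (Number=A/R) are NOT recoded.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alts, qual, filt, info = fields[:8]
    records = []
    for alt in alts.split(","):
        if drop_star and alt == "*":
            # Spanning-deletion placeholder: nothing to annotate for this allele.
            continue
        records.append("\t".join([chrom, pos, vid, ref, alt, qual, filt, info]))
    return records


# Both orderings from the comment above give the same result once split
# (chromosome, position, and QUAL are hypothetical).
print(split_alleles("chr1\t100\t.\tT\tC,*\t50\tPASS\t."))
print(split_alleles("chr1\t100\t.\tT\t*,C\t50\tPASS\t."))
```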

nurmians commented 1 year ago

Thanks for the reply. A good option might be to print a warning, or even stop processing, when ANNOVAR encounters comma-separated (multiallelic) ALT values that it can't handle. Right now everything appears to run smoothly, but the results contain issues that are hard to identify. My preferred behavior would be to stop processing by default and provide a flag to override.
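The proposed behavior (stop by default, continue with only a warning when overridden) could look roughly like the following pre-flight check over VCF lines. Nothing in this thread indicates that ANNOVAR provides such a check; check_alt_values and its allow_unsupported parameter are hypothetical names standing in for the suggested override flag.

```python
import warnings
from typing import Iterable, List, Tuple


def check_alt_values(lines: Iterable[str], allow_unsupported: bool = False) -> List[Tuple[str, str, str]]:
    """Flag VCF records whose ALT column is multiallelic or contains '*'.

    Hypothetical pre-flight check: raises by default; with
    allow_unsupported=True (standing in for an override flag) it only warns.
    """
    flagged = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header line
        chrom, pos, _vid, _ref, alt = line.rstrip("\n").split("\t")[:5]
        alleles = alt.split(",")
        if len(alleles) > 1 or "*" in alleles:
            flagged.append((chrom, pos, alt))
    if flagged:
        msg = (f"{len(flagged)} record(s) have multiallelic or '*' ALT values; "
               "split alleles before annotation")
        if not allow_unsupported:
            raise ValueError(msg)
        warnings.warn(msg)
    return flagged
```

Failing loudly by default matches the concern above: silent processing hides the problem, while an explicit override keeps the old behavior available for users who have already checked their input.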