genome-nexus / annotation-tools

Tools developed for AACR GENIE to allow annotation of vcf and maf files from a number of centers and merging the results
MIT License

Unknown annotation failures reasons #59

Closed Chelsea-Na closed 5 months ago

Chelsea-Na commented 11 months ago

Dear GN team,

We are trying to figure out why some mutation records are failing annotation by genome-nexus (version: 1.0.2). The error handling in the report for the failed annotations was very useful, but a lot of records still have unknown failure reasons (see unknown_annotation_fails.txt). This file contains n=2000 failed annotations that had a blank FAILURE_REASON value.

Would you be able to help us make sense of why these annotations failed? Let me know if there is anything else I can provide to help troubleshoot this.

Thank you!

leexgh commented 11 months ago

I tried to re-annotate the failing variants, and they fall into four types of cases:

  1. Successfully annotated variants: 308 variants can be annotated successfully. Please see the list here: success_variants.txt

  2. Missing variant allele: Two variants failed annotation due to a missing variant allele. See the list here: missing_var.txt

    SAGE-83   1    59249767    59249767    A    
    SAGE-87   17    41246040    41246040    T    

    Both can be annotated successfully when I manually add - in the variant allele field. It would be good to review and update the variant allele information for these cases. In the meantime, we need to address the application's failure to handle variants with missing allele information (issue created here: https://github.com/genome-nexus/genome-nexus/issues/724). I'll fix this asap.

  3. Reference allele mismatch: The reference allele extracted from the response does not match the given reference allele. For example:

    SAGE-932   1    115256529    115256529    A    T    Reference allele extracted from response (T) does not match given reference allele (A)       

    Please see the list here: ref_not_match.txt

  4. Incorrect format, with a missing reference allele and "cDNA" in the variant allele: Some variants are missing the reference allele and contain a cDNA description in the variant allele field. For example:

    SAGE-3498-83   17    37884133    37884134        3605_3607delGAG       

    Please see the list here: incorrect_format.txt
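
In case it helps to catch cases 2 and 4 before submitting records, here is a minimal pre-check sketch in Python. It is not part of genome-nexus or annotation-tools: the function name `precheck`, the category labels, and both regular expressions are assumptions made up for illustration. It only looks at the reference and variant allele strings, so it cannot detect case 3 (reference allele mismatch), which requires comparing against the reference genome.

```python
import re

# Assumption: a well-formed allele is a run of A/C/G/T/N, or "-" for an empty allele.
VALID_ALLELE = re.compile(r"^(?:[ACGTN]+|-)$", re.IGNORECASE)

# Assumption: cDNA-style text looks like a position (or position range) followed by
# del/ins/dup or ">", e.g. "3605_3607delGAG". This is a heuristic, not an HGVS parser.
CDNA_LIKE = re.compile(r"\d+(?:_\d+)?(?:del|ins|dup|>)", re.IGNORECASE)


def precheck(ref, alt):
    """Return a hypothetical category label for one record's allele fields."""
    ref = (ref or "").strip()
    alt = (alt or "").strip()
    if CDNA_LIKE.search(alt):
        return "cdna_in_variant_allele"      # case 4 above
    if not ref:
        return "missing_reference_allele"
    if not alt:
        return "missing_variant_allele"      # case 2 above
    if not VALID_ALLELE.match(ref) or not VALID_ALLELE.match(alt):
        return "invalid_allele_characters"
    return "ok"


# The two example shapes from this thread (allele fields only):
print(precheck("A", ""))                 # blank variant allele
print(precheck("", "3605_3607delGAG"))   # cDNA text in the variant allele
```

Records flagged as `missing_variant_allele` could then be reviewed and, where appropriate, given `-` in the variant allele field, as described in case 2.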