charite / jannovar

Annotation of VCF variants with functional impact and from databases (executable+library)
http://jannovar.readthedocs.io/en/master/

Invalid amino acid change detected because GTF file phase is ignored #541

Open znikasz opened 2 years ago

znikasz commented 2 years ago

Describe the bug: Using hg38_ensembl.ser, version 95, Jannovar annotates the change

chr1 939278 C > T

in transcript ENST00000455979 as p.(R2C). This is incorrect; the correct change is A1V.

In my opinion, Jannovar ignores the phase (frame) column of the GTF file, which is 2 for ENST00000455979:

1       havana  CDS     939275  939460  .       +       2       gene_id "ENSG00000187634"; gene_version "11"; transcript_id "ENST00000455979"; transcript_version "1"; exon_number "1"; gene_name "SAMD11"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "SAMD11-205"; transcript_source "havana"; transcript_biotype "protein_coding"; protein_id "ENSP00000412228"; protein_version "1"; tag "cds_start_NF"; tag "mRNA_start_NF"; transcript_support_level "2";
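To make the effect of the phase concrete, here is a minimal sketch for a single plus-strand CDS feature. It assumes the GFF/GTF meaning of phase (the number of leading bases that belong to an incomplete codon and must be skipped to reach the first full codon), and it numbers codons from the first full codon of the feature, which is what the reporter's expected A1V implies for this start-incomplete transcript (`cds_start_NF`). The function name `codon_position` is hypothetical and not part of Jannovar.

```python
from typing import Tuple


def codon_position(pos: int, cds_start: int, phase: int) -> Tuple[int, int]:
    """Return (1-based codon number, 1-based position within the codon)
    for a plus-strand genomic position inside a single CDS feature.

    Assumption: `phase` is the number of bases at the start of the feature
    that belong to an incomplete codon and must be skipped.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    first_codon_start = cds_start + phase
    if pos < first_codon_start:
        # The position lies in the partial codon before the first full codon.
        raise ValueError("position falls in the leading partial codon")
    offset = pos - first_codon_start
    return offset // 3 + 1, offset % 3 + 1


# Values from the issue: CDS feature 939275-939460, phase 2, variant at 939278.
print(codon_position(939278, 939275, 2))  # phase honoured -> (1, 2)
print(codon_position(939278, 939275, 0))  # phase ignored  -> (2, 1)
```

With the phase honoured, 939278 is the second base of codon 1, so a C>T there turns an Ala codon (GCN) into a Val codon (GTN), consistent with the expected A1V. Treating the phase as 0 makes it the first base of codon 2, which matches the reported R2C (for example CGT to TGT).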

To Reproduce: Steps to reproduce the behavior:

  1. Annotate chr1 939278 C > T and check its protein change in ENST00000455979.

Expected behavior: The amino acid change should be A1V.

Additional context: Please check whether the phase of CDS entries is being ignored.

holtgrewe commented 2 years ago

Confirmed, phase is indeed ignored.

holtgrewe commented 2 years ago

We need to ask Ensembl about the meaning of the frame information for CDS features on the reverse strand before we can implement this properly.
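The reverse-strand question could be handled along these lines. This is a sketch only: it assumes that phase is counted from the feature's 5' end in transcription order, which for a reverse-strand feature is its highest coordinate. That is our reading of the GFF3 specification, but it is precisely the point still to be confirmed with Ensembl. The function name `codon_position_stranded` and the reverse-strand coordinates in the usage lines are hypothetical.

```python
from typing import Tuple


def codon_position_stranded(pos: int, start: int, end: int,
                            strand: str, phase: int) -> Tuple[int, int]:
    """Return (1-based codon number, 1-based position within the codon) for
    a genomic position inside one CDS feature, on either strand.

    ASSUMPTION: phase counts the bases to skip from the feature's 5' end in
    transcription order, i.e. from `start` on '+' and from `end` on '-'.
    """
    if not start <= pos <= end:
        raise ValueError("position lies outside the CDS feature")
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    if strand == "+":
        offset = pos - (start + phase)
    elif strand == "-":
        # Walk downwards from the highest coordinate, skipping `phase` bases.
        offset = (end - phase) - pos
    else:
        raise ValueError("strand must be '+' or '-'")
    if offset < 0:
        raise ValueError("position falls in the leading partial codon")
    return offset // 3 + 1, offset % 3 + 1


# Plus-strand values from the issue (CDS 939275-939460, phase 2).
print(codon_position_stranded(939278, 939275, 939460, "+", 2))  # (1, 2)
# Hypothetical reverse-strand CDS feature 100-120 with phase 1.
print(codon_position_stranded(119, 100, 120, "-", 1))  # (1, 1)
print(codon_position_stranded(116, 100, 120, "-", 1))  # (2, 1)
```

If Ensembl's convention turns out to differ, only the `strand == "-"` branch would need to change.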