KarchinLab / open-cravat

A modular annotation tool for genomic variants
MIT License

Not main transcript selected by default? #130

Open EugeneEA opened 1 year ago

EugeneEA commented 1 year ago

Hi, I annotated rsID rs121964985 (hg38, chr3:49417892), and cravat output the following gene and transcript: AC104452.1 ENST00000636166.1

whereas in the all_mapping field it output

{"AC104452.1": [["", "p.Arg399His", "missense_variant", "ENST00000636166.1", "c.1196G>A"], ["", "", "NMD_transcript_variant,3_prime_UTR_variant", "ENST00000638079.1", "c.1471G>A"], ["", "", "NMD_transcript_variant,3_prime_UTR_variant", "ENST00000638115.1", "c.2720G>A"]],

"AMT": [["P48728", "p.Arg320His", "missense_variant", "ENST00000273588.9", "c.959G>A"], ["P48728", "p.Arg320His", "missense_variant", "ENST00000395338.7", "c.959G>A"], ["", "p.Arg272His", "missense_variant", "ENST00000427987.6", "c.815G>A"], ["", "", "2kb_downstream_variant,NMD_transcript_variant", "ENST00000430521.2", "c.1758G>A"], ["P48728", "p.Arg276His", "missense_variant", "ENST00000458307.6", "c.827G>A"], ["", "", "2kb_downstream_variant,processed_transcript", "ENST00000487589.6", ""], ["", "p.Arg272His", "missense_variant", "ENST00000538581.6", "c.815G>A"], ["", "p.Arg293His", "missense_variant", "ENST00000635808.1", "c.878G>A"], ["", "", "NMD_transcript_variant,3_prime_UTR_variant", "ENST00000636023.1", "c.132G>A"], ["", "", "NMD_transcript_variant,3_prime_UTR_variant", "ENST00000636070.1", "c.*739G>A"], ["", "p.Arg174His", "missense_variant", "ENST00000636199.1", "c.521G>A"],

["P48728", "p.Arg264His", "missense_variant", "ENST00000636522.1", "c.791G>A"], ["", "", "intron_variant", "ENST00000636597.1", "c.551-174G>A"], ["", "p.Arg268His", "missense_variant", "ENST00000636865.1", "c.803G>A"], ["", "", "intron_variant", "ENST00000637682.1", "c.878-174G>A"], ["", "", "intron_variant,NMD_transcript_variant", "ENST00000637821.1", "c.*1228+41G>A"], ["", "p.Arg293His", "missense_variant", "ENST00000638063.1", "c.878G>A"]],

"TCTA": [["P57738", "", "2kb_downstream_variant", "ENST00000273590.3", "c.*3030C>T"]]}

As far as I understand, the only MAIN transcript here is ENST00000273588, and therefore the gene AMT and ENST00000273588 should be prioritized in this situation.

Why does the output differ here?

Best, Eugene

mlarsen2 commented 1 year ago

Hi Eugene,

We just made an update to the hg38 mapper module to further prioritize the MANE transcript. If you update the module to version 1.10.3, you should see the correct transcript selected.

EugeneEA commented 1 year ago

Hi, in the latest version it does indeed work! Thanks!

EugeneEA commented 1 year ago

Hi,

One more question concerning MAIN: for rs139416487, cravat returns ENST00000240361.12 as the primary transcript, and the all_mapping field is:

{"TEX14": [["Q8IWB6", "p.Leu503Pro", "missense_variant", "ENST00000240361.12", "c.1508T>C"], ["Q8IWB6", "p.Leu497Pro", "missense_variant", "ENST00000349033.9", "c.1490T>C"], ["Q8IWB6", "p.Leu497Pro", "missense_variant", "ENST00000389934.7", "c.1490T>C"], ["", "", "NMD_transcript_variant,3_prime_UTR_variant", "ENST00000582740.1", "c.*1328T>C"]]}

But the MAIN here is ENST00000349033.9

Is it a bug or is there some logic behind it?

Best, Eugene
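
For anyone who wants to check an all_mapping value programmatically rather than by eye, here is a minimal Python sketch. It assumes each entry is ordered as UniProt accession, protein change, consequence, transcript, cDNA change; that order is inferred from the values pasted in this thread, not taken from the OpenCRAVAT documentation. The field names and both helper functions are hypothetical and are not part of OpenCRAVAT.

```python
import json

# all_mapping value copied from the rs139416487 example in this thread.
ALL_MAPPING = (
    '{"TEX14": ['
    '["Q8IWB6", "p.Leu503Pro", "missense_variant", "ENST00000240361.12", "c.1508T>C"], '
    '["Q8IWB6", "p.Leu497Pro", "missense_variant", "ENST00000349033.9", "c.1490T>C"], '
    '["Q8IWB6", "p.Leu497Pro", "missense_variant", "ENST00000389934.7", "c.1490T>C"], '
    '["", "", "NMD_transcript_variant,3_prime_UTR_variant", "ENST00000582740.1", "c.*1328T>C"]'
    ']}'
)

# Assumed field order, inferred from the pasted values (hypothetical names).
FIELDS = ("uniprot", "protein_change", "consequence", "transcript", "cdna_change")


def list_transcripts(all_mapping_json):
    """Flatten an all_mapping JSON string into one dict per transcript."""
    rows = []
    for gene, entries in json.loads(all_mapping_json).items():
        for entry in entries:
            row = dict(zip(FIELDS, entry))
            row["gene"] = gene
            rows.append(row)
    return rows


def find_transcript(rows, transcript_id):
    """Return the row for a transcript, ignoring the version suffix, or None."""
    base = transcript_id.split(".")[0]
    for row in rows:
        if row["transcript"].split(".")[0] == base:
            return row
    return None


rows = list_transcripts(ALL_MAPPING)
expected = find_transcript(rows, "ENST00000349033")
print(expected["gene"], expected["transcript"], expected["protein_change"])
```

This only confirms that the expected transcript is present in all_mapping and shows its annotation; it does not reproduce how the mapper chooses the primary transcript.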