broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Funcotator: wrong canonical #8714 (label: Bug)

Status: Open. Opened by zhanyinx 8 months ago.

zhanyinx commented 8 months ago

Bug Report

Affected tool(s) or class(es)

Funcotator

Affected version(s)

4.5.0.0

Description

Funcotator does not select the correct transcript: it does not select the canonical transcript.

Steps to reproduce

Command line used:

gatk Funcotator -L null -R path2/hg19.fasta -V file.vcf.gz -O GERS014SDZ-1N.maf --annotation-default Matched_Norm_Sample_Barcode:GERS014SDZ-1N --remove-filtered-variants true --output-file-format MAF --data-sources-path path2/hg19/funcotator_dataSources.v1.8.2024g --ref-version hg19 --splice-site-window-size 5 --interval-padding 2

Expected behavior

It should annotate the variant using the canonical transcript.

Actual behavior

It uses a transcript that is not the Ensembl_canonical (MANE_Select) transcript.

More details

I am using GENCODE v43 from the latest Funcotator data sources. In gencode.v43lift37.annotation.REORDERED.gtf, the canonical transcript of the gene MUTYH is ENST00000456914.7_10. However, the output MAF file reports the ENST00000531105.5_4 transcript.
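To reproduce the GTF check described above, here is a minimal Python sketch that scans a GENCODE GTF for transcript records of one gene carrying the `Ensembl_canonical` tag. It assumes the standard nine-column GTF layout with `key "value";` attributes and that the tag is spelled `Ensembl_canonical`, as in this report; the function name `canonical_transcripts` is hypothetical and not part of GATK.

```python
import re
from typing import Iterable, List

# Matches quoted GTF attributes such as: gene_name "MUTYH";
# Unquoted attributes like `level 2;` are simply not captured.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def canonical_transcripts(gtf_lines: Iterable[str], gene_name: str) -> List[str]:
    """Return IDs of `gene_name` transcripts tagged Ensembl_canonical (hypothetical helper)."""
    hits: List[str] = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only look at transcript-level records
        pairs = ATTR_RE.findall(fields[8])
        # `tag` can repeat on one record, so collect all of its values in a set
        tags = {value for key, value in pairs if key == "tag"}
        attrs = {key: value for key, value in pairs if key != "tag"}
        if attrs.get("gene_name") == gene_name and "Ensembl_canonical" in tags:
            hits.append(attrs["transcript_id"])
    return hits
```

Usage: open the GTF (for example `with open("gencode.v43lift37.annotation.REORDERED.gtf") as fh:`) and call `canonical_transcripts(fh, "MUTYH")`. Streaming the file line by line avoids loading the whole annotation into memory.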

I have attached the input VCF and the output MAF file:

file.vcf.gz, GERS014SDZ-1N.maf.zip

Thanks for your help.
Best,
Zhan

gokalpcelik commented 7 months ago

Hi @zhanyinx, Funcotator's canonical transcript designation differs from the one presented by other tools. Therefore, if you wish to force Funcotator to annotate a variant with a particular transcript, you need to provide the IDs of those transcripts with the parameter below:

--transcript-list <String>    File to use as a list of transcripts (one transcript ID per line, version numbers are
                              ignored) OR A set of transcript IDs to use for annotation to override selected transcript.
                              This argument may be specified 0 or more times. Default value: null. 
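As a hedged illustration of `--transcript-list`, the Python sketch below builds the contents of a transcript list file with one ID per line. Since the help text says version numbers are ignored, stripping them is optional; the sketch does it anyway, assuming the version is everything after the first `.` (so `ENST00000456914.7_10` becomes `ENST00000456914`). The helper names and the file name `mutyh_transcripts.txt` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def strip_version(transcript_id: str) -> str:
    """Drop the version suffix, e.g. ENST00000456914.7_10 -> ENST00000456914."""
    return transcript_id.strip().split(".", 1)[0]


def format_transcript_list(transcript_ids: Iterable[str]) -> str:
    """Return file contents: one unversioned transcript ID per line, duplicates removed."""
    unique: List[str] = []
    for tid in transcript_ids:
        base = strip_version(tid)
        if base and base not in unique:  # skip blanks and repeats
            unique.append(base)
    return "".join(f"{base}\n" for base in unique)


def write_transcript_list(transcript_ids: Iterable[str], path: Path) -> Path:
    """Write the list to `path` so it can be passed to --transcript-list."""
    path.write_text(format_transcript_list(transcript_ids))
    return path
```

Usage: `write_transcript_list(["ENST00000456914.7_10"], Path("mutyh_transcripts.txt"))`, then add `--transcript-list mutyh_transcripts.txt` to the Funcotator command line from the report. Per the help text, the transcript IDs can also be given directly on the command line instead of in a file.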

Funcotator's transcript selection methods are summarized below:

BEST_EFFECT

Select a transcript to be reported in detail, prioritizing effect, according to the following selection criteria:

Choose the transcript that is on the custom list specified by the user. If no list was specified, treat as if no transcripts were on the list (tie).
In case of tie, choose the transcript that yields the variant classification highest on the variant classification rank list (see below).
If still a tie, choose the transcript with highest level of curation. Note that this means lower number is better for level (see below).
If still a tie, choose the transcript with the best appris annotation (see below).
If still a tie, choose the transcript with the longest transcript sequence length.
If still a tie, choose the first transcript, alphabetically.
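The BEST_EFFECT criteria above amount to a lexicographic sort key. A minimal Python sketch, assuming each candidate's position on the variant classification rank list and on the APPRIS ranking is already available as a small integer (lower is better); those rank lists are not reproduced here, and the `Candidate` type and function names are hypothetical, not Funcotator's own classes.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Tuple


@dataclass(frozen=True)
class Candidate:
    transcript_id: str  # e.g. "ENST00000531105.5_4"
    vc_rank: int        # hypothetical: index on the variant classification rank list (lower wins)
    level: int          # level of curation (lower number is better)
    appris_rank: int    # hypothetical: index on the APPRIS ranking (lower wins)
    length: int         # transcript sequence length


def best_effect_key(c: Candidate, user_list: AbstractSet[str]) -> Tuple:
    on_list = c.transcript_id.split(".", 1)[0] in user_list  # versions ignored
    return (
        0 if on_list else 1,  # empty list: no transcript is on it, so all tie
        c.vc_rank,            # then the highest-ranked variant classification
        c.level,              # then the best level of curation
        c.appris_rank,        # then the best APPRIS annotation
        -c.length,            # then the longest transcript
        c.transcript_id,      # finally, alphabetical order
    )


def select_best_effect(cands: Iterable[Candidate],
                       user_list: AbstractSet[str] = frozenset()) -> Candidate:
    # min() over the tuple key applies the criteria in order, falling through on ties
    return min(cands, key=lambda c: best_effect_key(c, user_list))
```

Encoding the criteria as one tuple keeps the tie-breaking order explicit: each later element only matters when every earlier element is equal.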

CANONICAL

Select a transcript to be reported in detail, prioritizing canonical order, according to the following selection criteria:

Choose the transcript that is on the custom list specified by the user. If no list was specified, treat as if all transcripts were on the list (tie).
In case of tie, choose the transcript with highest level of curation. Note that this means lower number is better for level (see below).
If still a tie, choose the transcript that yields the variant classification highest on the variant classification rank list (see below).
If still a tie, choose the transcript with the best appris annotation (see below).
If still a tie, choose the transcript with the longest transcript sequence length.
If still a tie, choose the first transcript, alphabetically.
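The CANONICAL criteria differ from BEST_EFFECT mainly in that level of curation is compared before variant classification. A minimal Python sketch of that ordering, under the same assumptions as any such illustration: rank-list positions are hypothetical integer inputs (lower is better), and the `Candidate` type and function names are not Funcotator's own.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    transcript_id: str  # e.g. "ENST00000456914.7_10"
    vc_rank: int        # hypothetical: index on the variant classification rank list (lower wins)
    level: int          # level of curation (lower number is better)
    appris_rank: int    # hypothetical: index on the APPRIS ranking (lower wins)
    length: int         # transcript sequence length


def canonical_key(c: Candidate, user_list: Optional[AbstractSet[str]]) -> Tuple:
    # No list given: treat every transcript as being on the list (tie).
    on_list = not user_list or c.transcript_id.split(".", 1)[0] in user_list
    return (
        0 if on_list else 1,
        c.level,          # curation level comes first in CANONICAL mode
        c.vc_rank,        # then variant classification
        c.appris_rank,    # then APPRIS annotation
        -c.length,        # then the longest transcript
        c.transcript_id,  # finally, alphabetical order
    )


def select_canonical(cands: Iterable[Candidate],
                     user_list: Optional[AbstractSet[str]] = None) -> Candidate:
    return min(cands, key=lambda c: canonical_key(c, user_list))
```

Note that none of the listed CANONICAL criteria consult the Ensembl_canonical or MANE_Select tags, which is consistent with the reply above: without a `--transcript-list`, the transcript chosen by these rules need not be the one carrying that tag.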

ALL

Same as CANONICAL, but indicates that no transcripts should be dropped. Render all overlapping transcripts.