broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Funcotator feature request: custom prioritization order for mutation types #7631

Status: Open. droazen opened this issue 2 years ago

droazen commented 2 years ago

Request from a user in the Getz lab: allow the prioritization order for mutation types to be customized. E.g., instead of the default order:

De novo start out of frame/nonsense/nonstop
Missense/De novo start in frame/in frame deletion/in frame insertion
Frameshift deletion/frameshift insertion/frameshift substitution
Start codon SNP/start codon deletion/start codon insertion/start codon DNP/start codon TNP/start codon ONP/stop codon SNP/stop codon deletion/stop codon insertion/stop codon DNP/stop codon TNP/stop codon ONP
Splice site/splice site SNP/splice site deletion/splice site insertion/splice site DNP/splice site TNP/splice site ONP/splice site miRNA
Silent
3' UTR/5' UTR
Intron
5' flank/3' flank
Non-coding transcript
IGR

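the user wants to be able to set a custom order such as the following, which moves the splice-site tier to the top and keeps the remaining tiers in their default order: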

Splice site/splice site SNP/splice site deletion/splice site insertion/splice site DNP/splice site TNP/splice site ONP/splice site miRNA
De novo start out of frame/nonsense/nonstop
Missense/De novo start in frame/in frame deletion/in frame insertion
Frameshift deletion/frameshift insertion/frameshift substitution
Start codon SNP/start codon deletion/start codon insertion/start codon DNP/start codon TNP/start codon ONP/stop codon SNP/stop codon deletion/stop codon insertion/stop codon DNP/stop codon TNP/stop codon ONP
Silent
3' UTR/5' UTR
Intron
5' flank/3' flank
Non-coding transcript
IGR
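To make the request concrete, here is a minimal Python sketch of how a configurable tier order could pick one classification when a variant is classified differently on different transcripts. It is not Funcotator's implementation. It assumes the order is supplied as plain text in the same one-tier-per-line, "/"-separated form used above. The helper names (`parse_tiers`, `promote`, `rank_table`, `pick`) and the example annotations are hypothetical.

```python
# Hypothetical sketch (not Funcotator's API): choose one classification per
# variant from several per-transcript annotations using a configurable tier
# order. Labels are copied from the issue text; "/" separates labels that
# share a tier, and earlier lines have higher priority.

DEFAULT_ORDER = """\
De novo start out of frame/nonsense/nonstop
Missense/De novo start in frame/in frame deletion/in frame insertion
Frameshift deletion/frameshift insertion/frameshift substitution
Start codon SNP/start codon deletion/start codon insertion/start codon DNP/start codon TNP/start codon ONP/stop codon SNP/stop codon deletion/stop codon insertion/stop codon DNP/stop codon TNP/stop codon ONP
Splice site/splice site SNP/splice site deletion/splice site insertion/splice site DNP/splice site TNP/splice site ONP/splice site miRNA
Silent
3' UTR/5' UTR
Intron
5' flank/3' flank
Non-coding transcript
IGR"""


def parse_tiers(text):
    """Split an order spec into a list of tiers, each a list of labels."""
    return [line.split("/") for line in text.splitlines() if line.strip()]


def promote(tiers, label):
    """Return a new tier list with the tier containing `label` moved to the top."""
    target = next(t for t in tiers if label in t)
    return [target] + [t for t in tiers if t is not target]


def rank_table(tiers):
    """Map each lowercased label to its tier index (0 = highest priority)."""
    return {label.lower(): i for i, tier in enumerate(tiers) for label in tier}


def pick(classifications, ranks):
    """Return the classification whose tier has the highest priority."""
    return min(classifications, key=lambda c: ranks[c.lower()])


default_tiers = parse_tiers(DEFAULT_ORDER)
custom_tiers = promote(default_tiers, "Splice site")  # the custom order requested above

per_transcript = ["Missense", "Splice site"]  # illustrative annotations for one variant
print(pick(per_transcript, rank_table(default_tiers)))  # Missense
print(pick(per_transcript, rank_table(custom_tiers)))   # Splice site
```

Ties within a tier are resolved by input order here, because `min` returns the first minimum it encounters. How Funcotator itself breaks such ties is not modeled.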

DarioS commented 2 years ago

In a similar vein, would it be feasible to allow sample-matched RNA-seq data to be specified as input, so that the annotation is based on the isoform(s) actually transcribed in a particular sample? The same SNV may then be annotated differently in two samples if the isoforms inferred from their RNA-seq data differ (e.g. exonic for Patient A, intronic for Patient B). This would avoid subjective prioritisation lists like the ones above and would instead be data-driven and contextual.
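The data-driven alternative could be sketched as a filter applied before any prioritization: keep only annotations on transcripts that the sample's RNA-seq data show to be expressed. This is a hypothetical Python sketch, not part of GATK. The transcript IDs, TPM values, the `expressed_annotations` helper and the 1.0 TPM cutoff are all illustrative assumptions.

```python
# Hypothetical sketch: annotate a variant using only the transcripts that are
# expressed in a given sample. Transcript IDs, TPM values, and the 1.0 TPM
# cutoff are made-up assumptions for illustration.

def expressed_annotations(per_transcript, expression, min_tpm=1.0):
    """Keep annotations whose transcript meets the expression cutoff.

    Transcripts missing from `expression` are treated as unexpressed.
    """
    return {tx: cls for tx, cls in per_transcript.items()
            if expression.get(tx, 0.0) >= min_tpm}


# One SNV annotated against two isoforms of the same gene:
# exonic (missense) on TX_A, intronic on TX_B.
per_transcript = {"TX_A": "Missense", "TX_B": "Intron"}

# Per-sample isoform expression, e.g. from an RNA-seq quantifier.
patient_a = {"TX_A": 25.0, "TX_B": 0.2}
patient_b = {"TX_A": 0.0, "TX_B": 12.5}

print(expressed_annotations(per_transcript, patient_a))  # {'TX_A': 'Missense'}
print(expressed_annotations(per_transcript, patient_b))  # {'TX_B': 'Intron'}
```

If no transcript passes the cutoff, the result is empty, so a real implementation would still need a fallback, such as reverting to a prioritization order like the ones above.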