broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Strange output by Funcotator #8965

Open gevro opened 2 weeks ago

gevro commented 2 weeks ago

Hi, with GATK 4.6.0, Funcotator data sources v1.8, and output in VCF format, I'm seeing some annotations that contain strange character sequences: "%7C" and "%20" (the percent-encoded forms of "|" and a space).

For example, for the variant chr11:54942730 C>T (hg38), the gnomAD_genomeAF annotation is: 8.55286e-05%7C_3.46021e-04

This should be a single number.
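To see what the value actually contains, it can be percent-decoded and split. The sketch below assumes, based only on the value reported here, that multiple values are joined by "|" (encoded as %7C) and padded with "_" or a space (%20); this is an inference from this one example, not a documented Funcotator format, and `split_funcotator_af` is a hypothetical helper name.

```python
from typing import List
from urllib.parse import unquote


def split_funcotator_af(raw: str) -> List[float]:
    """Decode a percent-encoded annotation value and split it into floats.

    Assumption (inferred from this issue, not from Funcotator docs):
    values are joined by "|" and may be padded with "_" or spaces.
    """
    decoded = unquote(raw)  # "%7C" -> "|", "%20" -> " "
    values = []
    for part in decoded.split("|"):
        part = part.strip(" _")  # drop the "_" / space padding
        if part:
            values.append(float(part))
    return values


if __name__ == "__main__":
    # Two allele frequencies come back, one per duplicated gnomAD record.
    print(split_funcotator_af("8.55286e-05%7C_3.46021e-04"))
```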

This looks like a bug in how Funcotator parses the gnomAD information it retrieves from the Google Cloud bucket.

gevro commented 2 weeks ago

As a work-around, how do I fully localize the gnomAD data sources?

gevro commented 2 weeks ago

Update: I localized gnomAD and got the same result. So I looked at the gnomAD annotation files and found the issue: there is a bug in how the gnomad_genome annotation files were prepared. Some variants appear twice, which causes Funcotator to output two allele frequency annotations for each such variant:

Example:

chr11 54942730 . C T 979.63 PASS AF=8.55286e-05;AF_afr=0;AF_afr_female=0;AF_afr_male=0;AF_amr=0;AF_amr_female=0;AF_amr_male=0;AF_asj=0;AF_asj_female=0;AF_asj_male=0;AF_eas=0.00149477;AF_eas_female=0;AF_eas_male=0.00225734;AF_female=0;AF_fin=0;AF_fin_female=0;AF_fin_male=0;AF_male=0.000154131;AF_nfe=0;AF_nfe_est=0;AF_nfe_female=0;AF_nfe_male=0;AF_nfe_nwe=0;AF_nfe_onf=0;AF_nfe_seu=0;AF_oth=0;AF_oth_female=0;AF_oth_male=0;AF_popmax=0.00149477;AF_raw=0.000164376;OriginalContig=11;OriginalStart=51175001;ReverseComplementedAlleles

chr11 54942730 rs1267687142 C T 1483.06 PASS AF=0.000346021;AF_afr=0;AF_afr_female=0;AF_afr_male=0;AF_amr=0;AF_amr_female=0;AF_amr_male=0;AF_asj=0;AF_asj_female=0;AF_asj_male=0;AF_eas=0.00980392;AF_eas_female=0.00694444;AF_eas_male=0.0113636;AF_female=0.000160256;AF_fin=0;AF_fin_female=0;AF_fin_male=0;AF_male=0.000487211;AF_nfe=0;AF_nfe_est=0;AF_nfe_female=0;AF_nfe_male=0;AF_nfe_nwe=0;AF_nfe_onf=0;AF_nfe_seu=0;AF_oth=0.00221239;AF_oth_female=0;AF_oth_male=0.00438596;AF_popmax=0.00980392;AF_raw=0.000561325;OriginalContig=11;OriginalStart=54710206
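Duplicated sites like the pair above can be found by grouping records on (CHROM, POS, REF, ALT). The following is a minimal sketch, assuming a tab-delimited VCF read line by line; `find_duplicate_sites` and `open_vcf` are hypothetical helpers, and the key uses the raw ALT string, so multi-allelic records are not split (a simplification).

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple

Key = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def find_duplicate_sites(lines: Iterable[str]) -> Dict[Key, List[str]]:
    """Return every (CHROM, POS, REF, ALT) seen more than once, mapped to
    the ID column of each record at that site."""
    seen: Dict[Key, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        seen[(chrom, pos, ref, alt)].append(vid)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    # The two records from this issue, with INFO truncated for brevity.
    records = [
        "chr11\t54942730\t.\tC\tT\t979.63\tPASS\tAF=8.55286e-05",
        "chr11\t54942730\trs1267687142\tC\tT\t1483.06\tPASS\tAF=0.000346021",
    ]
    print(find_duplicate_sites(records))
```

On a real data source the same function can be applied to `open_vcf(path)`, where `path` is whatever local copy of the gnomAD genome VCF you have.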

This bug would affect any pipeline that filters variants on Funcotator's gnomAD genome annotations.
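Until the data source is fixed, a downstream filter has to cope with multi-valued fields: a naive `float()` on the raw value fails. The sketch below takes the maximum of the reported values, which is a deliberately conservative choice for rare-variant filtering and an assumption of this example, not Funcotator behaviour. The 1e-3 cutoff and the rule that a missing value passes are likewise hypothetical.

```python
from typing import Optional
from urllib.parse import unquote


def conservative_af(raw: str) -> Optional[float]:
    """Largest AF in a possibly multi-valued, percent-encoded annotation.

    Assumes "|"-joined values with "_"/space padding (as seen in this issue).
    Returns None when no value is present.
    """
    values = [float(p.strip(" _")) for p in unquote(raw).split("|") if p.strip(" _")]
    return max(values) if values else None


def passes_rare_filter(raw: str, threshold: float = 1e-3) -> bool:
    """Keep a variant only if every reported AF is below the (hypothetical) cutoff.

    A missing annotation is treated as passing; adjust to your pipeline.
    """
    af = conservative_af(raw)
    return af is None or af < threshold


if __name__ == "__main__":
    raw = "8.55286e-05%7C_3.46021e-04"
    try:
        float(raw)  # what a naive parser would do
    except ValueError:
        print("naive float() parse failed")
    print(conservative_af(raw), passes_rare_filter(raw))
```

Taking the maximum means a variant is only kept if it is rare in both duplicated records; taking the first value instead would silently depend on record order in the data source.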