Open gevro opened 2 weeks ago
As a work-around, how do I fully localize the gnomAD data sources?
Update: I localized gnomAD and got the same result. So I looked at the gnomAD annotation files and found the issue: there is a bug in how the gnomad_genome annotation files were prepared. Some variants appear twice, which causes Funcotator to output two allele frequency annotations for each of those variants:
Example:
chr11 54942730 . C T 979.63 PASS AF=8.55286e-05;AF_afr=0;AF_afr_female=0;AF_afr_male=0;AF_amr=0;AF_amr_female=0;AF_amr_male=0;AF_asj=0;AF_asj_female=0;AF_asj_male=0;AF_eas=0.00149477;AF_eas_female=0;AF_eas_male=0.00225734;AF_female=0;AF_fin=0;AF_fin_female=0;AF_fin_male=0;AF_male=0.000154131;AF_nfe=0;AF_nfe_est=0;AF_nfe_female=0;AF_nfe_male=0;AF_nfe_nwe=0;AF_nfe_onf=0;AF_nfe_seu=0;AF_oth=0;AF_oth_female=0;AF_oth_male=0;AF_popmax=0.00149477;AF_raw=0.000164376;OriginalContig=11;OriginalStart=51175001;ReverseComplementedAlleles
chr11 54942730 rs1267687142 C T 1483.06 PASS AF=0.000346021;AF_afr=0;AF_afr_female=0;AF_afr_male=0;AF_amr=0;AF_amr_female=0;AF_amr_male=0;AF_asj=0;AF_asj_female=0;AF_asj_male=0;AF_eas=0.00980392;AF_eas_female=0.00694444;AF_eas_male=0.0113636;AF_female=0.000160256;AF_fin=0;AF_fin_female=0;AF_fin_male=0;AF_male=0.000487211;AF_nfe=0;AF_nfe_est=0;AF_nfe_female=0;AF_nfe_male=0;AF_nfe_nwe=0;AF_nfe_onf=0;AF_nfe_seu=0;AF_oth=0.00221239;AF_oth_female=0;AF_oth_male=0.00438596;AF_popmax=0.00980392;AF_raw=0.000561325;OriginalContig=11;OriginalStart=54710206
This bug would affect every pipeline that uses Funcotator's gnomAD genome annotations for filtering.
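For anyone who wants to check their own localized copy, here is a rough Python sketch that scans VCF lines for records sharing the same contig, position, REF and ALT. It is only an illustration: the function name `iter_duplicate_variants` is made up, and it assumes the file is coordinate-sorted so that duplicates sit next to each other (which is what the two records above suggest).

```python
def iter_duplicate_variants(lines):
    """Yield (chrom, pos, ref, alt) for every repeated record.

    Assumes tab-delimited VCF lines sorted by contig and position, so only
    the alleles seen at the current site need to be kept in memory.
    """
    current_site = None
    seen_alleles = set()
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        site = (chrom, pos)
        if site != current_site:
            # New site: forget the alleles from the previous one.
            current_site = site
            seen_alleles = set()
        alleles = (ref, alt)
        if alleles in seen_alleles:
            yield (chrom, int(pos), ref, alt)
        else:
            seen_alleles.add(alleles)
```

To run it on a bgzipped file, something like `with gzip.open(path, "rt") as fh: dups = list(iter_duplicate_variants(fh))` should work, where `path` points at the localized gnomAD genome VCF.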
Hi, with GATK 4.6.0 and Funcotator data sources v1.8, and output in VCF format, I'm seeing some annotations that contain strange character sequences: "%7C" and "%20"
For example, for one variant, chr11:54942730 C>T (hg38), for gnomAD_genomeAF I'm seeing: 8.55286e-05%7C_3.46021e-04
But this should simply be one number.
This seems like a bug in how Funcotator parses the gnomAD info it retrieves from the Google Cloud bucket.
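For reference, "%7C" and "%20" are the percent-encoded forms of "|" and a space. A small Python sketch to decode such a value and split it into its parts is below. The function name `split_encoded_values` is made up, and treating "|" as a separator between values (and the "_" as padding) is only my reading of the example above, not documented Funcotator behavior.

```python
from urllib.parse import unquote


def split_encoded_values(raw):
    """Percent-decode an annotation value and split it on '|'.

    Assumption: '|' separates multiple values and any surrounding
    spaces or underscores are padding that can be dropped.
    """
    decoded = unquote(raw)  # "%7C" becomes "|", "%20" becomes " "
    return [part.strip(" _") for part in decoded.split("|")]


values = split_encoded_values("8.55286e-05%7C_3.46021e-04")
print(values)
```

Getting more than one element back for an allele frequency field is a sign that the variant matched more than one record in the data source.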