Closed theferrit32 closed 2 years ago
@larrybabb @KelseaChang5 to rename - more succinct?
Example hail record from gnomad with added annotations using current popmax code in this repo. The faf
records with "gnomad pop" comments are the relevant ones. Here they are all 0.0
. When the faf_index_dict
on the gnomad hail table is collected into a python list it happens to put them in alphabetic order based on label, yielding afr
.
Struct(
locus=Locus(contig=1, position=10114, reference_genome=GRCh37), alleles=['T', 'C'],
faf=[
Struct(meta=frozendict({'group': 'adj'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'nfe'}), faf95=0.0, faf99=0.0), # gnomad pop
Struct(meta=frozendict({'group': 'adj', 'pop': 'afr'}), faf95=0.0, faf99=0.0), # gnomad pop
Struct(meta=frozendict({'group': 'adj', 'pop': 'eas'}), faf95=0.0, faf99=0.0), # gnomad pop
Struct(meta=frozendict({'group': 'adj', 'pop': 'amr'}), faf95=0.0, faf99=0.0), # gnomad pop
Struct(meta=frozendict({'group': 'adj'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'nfe'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'afr'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'eas'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'amr'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'nfe'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'afr'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'eas'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'amr'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'nfe'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'afr'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'eas'}), faf95=0.0, faf99=0.0),
Struct(meta=frozendict({'group': 'adj', 'pop': 'amr'}), faf95=0.0, faf99=0.0)],
popmax_faf=Struct(meta=frozendict({'group': 'adj', 'pop': 'afr'}), faf95=0.0, faf99=0.0),
popmax_index_dict_key='gnomad_afr',
popmax_faf_pop_freq=Struct(AC=0, AF=0.0, AN=2168, homozygote_count=0))
@KelseaChang5 can you follow up with Steve Harrison on this to verify that he understands the concern/bug being raised by @theferrit32? If he wants @theferrit32 to make changes to resolve this going forward then please capture that change here and assign to @theferrit32 otherwise, document that Steven understands and accepts the risk related to the potential misrepresentation of data that may be occurring.
@KelseaChang5 can you follow up with Steve Harrison on this to verify that he understands the concern/bug being raised by @theferrit32? If he wants @theferrit32 to make changes to resolve this going forward then please capture that change here and assign to @theferrit32 otherwise, document that Steven understands and accepts the risk related to the potential misrepresentation of data that may be occurring.
Will do!
@larrybabb @KelseaChang5 I think I'll just go ahead and fix this now while it's at the front of my mind and the issue might cause confusion in the output data.
Sounds good @theferrit32. I still think it's wise to make sure Steven H is aware of the issue and how it may impact the output as compared to past reports.
This has been resolved as of bbd7af4418750a4a34fbbe83f906c61efa6b0b56. This also makes it so that output will include all subpopulations which had the maximum filtered allele frequency. There are some cases in gnomad where two populations have the same 95% confidence FAF and their value is the max.
The main changes for this were added in the add_popmax_af
function of hgvs-annotate-gnomad-clinvar.ipynb
At the end before export, the notebook just concatenates with commas. In most cases there is only one popmax.
Issue: When all FAF95 values are zero, FAF report always reports the popmax as the first gnomad superpopulation in the gnomad faf_index_dict
Result: This could result in nondeterministic popmax populations being reported, but in practice has resulted in
afr
being reported as the popmax for every variant with all 0 superpopulation FAF95s.gnomAD reports these cases as null popmax, which is more meaningful, and should be done here.
Example: https://gnomad.broadinstitute.org/variant/16-2185899-C-T?dataset=gnomad_r2_1