sambaxter closed this issue 1 year ago.
I thought this was done in #48, but as it's implemented, I think it only filters populations in v3 (I don't think v2 has the `freq_sample_count` global). It also uses 1,000 as the threshold instead of 2,000.
The recommendation is 1,000 individuals or 2,000 alleles. Is the sample count in individuals or in alleles?
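If the sample count is a count of individuals, the two forms of the recommendation describe the same cutoff at autosomal sites, because each diploid individual contributes two alleles: 1,000 individuals × 2 = 2,000 alleles. A minimal sketch of that conversion; the helper name is hypothetical, and sex chromosomes (where XY individuals carry a single X allele) are deliberately left out:

```python
# Sketch: relate "1,000 individuals" to "2,000 alleles" at an autosomal site.
# Assumes every individual is diploid; sex chromosomes are out of scope.
ALLELES_PER_INDIVIDUAL = 2


def individuals_to_alleles(n_individuals: int) -> int:
    """Number of alleles contributed by n_individuals diploid samples."""
    if n_individuals < 0:
        raise ValueError("sample count cannot be negative")
    return n_individuals * ALLELES_PER_INDIVIDUAL


print(individuals_to_alleles(1_000))  # 2000
```

So a population-level filter can be expressed in either unit, as long as the code is clear about which one the sample count uses.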
I can see Japanese in the latest list.
We can do a hard filter on Japanese for v2 if that's easiest. For v4, they should have the `freq_sample_count` global if it's in v3.
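One way to combine the two options is to apply the threshold wherever per-population counts are available (v3, and presumably v4) and fall back to a hard-coded exclusion for v2. This is a sketch under stated assumptions: the population label `jpn`, the function name, and the shape of the allele-count input are all hypothetical, and "more than 2,000 alleles" is treated as the pass condition.

```python
from typing import Mapping, Optional, Set

# Hypothetical label for the Japanese population in gnomAD v2; check the
# actual population labels in the v2 frequency metadata before relying on it.
V2_HARD_FILTERED = frozenset({"jpn"})
MIN_ALLELES = 2_000  # a population must have MORE than this many alleles


def populations_to_drop(
    version: str,
    allele_counts: Optional[Mapping[str, int]] = None,
) -> Set[str]:
    """Return the populations to leave out of the estimates.

    allele_counts maps population label -> total alleles in that version.
    When it is available (e.g. derived from a sample-count global), the
    threshold is applied directly; otherwise v2 falls back to a fixed list.
    """
    if allele_counts is not None:
        return {pop for pop, n in allele_counts.items() if n <= MIN_ALLELES}
    if version == "v2":
        return set(V2_HARD_FILTERED)
    raise ValueError(f"no per-population sample counts for {version}")


print(populations_to_drop("v2"))  # {'jpn'}
print(populations_to_drop("v4", {"afr": 5_000, "jpn": 1_500}))  # {'jpn'}
```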
Oh, I forgot about individuals vs alleles.
Also, how should this work with exomes and genomes? 2,000 alleles in exomes + genomes combined, or filter exomes and genomes individually?
That is a very good question, and I hadn't thought about it. I think we should start with 2,000 alleles in exomes + genomes combined for now. And this will be at the population level, right (i.e., a population must have more than 2,000 alleles in total in that particular version to be listed in the estimates)? Working on the FAQ made me wonder whether we need a flag when a variant has an AN (allele number) below 2,000 in any population, either because of low coverage or because the variant is only in genomes. I don't want it to get cluttered with too many flags, but I do see utility in this one.
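Under the combined rule proposed above, a population's exome and genome allele counts are summed before the threshold is applied. A minimal sketch, assuming per-population allele totals are already available as plain dictionaries for each dataset and that the exome and genome sample sets do not overlap; the function names and input shape are hypothetical, not the calculator's actual code:

```python
from collections import Counter
from typing import Dict, Mapping, Set

MIN_ALLELES = 2_000  # a population must have MORE than this many alleles


def combined_allele_counts(
    exomes: Mapping[str, int], genomes: Mapping[str, int]
) -> Dict[str, int]:
    """Sum per-population allele counts across exomes and genomes.

    A population present in only one dataset keeps that dataset's count.
    Assumes the two sample sets do not overlap, so nobody is counted twice.
    """
    total = Counter(exomes)
    total.update(genomes)  # Counter.update adds counts instead of replacing them
    return dict(total)


def listed_populations(
    exomes: Mapping[str, int],
    genomes: Mapping[str, int],
    min_alleles: int = MIN_ALLELES,
) -> Set[str]:
    """Populations with more than min_alleles alleles in exomes + genomes."""
    combined = combined_allele_counts(exomes, genomes)
    return {pop for pop, n in combined.items() if n > min_alleles}


# Illustrative numbers only, not real gnomAD counts.
exomes = {"afr": 1_500, "jpn": 800}
genomes = {"afr": 1_000, "jpn": 400}
print(listed_populations(exomes, genomes))  # {'afr'}
```

Here `afr` is listed with 2,500 combined alleles even though neither dataset alone (1,500 or 1,000) clears the threshold; that is exactly where the combined rule differs from filtering exomes and genomes independently.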
I think I may have been going about this the wrong way. I removed populations with fewer than 1,000 individuals in gnomAD. But do we want to apply this filter at the variant level instead, i.e., exclude a variant from a population's calculations if the variant has an allele number below 2,000 in that population?
You were doing it right. We should remove populations with fewer than 1,000 individuals in gnomAD. I am wondering whether we should also have a variant-level flag for when a variant has an allele number below 2,000 in a particular population. Typically these are genome-only variants, which we already have a flag for, but sometimes it is simply a variant in a low-coverage region. If that would be too messy, though, I don't think it's critical.
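A minimal sketch of the proposed variant-level flag, assuming the variant's per-population allele numbers (AN) are available as a plain mapping. The function names, the mapping shape, and the flag itself are hypothetical, since the thread leaves open whether the flag will be added. Populations are still filtered at the population level; this only marks a variant whose AN is low in some population, whatever the cause (genome-only call or low coverage).

```python
from typing import List, Mapping

# Threshold from the discussion above; the constant name is hypothetical.
LOW_AN_THRESHOLD = 2_000


def low_an_populations(
    an_by_pop: Mapping[str, int], threshold: int = LOW_AN_THRESHOLD
) -> List[str]:
    """Populations (sorted) in which this variant's allele number is below threshold."""
    return sorted(pop for pop, an in an_by_pop.items() if an < threshold)


def has_low_an_flag(
    an_by_pop: Mapping[str, int], threshold: int = LOW_AN_THRESHOLD
) -> bool:
    """True if the variant should carry the (hypothetical) low-AN flag.

    The variant is only flagged; it is not removed from the estimates.
    """
    return bool(low_an_populations(an_by_pop, threshold))


variant_an = {"afr": 15_000, "amr": 1_800}  # illustrative values
print(low_an_populations(variant_an))  # ['amr']
print(has_low_an_flag(variant_an))  # True
```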
(In reply to Nick Watts, Mon, Sep 5, 2022, 7:16 PM)
--
Samantha Baxter, MS, CGC
Associate Director, Genetic and Genomic Data Sharing
Licensed Genetic Counselor
OK, in that case it sounds like I need to add the sample count information for gnomAD v2 (the v2 Hail Table doesn't have the `freq_sample_count` global like v3 does).
And update how the populations are collected when there are both exomes and genomes. Currently, they are filtered independently.
If only v3 has it, we could do this just for v3 and v4 (and skip v2). Would that be easier?
(In reply to Nick Watts, Wed, Sep 7, 2022, 11:39 AM)
> And update how the populations are collected when there's both exomes and genomes. Currently, they are filtered independently.
I think this part will have to be done anyway for v4 (it has both exomes and genomes, right?).
And adding in sample counts for v2 shouldn't be difficult.
OK, great. Thank you!
(In reply to Nick Watts, Wed, Sep 7, 2022, 11:57 AM)
We should only include subpopulations that have more than 2,000 alleles. For v2, this only affects Japanese, but in v4 it will likely apply to more populations (and Japanese will likely have more than 2,000 alleles there).