ga4gh / va-spec

An information model for representing variant annotations.

Extend Population Frequency Annotation to cover MAF/FAF? #42

Open mbrush opened 5 years ago

mbrush commented 5 years ago

Amanda Spurdle raised the issue of whether and how we plan to capture MAF/FAF as part of Population Frequency annotations, and/or as a separate type of annotation.

MAF = minor allele frequency (frequency of the second most common allele at a given locus)

FAF = Founder Allele Frequency (is this the same as ancestral allele?)

Example: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?do_not_redirect&rs=rs429358

Questions:

  1. Is indicating which allele is the minor allele and/or the founder allele in a given population in scope for our initial round of VA work?
  2. If yes, should we extend the PopFreq annotation model to include this info, and how?

mbrush commented 5 years ago

If we do decide to try to include this info within the PopFreq annotation, the simplest way might be to include boolean attributes in our model that indicate whether the subject allele of a given PopFreq annotation is the minor allele or the founder allele in the specific population the annotation is about. This simple addition should support the most pressing use cases: being able to identify what the minor or founder allele is at a given location in a given population, and to determine what the frequency of this allele is in that population.

With this approach, an Allele Population Frequency annotation model might look something like this (simplified for clarity and focus on the issue at hand):
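
The model from the original comment is not shown at this point in the thread. As a stand-in, here is a minimal Python sketch of the idea, not the VA-Spec schema. Only `populationQualifier` and `isFounderAllele` are names used in this discussion; the class name, the remaining fields, and the example values are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AllelePopulationFrequency:
    """Hypothetical, simplified PopFreq annotation (not the real schema)."""

    subject_allele: str        # allele the statement is about (assumed name)
    population_qualifier: str  # populationQualifier in the thread, e.g. "AFR"
    allele_frequency: float    # frequency of the subject allele in that population
    # None means "not asserted", which is distinct from an explicit False.
    is_minor_allele: Optional[bool] = None
    is_founder_allele: Optional[bool] = None


# Illustrative values only; this is not real frequency data.
statement = AllelePopulationFrequency(
    subject_allele="example-allele-id",
    population_qualifier="AFR",
    allele_frequency=0.02,
    is_minor_allele=True,
)
```

Using `Optional[bool]` rather than a plain `bool` lets a data creator leave the founder flag unpopulated when it does not apply to the population in question.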


I suspect this may be adequate for indicating that an allele is the minor allele in a population, as the provenance of this is directly tied to/shared with the core frequency calculations. But I wonder if an assertion of 'founder-ness' may be more complex, as it has different evidence/provenance than the rest of the PopFreq Study Data. Would we be comfortable lumping that into the PopFreq annotation as well? Or, if we want to capture this info, would we want to define a separate VA type? If so, how would we define the scope and semantics of this new VA type?

dsonkin commented 5 years ago

Knowing that an allele is a founder allele is very useful. However, a boolean attribute isFounderAllele is not sufficient to capture founder allele information, because the population in which the allele is a founder must also be known. For example, BRCA1 E23fs (CA003783) is a founder mutation in the Ashkenazi Jewish population. If a study covers only this population, a boolean isFounderAllele would be sufficient; if the population frequency reported in the study is based on a population of many different backgrounds, a boolean attribute would not be sufficient.

mbrush commented 5 years ago

Thanks for this insight Dmitriy - I will clarify my thinking on this below, and would be interested in feedback.

We have defined/scoped PopFreq statements to report data from a clearly specified population that is captured in the populationQualifier slot of the model. The scope here is most often a single race/ethnicity-based population such as 'AFR' or 'NFE', in which case I would think a boolean attribute is sufficient for indicating a founder allele. This seems the most efficient way to do this, where the intent is to indicate that the subject allele of the PF statement is the founder in the indicated/qualifying population. This would seem to address your point above that the notion of a founder is tied to a specific population, no?

In cases where the interrogated population is not a specific race/ethnic group (i.e. the study/statement covers frequency across all ethnic populations in gnomAD), it may be that the notion of a founder allele is not relevant, and this boolean attribute should not be populated.

My hope is that this boolean attribute can work as a concise and convenient approach to capturing minor allele and founder allele information in PF statements, where they provide important context for interpretation and use. But if we decide founder allele identification is important enough to warrant its own VA type, or if there is a need to assert that a variant is a founder allele in a population in the absence of frequency data, or if we decide we need to track the evidence/provenance behind founder allele statements in a precise way, then we will need to create a new VA Statement type here.

dsonkin commented 5 years ago

If a study covers frequency across many different ethnic populations and the boolean attribute isFounderAllele is not set, a person retrieving such data may not realize that this is a founder allele.

mbrush commented 5 years ago

@dsonkin - To be clear, your earlier point was that it wouldn't make sense to talk about a founder allele in our scenario above, right? Here the population includes many different ethnic groups, so you would not expect that an isFounderAllele boolean attribute would be set.

If a data creator wanted to indicate that an allele was a founder, they would create PF statements for particular ethnicity-based subpopulations, and set the isFounderAllele boolean to true where relevant in this context. And this is where a user would have to look to find founder allele info.

mbrush commented 5 years ago

@dsonkin pointed out on the June 26 call the possible requirement that a consumer of an annotation about a more general population (e.g. all of gnomAD) may want to know what the minor allele or founder allele is in different subpopulations (e.g. AFR, FIN, etc).

If this is a core requirement, how should it be handled by the model? Consider what to do if the dataset does not include PopFreq statements for the different subpopulations, where this info might be captured based on the proposals above.
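
One way to picture this requirement: a consumer starting from a general-population statement looks up the matching subpopulation statements, and gets an explicit "unknown" back where the dataset has none. The sketch below assumes PopFreq statements are plain dictionaries; the keys, the allele label `A1`, the helper name `minor_allele_flag`, and all values are hypothetical and made up for illustration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical PopFreq statements for one allele; values are made up.
statements: List[Dict[str, Any]] = [
    {"allele": "A1", "population": "ALL", "frequency": 0.30},
    {"allele": "A1", "population": "AFR", "frequency": 0.60, "isMinorAllele": False},
    {"allele": "A1", "population": "FIN", "frequency": 0.10, "isMinorAllele": True},
]


def minor_allele_flag(
    stmts: List[Dict[str, Any]], allele: str, population: str
) -> Optional[bool]:
    """Return the isMinorAllele flag for an allele in one population.

    Returns None when no statement exists for that population, or when
    the flag was left unpopulated, so "unknown" stays distinct from False.
    """
    for s in stmts:
        if s["allele"] == allele and s["population"] == population:
            return s.get("isMinorAllele")
    return None


fin_flag = minor_allele_flag(statements, "A1", "FIN")  # flag set on the FIN statement
eas_flag = minor_allele_flag(statements, "A1", "EAS")  # no EAS statement in the dataset
```

In this sketch the gap raised above shows up as a `None` result: without a statement for the subpopulation, the consumer cannot tell whether the allele is minor (or a founder) there.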

AmandaSpurdle commented 5 years ago

Questions:

  1. Is indicating what allele is the minor allele and/or the founder allele in a given population in scope for our initial round of VA work?

The first comment is that FAF is not founder allele frequency! It is FILTER allele frequency. This is a relatively new concept introduced to capture the variability around frequency estimates derived from sample sets of different sizes, when they are used to assess, for clinical variant interpretation, whether an allele frequency is higher than expected for the disorder (ACMG criteria BA1 and BS1). This information is now pre-calculated for gnomAD alleles, but could of course be calculated for any dataset. The FAF gives the lower bound of the 95% CI of an allele frequency, which is then separately reviewed against what the user defines as the maximum credible allele frequency to be used for BA1 (or BS1).

I think it is sufficient to define the alleles, and then the frequency of at least one of them, generally the allele that is NOT on the designated reference transcript. Using the terms minor and major becomes an issue when dealing with common variants for which the “minor” allele might swap from one population to another. So reference and alternate allele is probably best, and then state the alternate allele frequency. How are you going to deal with cases where there is more than one allelic change at a position? In terms of FILTER allele frequency, it really depends on how the data is to be used; see below.

  2. If yes, should we extend the PopFreq annotation model to include this info, and how?

There are 2 possibilities I see here: annotating each variant identified with its MAF or FAF to aid in interpretation, OR using the dataset to define a FAF for that dataset, to add to what might be present in gnomAD to aid in variant prioritisation/interpretation. In the latter case the computation would need to be done for that dataset as per the gnomAD example.
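
To make the "lower bound of the 95% CI" idea concrete, here is a minimal sketch of one way such a bound could be computed for an arbitrary dataset. It assumes a Poisson model for the allele count and a one-sided interval; both are assumptions of this sketch, and it is not necessarily how gnomAD derives its published FAF values. The function names, the example counts, and the threshold value are hypothetical.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def lower_bound_af(ac: int, an: int, confidence: float = 0.95) -> float:
    """One-sided lower confidence bound on allele frequency.

    ac is the observed allele count, an the number of alleles sampled.
    Finds the expected count lam at which seeing >= ac copies has
    probability 1 - confidence, then divides by an. Intended for modest
    counts only (math.exp(-lam) underflows for very large lam).
    """
    if ac <= 0:
        return 0.0
    lo, hi = 0.0, float(ac)  # the bound lies below the observed count
    for _ in range(100):     # bisection on lam
        mid = (lo + hi) / 2
        if 1.0 - poisson_cdf(ac - 1, mid) < 1.0 - confidence:
            lo = mid         # upper tail too small: lam must be larger
        else:
            hi = mid
    return lo / an


# Illustrative counts and threshold only; not real data.
max_credible_af = 0.001
exceeds = lower_bound_af(ac=30, an=10000) > max_credible_af
```

The comparison on the last line mirrors the review step described above: the bound is computed once per dataset, while the maximum credible allele frequency remains a user-defined, disorder-specific input.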

I see some comments from Dmitriy, and I am not really following his argument for the utility of founder alleles, so I think I use the term differently. For me, the definition of a founder allele theoretically requires checking whether the ancestral haplotype is the same for all carriers of a given variant. And one might ask how meaningful that is: possibly only if we believe that there are alleles in cis that modify risk in one population vs another? Practically speaking, knowing that a specific variant is common in one population vs another used to be interesting for understanding population bottlenecks, and practically useful in terms of defining “cheap” screening strategies for individuals from a given ethnic group. But with modern technology we are way past considering screening for “founder” variants now, so although occasionally you might see a use for the information to provide an understanding of genotype-phenotype relationships, for the most part… But happy to stand corrected if someone/Dmitriy can explain more.

AmandaSpurdle commented 5 years ago

Response to MAF/FAF questions from Matt Brush

dsonkin commented 5 years ago

I was commenting on founder alleles from the perspective of genotype-phenotype relationships.

AmandaSpurdle commented 5 years ago

Sorry to be bovine, Dmitriy, but I am still not getting this. Are you talking about an allele being common enough that it provides a cleaner way to assess geno-pheno correlations?

dsonkin commented 5 years ago

A classic example would be BRCA1/2 founder mutations.