bihealth / auto-acmg

Automatic classification of sequence variants and CNVs according to ACMG criteria.
GNU General Public License v3.0

Finish `AutoBA1BS1BS2PS4PM2` #122

Closed gromdimon closed 2 months ago

gromdimon commented 3 months ago

Is your feature request related to a problem? Please describe. We've implemented #71. Now we need to finish it.

Additional context Here is some info for these criteria

PS4 (prevalence)

No automation has been implemented.

Original Definition

The prevalence of the variant in affected individuals is significantly increased compared to the prevalence in controls

Note 1: Relative risk (RR) or odds ratio (OR), as obtained from case-control studies, is >5.0 and the confidence interval around the estimate of RR or OR does not include 1.0. See manuscript for detailed guidance.

Note 2: In instances of very rare variants where case-control studies may not reach statistical significance, the prior observation of the variant in multiple unrelated patients with the same phenotype, and its absence in controls, may be used as moderate level of evidence.

-- Richards et al. (2015); Table 4
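Note 1 could be checked as in the following minimal sketch, assuming the case-control data are available as a 2x2 table of allele (or carrier) counts. The function names, the 95% Wald interval on the log scale, and the Haldane-Anscombe zero-cell correction are illustrative assumptions, not something this issue or auto-acmg specifies.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Return (odds ratio, CI lower, CI upper) for a 2x2 case-control table.

    Uses a Wald interval on the log odds ratio; z=1.96 gives roughly 95%.
    """
    cells = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    # Assumption: apply the Haldane-Anscombe correction (add 0.5 to every
    # cell) when any cell is zero, so the ratio and its error stay finite.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper


def ps4_met(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """PS4 Note 1: OR > 5.0 and the confidence interval does not include 1.0."""
    odds_ratio, lower, upper = odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref)
    ci_includes_one = lower <= 1.0 <= upper
    return odds_ratio > 5.0 and not ci_includes_one


# 20/100 carriers among cases vs. 2/100 among controls
print(ps4_met(20, 80, 2, 98))
# OR above 5 but the interval still spans 1.0 because the counts are small
print(ps4_met(6, 94, 1, 99))
```

The second call illustrates why the confidence interval matters: a large point estimate from very few observations does not satisfy the note.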

PM2

PM2_Supporting (absent from controls)

Original Definition

Absent from controls (or at extremely low frequency if recessive) in Exome Sequencing Project, 1000 Genomes or ExAC.

-- Richards et al. (2015); Table 4
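A minimal sketch of this definition, assuming per-database allele frequencies are passed as a mapping in which `None` means the variant is not listed. The function name and the database keys are hypothetical; the 0.5% recessive cutoff is borrowed from the InterVar description quoted in the Notes of this issue, not from Richards et al.

```python
from typing import Mapping, Optional


def pm2_met(
    freqs: Mapping[str, Optional[float]],
    recessive: bool,
    recessive_cutoff: float = 0.005,  # assumption: InterVar's AAF < 0.5%
) -> bool:
    """PM2: absent from controls, or extremely rare if the disorder is recessive."""
    # Databases that do not list the variant (None) count as "absent".
    observed = [af for af in freqs.values() if af is not None]
    if recessive:
        return all(af < recessive_cutoff for af in observed)
    return all(af == 0.0 for af in observed)


absent = {"esp6500": None, "1000genomes": None, "exac": None}
rare = {"esp6500": None, "1000genomes": None, "exac": 0.0001}
print(pm2_met(absent, recessive=False))
print(pm2_met(rare, recessive=False))
print(pm2_met(rare, recessive=True))
```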

Preconditions / Precomputations

Implemented Criterion

User Report

Literature

Caveats

BA1

BA1 (5% frequency)

Original Definition

Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium

-- Richards et al. (2015); Table 4
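As a rough illustration of the definition above, BA1 fires as soon as any one database reports a frequency above 5%. The function name and database keys below are hypothetical; `None` is assumed to mean that the database has no record for the variant.

```python
from typing import Mapping, Optional


def ba1_met(freqs: Mapping[str, Optional[float]], cutoff: float = 0.05) -> bool:
    """BA1: allele frequency above the cutoff in at least one database."""
    return any(af is not None and af > cutoff for af in freqs.values())


print(ba1_met({"esp6500": 0.02, "1000genomes": 0.07, "exac": None}))
print(ba1_met({"esp6500": 0.02, "1000genomes": 0.05, "exac": None}))
```

The comparison is strict, so a frequency of exactly 5% does not trigger the criterion under this reading of ">5%".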

Preconditions / Precomputations

Implemented Criterion

User Report

Literature

Caveats

BS1

BS1 (expected frequency)

Original Definition

Allele frequency greater than expected for disorder.

-- Richards et al. (2015); Table 4
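"Greater than expected" needs a concrete threshold before it can be automated. The sketch below assumes a global default with optional per-gene overrides; the 1% default is the value InterVar reports using (see the Notes in this issue), while the function name, the `gene_cutoffs` mapping, and the gene labels are purely hypothetical.

```python
from typing import Mapping, Optional

# Assumption: InterVar's default cutoff for rare disease, as quoted in this issue.
DEFAULT_BS1_CUTOFF = 0.01


def bs1_met(
    allele_frequency: Optional[float],
    gene: str,
    gene_cutoffs: Optional[Mapping[str, float]] = None,
) -> bool:
    """BS1: allele frequency greater than expected for the disorder."""
    if allele_frequency is None:  # no frequency data, nothing to compare
        return False
    cutoff = (gene_cutoffs or {}).get(gene, DEFAULT_BS1_CUTOFF)
    return allele_frequency > cutoff


# Hypothetical override: a stricter cutoff for one gene.
overrides = {"GENE_A": 0.001}
print(bs1_met(0.002, "GENE_A"))
print(bs1_met(0.002, "GENE_A", overrides))
```

Keeping the cutoff configurable mirrors the InterVar approach of letting users supply their own value.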

Preconditions / Precomputations

Implemented Criterion

User Report

Literature

BS2

BS2 (healthy adult)

Original Definition

Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age.

-- Richards et al. (2015); Table 4
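The definition ties the relevant genotype to the mode of inheritance, which can be sketched as below. All names here (`Inheritance`, `ControlCounts`, `bs2_met`, `min_carriers`) are hypothetical, and the sketch assumes genotype counts come from a cohort that can reasonably be treated as healthy adults, which is not guaranteed for every population database.

```python
from dataclasses import dataclass
from enum import Enum


class Inheritance(Enum):
    RECESSIVE = "recessive"
    DOMINANT = "dominant"
    X_LINKED = "x_linked"


@dataclass(frozen=True)
class ControlCounts:
    """Number of control individuals carrying the variant, by zygosity."""

    heterozygous: int = 0
    homozygous: int = 0
    hemizygous: int = 0


def bs2_met(counts: ControlCounts, inheritance: Inheritance, min_carriers: int = 1) -> bool:
    """BS2: variant seen in controls in the zygosity that should cause disease."""
    if inheritance is Inheritance.RECESSIVE:
        carriers = counts.homozygous
    elif inheritance is Inheritance.DOMINANT:
        carriers = counts.heterozygous
    else:  # X-linked
        carriers = counts.hemizygous
    return carriers >= min_carriers


counts = ControlCounts(heterozygous=12, homozygous=0)
print(bs2_met(counts, Inheritance.DOMINANT))
print(bs2_met(counts, Inheritance.RECESSIVE))
```

`min_carriers` defaults to a single individual, matching the wording "a healthy adult individual"; a stricter implementation might require more.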

Preconditions / Precomputations

Implemented Criterion

User Report

Literature

Caveats

Notes

InterVar

BA1, BS1, BS2, PS4, and PM2 by Automated Scoring

The AAFs in control populations are useful for scoring the pathogenicity of variants, given that frequently occurring variants in the population are unlikely to cause rare diseases. We retrieved information on disease prevalence from OrphaNet and translated OrphaNet identifiers into OMIM identifiers. Here, we used three datasets to assess the variant frequency: the NHLBI Exome Sequencing Project (ESP6500), 1000 Genomes Project, and ExAC Browser.

If any of the AAFs in any database is >5%, BA1 will be assigned as 1. If the AAF in the ExAC Browser is greater than expected for the disorder caused by mutations in the corresponding gene, BS1 will be assigned as 1 (here, we set a default cutoff as 1% for rare disease, but users can specify their own cutoff in the configuration file of InterVar).

If a variant is observed in a healthy adult in the 1000 Genomes Project as a homozygote (for diseases defined as recessive in OMIM) or as a heterozygote otherwise, then BS2 will be applied. We manually removed known major adult-onset disorders from consideration. We did not use the ExAC Browser or ESP6500 here because these datasets can contain variants from individuals with various diseases.

Variants that are absent or are present at extremely low frequencies in a large control cohort could represent moderate evidence for pathogenicity. If a variant that is responsible for dominant diseases is absent in all control subjects from ESP6500, 1000 Genomes Project, and the ExAC Browser, PM2 will be applied. If the variant causes recessive diseases and has a very low frequency with AAF < 0.5%, then PM2 can also be applied. Information on the gene-disease relationship, such as dominance or recessiveness, is obtained from OMIM.

In some cases, pathogenic variants have a significantly higher frequency in affected subjects than in control subjects. To handle these variants, we also cataloged all variants with an odds ratio (OR) > 5.0 from GWASdb version 2. For these variants, PS4 will be applied. For some rare variants where case-control studies might not reach statistical significance, PS4 also can be downgraded to a moderate level during the manual adjustment step.

holtgrewe commented 2 months ago

The zygosity of controls is available in gnomAD as `nhomalt` (the number of individuals homozygous for the alternate allele). Example:

https://reev.cubi.bihealth.org/internal/proxy/annonars/annos/variant?genome_release=grch37&chromosome=1&pos=58939502&reference=C&alternative=A