broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

SV pathogenicity predictors #2730

Open jxchong opened 2 years ago

jxchong commented 2 years ago

Here are 4 SV pathogenicity predictors that we decided would be useful:

  1. AnnotSV - https://lbgi.fr/AnnotSV/ - Annotates SVs with their presence in DGV, DD, etc. Its pathogenicity ranking is useful for making sure you didn't miss an obvious pathogenic SV: it ranks SVs by whether they overlap pathogenic SVs, haploinsufficient genes, or known OMIM genes, or whether they overlap significantly with a known benign SV.
  2. CADD-SV - https://cadd-sv.bihealth.org/ - The model incorporates many annotations, including SNV scores as well as constraint, conservation, epigenetic/regulatory information, and gene overlap, to score SVs. It trains on chimp and human SVs, unlike the other predictors.
  3. TADA - https://github.com/jakob-he/TADA/ - Focuses on incorporating TAD-centric annotations to identify SVs that might be pathogenic due to effects on genome structure. However, there's little relevant training data, so its score is mostly driven by coding annotations.
  4. StrVCTVRE (you have already implemented this one)

When we ran our own analysis on recently reported SVs, there was effectively zero overlap between the pathogenicity predictions of these tools. Given how little is really known about pathogenic SVs, we think it's worth implementing all of these predictors and looking at SVs that are flagged by at least one tool (or at least two tools, etc.).
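
To make the "flagged by at least N tools" idea concrete, here is a rough sketch in Python. It assumes each predictor's output has already been reduced to a set of SV IDs it calls pathogenic; the per-tool thresholds needed to get there are not specified here. The function name `flagged_by_at_least`, the `calls_by_tool` mapping, and the SV IDs are all hypothetical and are not part of seqr or of any of these tools.

```python
from collections import Counter


def flagged_by_at_least(calls_by_tool, min_tools=1):
    """Return the SV IDs flagged as pathogenic by at least `min_tools` predictors.

    `calls_by_tool` maps a predictor name to the collection of SV IDs that
    predictor flags (hypothetical structure, for illustration only).
    """
    counts = Counter()
    for sv_ids in calls_by_tool.values():
        # set() ensures each tool contributes at most one vote per SV
        counts.update(set(sv_ids))
    return {sv_id for sv_id, n_tools in counts.items() if n_tools >= min_tools}


# Made-up SV IDs, purely illustrative; not results from any real analysis.
calls_by_tool = {
    "AnnotSV": {"sv1", "sv2"},
    "CADD-SV": {"sv3"},
    "TADA": {"sv2", "sv4"},
    "StrVCTVRE": {"sv5"},
}

any_tool = flagged_by_at_least(calls_by_tool, min_tools=1)
two_or_more = flagged_by_at_least(calls_by_tool, min_tools=2)
```

Keeping `min_tools` as a parameter would let a user loosen or tighten the filter, which seems useful given how little the tools agree.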

jxchong commented 2 years ago

FYI, @wharvey31 has gotten all of these running for us outside of seqr, but I'm not sure how much overlap there will be with what you need.