broadinstitute / SpliceAI-lookup

Website for checking SpliceAI and Pangolin scores:
https://spliceailookup.broadinstitute.org
MIT License

warn about and/or provide more info for insertion variants that may be affected by issue #67 #69

Closed bw2 closed 3 months ago

bw2 commented 6 months ago

Issue #67 highlighted why SpliceAI and Pangolin scores should be treated with caution for any insertion variant where the inserted bases are at least partially the same as adjacent reference sequence - such as the chr2:47790924-C-CAGTTG insertion illustrated below:

GCAAC|-----|AGTTGTG  (REF)
GCAAC|AGTTG|AGTTGTG  (ALT)
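
One way to detect this kind of insertion is to count how many inserted bases repeat the reference sequence on either side of the insertion point. The following is a minimal sketch, not the SpliceAI-lookup implementation: the function names and the `upstream`/`downstream` arguments are hypothetical, and it assumes a left-anchored VCF-style insertion (a single REF base followed by the inserted bases in ALT).

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def adjacent_match_lengths(ref: str, alt: str, upstream: str, downstream: str):
    """Return (upstream_match, downstream_match) for an insertion.

    upstream   -- reference bases ending at the anchor base (hypothetical input)
    downstream -- reference bases starting right after the anchor base
    """
    if not (len(ref) == 1 and len(alt) > 1 and alt[0] == ref):
        raise ValueError("expected a left-anchored insertion such as C > CAGTTG")
    inserted = alt[1:]
    # Leading inserted bases that repeat the sequence after the insertion point.
    downstream_match = common_prefix_len(inserted, downstream)
    # Trailing inserted bases that repeat the sequence before the insertion point.
    upstream_match = common_prefix_len(inserted[::-1], upstream[::-1])
    return upstream_match, downstream_match


# chr2:47790924-C-CAGTTG with the flanking sequence from the illustration above
print(adjacent_match_lengths("C", "CAGTTG", "GCAAC", "AGTTGTG"))  # (0, 5)
```

Here all 5 inserted bases match the downstream reference sequence, which is the situation described above.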

Even though SpliceAI and Pangolin delta scores imply that this insertion creates a new splice acceptor 3bp upstream of the annotated exon junction, this is just a technical artifact of converting the model's predictions into delta scores. The underlying models in fact predict no difference in splicing between the REF and ALT haplotypes, so the variant just inserts 5bp of additional sequence into the exon.
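
To make the artifact concrete, here is a rough sketch of how it can arise. It rests on an assumption that is not stated in this thread: that delta scores are computed per reference position, and that the ALT scores for the anchor base plus the inserted bases are merged into one value (their maximum) at the anchor position. The function name and the single score track with illustrative values are hypothetical.

```python
def collapse_insertion(alt_scores, anchor_index, n_inserted):
    """Map ALT-haplotype scores back onto reference coordinates.

    ASSUMPTION: the anchor base and the inserted bases are merged into a
    single value (their maximum) so there is one score per reference position.
    """
    start, end = anchor_index, anchor_index + n_inserted + 1
    merged = max(alt_scores[start:end])
    return alt_scores[:start] + [merged] + alt_scores[end:]


# One score track over chr2:47790920-47790929 (illustrative values).
ref_scores = [0.0] * 7 + [0.98] + [0.0] * 2   # site at chr2:47790927
# ALT haplotype: 5 reference bases, 5 inserted bases, 5 reference bases.
alt_scores = [0.0] * 7 + [0.98] + [0.0] * 7   # site at inserted base +3

collapsed = collapse_insertion(alt_scores, anchor_index=4, n_inserted=5)
delta = [round(a - r, 2) for a, r in zip(collapsed, ref_scores)]
print(delta)
# [0.0, 0.0, 0.0, 0.0, 0.98, 0.0, 0.0, -0.98, 0.0, 0.0]
```

Under that assumption the site has not moved relative to the upstream sequence, yet the delta track reports a gain at the anchor position (chr2:47790924) and a loss 3bp downstream (chr2:47790927).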

To mitigate this issue, we should 1) show a warning in the results table for insertions that have a high delta score and where the inserted bases match adjacent reference sequence, and 2) allow users to see the base-level predictions for inserted bases. The IGV.js visualization currently only shows predicted REF and ALT scores for positions in the reference genome, and it's not possible to modify it to also show scores for individual inserted bases. Instead, we should show a new table with one row per position like below, or create a new static visualization.

POS              |  REF  |  ALT  |  REF ACC  |  REF DONOR  |  ALT ACC  |  ALT DONOR  |
chr2:47790920    |   G   |   G   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790921    |   C   |   C   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790922    |   A   |   A   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790923    |   A   |   A   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790924    |   C   |   C   |    0.0    |     0.0     |    0.0    |     0.0     |
              +1 |   -   |   A   |    0.0    |     0.0     |    0.0    |     0.0     |
              +2 |   -   |   G   |    0.0    |     0.0     |    0.0    |     0.0     |
              +3 |   -   |   T   |    0.0    |     0.0     |    0.98   |     0.0     |
              +4 |   -   |   T   |    0.0    |     0.0     |    0.0    |     0.0     |
              +5 |   -   |   G   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790925    |   A   |   A   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790926    |   G   |   G   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790927    |   T   |   T   |    0.0    |     0.98    |    0.0    |     0.0     |
chr2:47790928    |   T   |   T   |    0.0    |     0.0     |    0.0    |     0.0     |
chr2:47790929    |   G   |   G   |    0.0    |     0.0     |    0.0    |     0.0     |
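
A table like the one above could be assembled by walking the reference window and splicing in one extra row per inserted base right after the anchor position. This is only a sketch under assumptions: the function name, argument names, and tuple layout are hypothetical, a single score track stands in for the four score columns, and inserted rows get `None` for the REF score because they have no reference coordinate.

```python
def per_position_rows(chrom, window_start, ref_window, anchor_pos, inserted,
                      ref_scores, alt_scores):
    """Return (label, ref_base, alt_base, ref_score, alt_score) tuples.

    ref_scores has one value per base of ref_window; alt_scores has one value
    per base of the ALT haplotype (len(ref_window) + len(inserted)).
    """
    rows = []
    alt_i = 0  # index into the ALT haplotype scores
    for offset, base in enumerate(ref_window):
        pos = window_start + offset
        rows.append((f"{chrom}:{pos}", base, base,
                     ref_scores[offset], alt_scores[alt_i]))
        alt_i += 1
        if pos == anchor_pos:
            # Inserted bases have no reference coordinate: label them +1, +2, ...
            for k, ins_base in enumerate(inserted, start=1):
                rows.append((f"+{k}", "-", ins_base, None, alt_scores[alt_i]))
                alt_i += 1
    return rows


rows = per_position_rows(
    "chr2", 47790920, "GCAACAGTTG", 47790924, "AGTTG",
    ref_scores=[0.0] * 7 + [0.98] + [0.0] * 2,
    alt_scores=[0.0] * 7 + [0.98] + [0.0] * 7,
)
for row in rows:
    print(row)
```
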
bw2 commented 6 months ago

There's now an automatic warning that appears for insertions that meet both of these criteria: 1) the variant is an insertion of at least 2bp, and 2) it has a delta score ≥ 0.2 at a relative position of 0bp.

[Screenshot of the warning, from NM_000179.3:c.261-3_261-2insAGTTG]
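
For reference, the two criteria for the warning can be sketched as a small check. This is not the actual SpliceAI-lookup code: the function name and the `(score, relative_position)` input shape are hypothetical, and only the thresholds (at least 2bp inserted, delta score ≥ 0.2, relative position 0bp) come from the comment above.

```python
def needs_insertion_warning(ref, alt, delta_scores,
                            min_inserted=2, min_delta=0.2):
    """Return True if an insertion meets both warning criteria.

    delta_scores -- iterable of (score, relative_position) pairs, one per
                    delta score reported for the variant (hypothetical shape)
    """
    # Criterion 1: the variant is an insertion of at least min_inserted bases.
    is_insertion = len(alt) > len(ref) and alt.startswith(ref)
    if not is_insertion or len(alt) - len(ref) < min_inserted:
        return False
    # Criterion 2: some delta score is >= min_delta at relative position 0bp.
    return any(score >= min_delta and rel_pos == 0
               for score, rel_pos in delta_scores)


print(needs_insertion_warning("C", "CAGTTG", [(0.98, 0), (0.98, 3)]))  # True
print(needs_insertion_warning("C", "CA", [(0.98, 0)]))                 # False
```
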

At some point, we will still want to add the visualization described in the previous comment.