Closed by bw2 3 months ago
There's now an automatic warning that appears for variants that meet both of these criteria: 1) the variant is an insertion of at least 2bp, and 2) it has a delta score ≥ 0.2 at a relative position of 0bp.
(screenshot from NM_000179.3:c.261-3_261-2insAGTTG)
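For reference, the two warning criteria can be sketched roughly as below. This is only an illustration, not the actual implementation: the `DeltaScore` class, the `needs_insertion_warning` function, and the constant names are hypothetical, and the sketch assumes left-anchored VCF-style alleles (such as `C` > `CAGTTG`) where the ALT allele starts with the REF allele.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria described above.
MIN_INSERTION_LENGTH_BP = 2
MIN_DELTA_SCORE = 0.2


@dataclass
class DeltaScore:
    """Hypothetical container for one delta score reported for a variant."""
    score: float
    relative_position_bp: int  # position of the score relative to the variant


def needs_insertion_warning(ref: str, alt: str, delta_scores: list[DeltaScore]) -> bool:
    """Return True if the variant meets both warning criteria."""
    # Criterion 1: a pure insertion of at least 2bp
    # (assumes the ALT allele starts with the REF allele).
    if not alt.startswith(ref):
        return False
    if len(alt) - len(ref) < MIN_INSERTION_LENGTH_BP:
        return False

    # Criterion 2: some delta score >= 0.2 at a relative position of 0bp.
    return any(
        d.score >= MIN_DELTA_SCORE and d.relative_position_bp == 0
        for d in delta_scores
    )
```

For example, with made-up scores, `needs_insertion_warning("C", "CAGTTG", [DeltaScore(0.5, 0)])` would trigger the warning, while the same insertion with its only high score at a relative position of -3bp would not.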
At some point, we will still want to add the visualization described in the previous comment.
Issue #67 highlighted why SpliceAI and Pangolin scores should be treated with caution for any insertion variant where the inserted bases at least partially match the adjacent reference sequence, such as the chr2:47790924-C-CAGTTG insertion illustrated below:
Even though the SpliceAI and Pangolin delta scores imply that this insertion creates a new splice acceptor 3bp upstream of the annotated exon junction, this is just a technical artifact of converting the models' predictions into delta scores. The underlying models in fact predict no difference in splicing between the REF and ALT haplotypes, so the variant just inserts 5bp of additional sequence into the exon.
To mitigate this issue, we should 1) show a warning in the results table for insertions that have a high delta score and where the inserted bases match adjacent reference sequence, and 2) allow users to see the base-level predictions for inserted bases. The IGV.js visualization currently only shows predicted REF and ALT scores for positions in the reference genome, and it's not possible to modify it to also show scores for individual inserted bases. Instead, we could show a new table with one row per position, like the one below, or create a new static visualization.
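One possible way to detect "inserted bases match adjacent reference sequence" for the warning is sketched below. It is a rough illustration under stated assumptions rather than a proposed final implementation: the function names are hypothetical, the caller is assumed to have already fetched the reference bases flanking the insertion point, and the sequences in the usage note are made up rather than taken from the real locus.

```python
def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def adjacent_match_lengths(inserted: str, upstream_ref: str, downstream_ref: str) -> tuple[int, int]:
    """Count how many inserted bases repeat the flanking reference sequence.

    upstream_ref:   reference bases ending immediately before the insertion point
    downstream_ref: reference bases starting immediately after the insertion point

    Returns (upstream_matches, downstream_matches), where:
      - upstream_matches is the number of trailing inserted bases that equal
        the reference bases just before the insertion point
      - downstream_matches is the number of leading inserted bases that equal
        the reference bases just after the insertion point
    """
    downstream_matches = shared_prefix_length(inserted, downstream_ref)
    # Compare suffixes by reversing both strings and comparing prefixes.
    upstream_matches = shared_prefix_length(inserted[::-1], upstream_ref[::-1])
    return upstream_matches, downstream_matches


def inserted_bases_match_adjacent_reference(
    inserted: str, upstream_ref: str, downstream_ref: str, min_matching_bp: int = 1
) -> bool:
    """True if at least min_matching_bp inserted bases repeat either flank."""
    return max(adjacent_match_lengths(inserted, upstream_ref, downstream_ref)) >= min_matching_bp
```

With made-up flanking sequence, `adjacent_match_lengths("AGTTG", "TTTC", "AGTTGCA")` gives `(0, 5)`, meaning all 5 inserted bases duplicate the reference bases immediately downstream. The `min_matching_bp` threshold is an assumption; a partial match may already be enough to warrant the warning.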