broadinstitute / SpliceAI-lookup

Website for checking SpliceAI and Pangolin scores:
https://spliceailookup.broadinstitute.org
MIT License

Interesting case that produces a disagreement between SpliceAI and SpliceAI lookup #42

Closed Manuel-DominguezCBG closed 1 year ago

Manuel-DominguezCBG commented 1 year ago

The variant NM_001042492.3(NF1):c.7189+1dup gives different results in the two tools because the variant description is interpreted differently.

Description of the case first:

In my local pipeline (SpliceAI), the variant is reported as:

17 | 29670152 | A | AG | AG | NF1--NM_001042492 | 0 | 0 | 0 | 0.13

This is correct

[screenshot: Untitled2]

If I search for this variant (NM_001042492.3(NF1):c.7189+1dup) in SpliceAI lookup, the duplication is placed two positions downstream, 17-29670154-G-GG, and the result is 0.00, 1, 0.00, 0.86.

I believe the reason for the difference in DS becomes clear when we look at the impact of the extra G on the WT sequence at position 29670152 versus position 29670154.

[screenshot (2)]

I have used VariantValidator (VV) to check which tool is misinterpreting the variant: SpliceAI lookup, which takes the variant description, or SpliceAI, which takes the VCF description (its input is variants in a VCF).

So if I put NM_001042492.3(NF1):c.7189+1dup into VV, it is interpreted as follows:

[screenshot (3)]

The SpliceAI in my pipeline takes a custom gene annotation file, and I have checked that for NF1 the start and end of that exon (number 47 in that transcript) are correct.

Is there a possible explanation for this disagreement?

I don't know whether this issue is related to SpliceAI lookup or to SpliceAI itself, but it is important for me to understand the cause. I work in a clinical genomics lab where part of my team uses SpliceAI and another part uses SpliceAI lookup, and I would like both tools to give the same results. I did a validation involving 100+ variants and did not see any disagreement, and I have seen someone in one of your issues who did the same analysis with 1000+ variants without significant differences.

I hope to find an answer or at least to show this interesting case.

Thanks

Manuel

The link of the variant searched in SpliceAI lookup is: https://spliceailookup.broadinstitute.org/#variant=NM_001042492.3(NF1)%3Ac.7189%2B1dup&hg=37&distance=50&mask=1

Manuel-DominguezCBG commented 1 year ago

I have investigated this a bit more and found the cause of the problem. Essentially, we have an additional G inserted into a run of two Gs:

WT AAAAGGTAAA

If the variant is described using the :c. nomenclature (NM_001042492.3(NF1):c.7189+1dup), the G is added at the end of the GG:

54 AAAAGGGTAAA

However, if the variant is described using the genomic nomenclature (:g.), the G is added at the beginning of the GG:

52 AAAAGGGTAAA

This may sound irrelevant, but it is not irrelevant for SpliceAI. The first case returns a score of 100% for affecting the splice-site sequence; the second returns 13%.
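To make the two placements concrete, here is a minimal Python sketch (not code from SpliceAI or SpliceAI lookup). It shows that both records produce the same mutated sequence and that one is simply the other shifted through the GG run. The helper names are hypothetical. The start coordinate 29670149 for the 10-base window is my assumption, inferred from the two records above, and I also assume the transcript is on the forward strand, so that "3'" means a higher genomic coordinate.

```python
# Sketch only: helper names are hypothetical, not part of any SpliceAI API.
WT = "AAAAGGTAAA"
START = 29670149  # ASSUMPTION: genomic coordinate of the first base of WT


def apply_insertion(ref_seq, start, pos, ref, alt):
    """Apply a VCF-style insertion (REF = anchor base, ALT = anchor + inserted bases)."""
    i = pos - start  # 0-based index of the anchor base
    if ref_seq[i:i + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    return ref_seq[:i] + alt + ref_seq[i + len(ref):]


def left_align(ref_seq, start, pos, ref, alt):
    """Shift a single-anchor insertion to its leftmost equivalent position."""
    i, ins = pos - start, alt[len(ref):]
    # Moving left is allowed while the anchor base equals the last inserted base.
    while i > 0 and ref_seq[i] == ins[-1]:
        ins = ref_seq[i] + ins[:-1]
        i -= 1
    return start + i, ref_seq[i], ref_seq[i] + ins


def right_align(ref_seq, start, pos, ref, alt):
    """Shift a single-anchor insertion to its rightmost (3'-most) equivalent position."""
    i, ins = pos - start, alt[len(ref):]
    # Moving right is allowed while the next reference base equals the first inserted base.
    while i + 1 < len(ref_seq) and ref_seq[i + 1] == ins[0]:
        ins = ins[1:] + ref_seq[i + 1]
        i += 1
    return start + i, ref_seq[i], ref_seq[i] + ins


# Both records give the same mutated sequence.
print(apply_insertion(WT, START, 29670152, "A", "AG"))  # AAAAGGGTAAA
print(apply_insertion(WT, START, 29670154, "G", "GG"))  # AAAAGGGTAAA

# Each record is the other one shifted through the GG run.
print(right_align(WT, START, 29670152, "A", "AG"))  # (29670154, 'G', 'GG')
print(left_align(WT, START, 29670154, "G", "GG"))   # (29670152, 'A', 'AG')
```

So one way to get the same input into both tools would be to bring every variant to a single convention (for example, always left-aligned) before scoring. This only addresses the representation; it does not say which of the two scores is the right one.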

We noticed this because we use both tools (SpliceAI and SpliceAI lookup) and my team reported the difference.

I hope this makes sense now. It is not a SpliceAI or SpliceAI lookup issue, but I thought it was worth sharing with you.