broadinstitute / SpliceAI-lookup

Website for checking SpliceAI and Pangolin scores:
https://spliceailookup.broadinstitute.org
MIT License

How is the Delta score computed from the Alt and Ref scores? #58

Closed Manuel-DominguezCBG closed 4 months ago

Manuel-DominguezCBG commented 11 months ago

Hi,

My team is wondering how the Delta score of the Acceptor Loss is computed from the difference between the Ref score and the Alt score. Let's see the following examples.

For this variant, chr8-140300616-T-G (image), it seems that the maths is Ref score - Alt score.

However, for this variant, NM_001089.3(ABCA3):c.875A>T (p.Glu292Val) (image), the same difference should be 0.08 but we got 0.00 instead.

More specifically, to give you one last example, my team is working with this variant: NM_006767:c.1354A>G (image). Why is the Delta score 0? It should be 0.5, shouldn't it?

Could you explain this?

bw2 commented 11 months ago

Hi @Manuel-DominguezCBG If you uncheck the "masked scores" checkbox and rerun your search, you should see the results you expect. The help message for that checkbox explains the behavior you are seeing.

Manuel-DominguezCBG commented 11 months ago

Thanks @bw2 for your help, very useful as always :)

We do variant interpretation, so we prefer masked scores (we also validated the tool with masked scores on). I was not aware of how masking affects the formula REF score - Alt Score. From what I can see, when masking is on, the computation (REF score - Alt Score) is not always strictly applied and some modifications are made, as seen in the variant NM_006767:c.1354A>G. Is this statement correct? If so, could you explain what masking actually does? I can understand the info provided by the question mark icon: This parameter masks the delta score: It will remove or set to zero all losses of non-splice sites and all gains of splice sites. But where is this info coming from? I mean, by what criteria are these modifications applied?

For us, it is very important to understand (as much as possible) the tools we use. Sorry if this is not a SpliceAI-lookup question.

By the way, our local version is working very well since installed. Thank you so much!

bw2 commented 11 months ago

The way I understand it, for donor gain or acceptor gain predictions, when the predicted position (as shown in the "position" column) corresponds to the genomic location of a known donor or acceptor site (i.e. the 2 nucleotides of a splice junction as defined by Gencode for the transcript on which the prediction is based), masking will cause 0 to appear in the delta score column (instead of the REF score - Alt Score value). Conversely, for donor loss or acceptor loss predictions, the delta score will be masked only if the base pair position doesn't match a known splice junction donor or acceptor site.
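The rule described above can be sketched in a few lines of Python. This is only an illustration of the behavior as explained in this comment, not the code used by SpliceAI or SpliceAI-lookup. The function names, the event labels, and the `at_annotated_site` flag are hypothetical, and the sign convention for gains (Alt score - Ref score) is an assumption not stated in this thread.

```python
def raw_delta(event: str, ref_score: float, alt_score: float) -> float:
    """Unmasked delta for a single position (hypothetical helper)."""
    if event.endswith("_loss"):
        # Loss: the site is weaker in the ALT sequence.
        return ref_score - alt_score
    if event.endswith("_gain"):
        # Assumption: gains are measured in the opposite direction.
        return alt_score - ref_score
    raise ValueError(f"unknown event type: {event}")


def masked_delta(event: str, ref_score: float, alt_score: float,
                 at_annotated_site: bool, mask: bool = True) -> float:
    """Apply the masking rule as described in the comment above.

    at_annotated_site: True if the predicted position matches a known
    donor/acceptor site of the transcript (hypothetical input; how this
    is determined is outside the scope of this sketch).
    """
    delta = raw_delta(event, ref_score, alt_score)
    if not mask:
        return delta
    is_gain = event.endswith("_gain")
    if is_gain and at_annotated_site:
        return 0.0  # gain at an already-annotated site is masked
    if not is_gain and not at_annotated_site:
        return 0.0  # loss of an unannotated site is masked
    return delta


# Made-up scores, chosen only to show the two modes side by side.
print(masked_delta("acceptor_loss", 0.60, 0.10, at_annotated_site=False))
print(masked_delta("acceptor_loss", 0.60, 0.10, at_annotated_site=False, mask=False))
```

With masking on, the first call reports 0 for the loss of an unannotated acceptor, while the second call (masking off) reports the plain difference between the two scores.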

In your example @ https://spliceailookup.broadinstitute.org/#variant=NM_006767%3Ac.1354A%3EG&hg=38&distance=500&mask=1&ra=1

only the acceptor loss score is masked (i.e. the delta score column shows 0 even though REF score - Alt Score is not 0)

image

If we look at the visualization, it always shows (REF score - Alt Score), regardless of whether masking is on or off. Here we can see the acceptor loss score as the yellow bar:

image

Since it's a few base pairs to the right of the known splice junction, it's considered an unannotated acceptor position, and so the weakening or loss of an unannotated acceptor is masked in the table.

However, looking at chr8-140300616-T-G, the masking behavior isn't what I expected, so there might be a bug and I need to troubleshoot this. There, I would expect both the donor loss and the acceptor loss scores to be masked (or neither of them), since they're both 1bp away from the nearest annotated junction.

The issue may be the same as the one described in https://github.com/Illumina/SpliceAI/issues/27 but I have to double check.

Anyway, thank you for bringing this up.

Manuel-DominguezCBG commented 11 months ago

Thank you so much for that detailed explanation. Brilliant and glad this is helping you to identify a bug.