crisprVerse / crisprScore

On-Target and Off-Target Scoring Algorithms for CRISPR gRNAs
MIT License

MIT score calculation #7

Closed by eldrid01 1 year ago

eldrid01 commented 1 year ago

I've been looking at the MIT score calculation and wondering if it is correct or how to square it with the same calculation in CRISPOR (https://github.com/maximilianh/crisporWebsite).

I'm new to this whole area of CRISPR guide scoring but my understanding is that the higher the individual score for an off-target match, the more significant it is in reducing the efficiency of the guide. When combining several individual scores, the MIT score is calculated as 100 / (100 + sum(scores)) and sometimes reported as an integer by multiplying by 100 and rounding. So higher individual off-target scores lead to a lower overall MIT score.
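As a minimal sketch of the aggregation described above (not the crisprScore or CRISPOR source), the following Python function combines a list of per-off-target scores. The function name `aggregate_mit_score` and the `as_integer` flag are hypothetical, introduced only for illustration.

```python
def aggregate_mit_score(hit_scores, as_integer=False):
    """Combine individual off-target scores into one guide-level score.

    Implements 100 / (100 + sum(scores)) as stated in the paragraph above.
    The name and the as_integer flag are illustrative assumptions.
    """
    total = sum(hit_scores)
    score = 100.0 / (100.0 + total)
    if as_integer:
        # Optional integer form: multiply by 100 and round.
        return int(round(score * 100))
    return score


# No off-target hits leaves the score at its maximum.
no_hits = aggregate_mit_score([])
# One off-target hit with an individual score of 100 halves it.
one_strong_hit = aggregate_mit_score([100.0])
one_strong_hit_int = aggregate_mit_score([100.0], as_integer=True)
```

Because the sum sits in the denominator, every additional or higher individual off-target score can only lower the combined value.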

The getMITScores function gives scores of zero for single-mismatch protospacers where the mismatch is at position 13, 16, 17, 19 or 20, because the tolerance weights at each of these positions are zero (mit.weights vector). For example, a single mismatch at the last position:

> getMITScores("ACACCGCTCCCATAAAGCCA", "ACACCGCTCCCATAAAGCCG", "TGG")
                spacer          protospacer score
1 ACACCGCTCCCATAAAGCCA ACACCGCTCCCATAAAGCCG     0

This doesn't seem right to me. I had a look at the Python source code for the CRISPOR website and found that the calculation for individual off-target match scores (https://github.com/maximilianh/crisporWebsite/blob/master/crispor.py#L1999) differs in two ways:

  1. The position (in)tolerance weights have the same values but in reverse order. For example, position 20 has a weight of 0.585 in CRISPOR, but this is the weight for position 1 in crisprScore; similarly, position 19 has a weight of 0.685 in CRISPOR, but this is the weight for position 2 in crisprScore, and so on. The weights in inst/mit/mit.weights.txt are given in the same order as in CRISPOR, but inst/mit/processWeights.R reverses the order.

  2. These weights are subtracted from 1 in CRISPOR before being multiplied together across the mismatches, i.e. they appear to be measures of intolerance for a mismatch at each position rather than measures of tolerance, as indicated in inst/mit/processWeights.R.
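The two differences can be illustrated with a small Python sketch. It is a simplification, not code from either project: it computes only the positional product over mismatches and omits any other factors of the full per-hit score. The function name `positional_term` and both dictionaries are hypothetical, and the dictionaries contain only the values quoted in this issue (positions are 1-based).

```python
def positional_term(mismatch_positions, weights, weights_are_intolerance):
    """Multiply the per-position factors for each mismatched position.

    weights maps a 1-based position to its weight. If the weights measure
    intolerance, each factor is (1 - weight); if they are read as
    tolerance, the weight itself is used as the factor.
    """
    term = 1.0
    for pos in mismatch_positions:
        w = weights[pos]
        term *= (1.0 - w) if weights_are_intolerance else w
    return term


# Values quoted in the issue, in the order described for CRISPOR.
crispor_order = {19: 0.685, 20: 0.585}
# Same values after reversal, plus the zero weight reported at position 20.
reversed_order = {1: 0.585, 2: 0.685, 20: 0.0}

# Single mismatch at position 20, weights read as intolerance (1 - w).
crispor_reading = positional_term([20], crispor_order, True)
# Same mismatch with reversed weights read as tolerance (w itself).
reversed_tolerance_reading = positional_term([20], reversed_order, False)
```

Under the first reading the factor is 1 - 0.585; under the second it is zero, which matches the score of 0 in the getMITScores output shown earlier.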

Is the implementation in crisprScore correct?

Jfortin1 commented 1 year ago

Hi @eldrid01, you're correct about everything. Thanks for looking into this and for letting us know. It is now fixed in version 1.3.2.