Equation used to compute relative affinity for new sequences using curated MotifCentral models

vitkl commented 1 year ago

Hi

Congratulations on impressive and hugely useful work - both the ProBound model and MotifCentral database!

I am trying to understand how exactly the relative affinities for new sequences are computed using curated MotifCentral models, i.e. what bindingModeScores computes in this line:

proBoundTools -c 'loadMotifCentralModel(15412).addNScoring().inputTXT(seq.txt).bindingModeScores(/dev/stdout)'

I struggle to understand which equation is used to compute relative affinity as a function of A) the PSAM (presumably stored in MotifCentral.v1.0.0.json) w_{motif length, 4 nucleotides}, and B) the new one-hot encoded sequence s_{total length, 4 nucleotides}. Specifically, what is the function/equation that's used to compute one relative affinity for one offset?

affinity = function(w_{motif length, 4 nucleotides}, s_{motif length, 4 nucleotides})

I see that this computation is done in slidePN and that it is related to Eq 5 in the paper methods section:

However, I don't understand how these 2 terms below are related to the PSAM and the new sequence: is beta_a = PSAM and X(S) = the new sequence?

Could you please explain this in a bit more detail, ideally writing pseudocode for affinity = function(w_{motif length, 4 nucleotides}, s_{motif length, 4 nucleotides})?
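
To make the question concrete, here is a minimal sketch of the generic PSAM-style calculation being asked about. It is an assumption, not confirmed to be what bindingModeScores or slidePN actually does. It assumes the weights w are on a log scale, that the score of one window is the exponential of the summed weights selected by the one-hot sequence, and that only the forward strand is scanned. The helper names (one_hot, window_affinity, slide) and the toy weights are hypothetical; the reverse strand and N handling (addNScoring) are not covered.

```python
import math

BASES = "ACGT"  # assumed column order of the weight matrix

def one_hot(seq):
    """Encode a DNA string as rows of length 4 (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def window_affinity(w, s):
    """Affinity of one window under the assumed model.

    w: motif_length x 4 log-scale weights (hypothetical PSAM-like matrix)
    s: motif_length x 4 one-hot encoded window
    """
    # Sum the weight of the observed base at each motif position.
    energy = sum(w[i][j] * s[i][j] for i in range(len(w)) for j in range(4))
    return math.exp(energy)

def slide(w, seq):
    """Score every forward-strand offset of seq with the motif w."""
    s = one_hot(seq)
    k = len(w)
    return [window_affinity(w, s[o:o + k]) for o in range(len(s) - k + 1)]

# Toy 2-position motif; the values are made up for illustration.
w = [
    [0.0, -1.0, -2.0, -1.5],
    [-2.0, 0.0, -1.0, -3.0],
]
scores = slide(w, "ACGA")  # one score per offset: windows AC, CG, GA
```

Under these assumptions, a sequence of length L scored with a motif of length k gives L - k + 1 values, one per offset, and the window matching the highest-weight base at every position scores exp(0) = 1.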

laijen000 commented 1 year ago

Hello, I have a related question: do higher affinity scores correspond to higher affinity?

vitkl commented 1 year ago

Another related question - what causes the learned TF-DNA preference weights to be negative, with a maximum value of 0 per position (for the most important nucleotide)? I don't fully understand why this constraint follows from the equations and how it's implemented in the code.
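
One possible reading, offered as an assumption rather than a statement about the ProBound code, is that the maximum of 0 is a normalization convention and not a constraint that comes out of the fit. If affinity is the exponential of summed per-position weights, subtracting each position's maximum rescales every sequence's affinity by the same constant factor, so relative affinities are unchanged. The sketch below checks only that arithmetic; the weights and the names energy and normalize are hypothetical.

```python
import math

def energy(w, idx):
    """Sum the weights picked out by base indices idx (one per position)."""
    return sum(w[i][b] for i, b in enumerate(idx))

def normalize(w):
    """Shift each position so that its largest weight becomes 0."""
    return [[x - max(row) for x in row] for row in w]

# Made-up unnormalized weights for a 2-position motif.
raw = [[0.3, -0.7, 1.2, 0.1], [2.0, 0.5, -1.0, 1.0]]
norm = normalize(raw)

# Total shift applied across positions: the sum of the per-position maxima.
shift = sum(max(row) for row in raw)

# Three arbitrary sequences, written as base indices per position.
seqs = [(0, 0), (2, 0), (1, 3)]
ratios = [math.exp(energy(raw, s)) / math.exp(energy(norm, s)) for s in seqs]
# Every ratio equals exp(shift), so affinity ratios between sequences
# are identical before and after normalization.
```

With this convention the best sequence has energy 0 and affinity 1, and every other sequence has a negative energy and an affinity between 0 and 1.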

BussemakerLab / ProBoundTools

Equation used to compute relative affinity for new sequences using curated MotifCentral models #2