ambj / MuPeXI

MuPeXI: the mutant peptide extractor and informer, a tool for predicting neo-epitopes from tumor sequencing data.

let us talk about the Monotonicity of the function calculating priority_score #29

Open kobejamescurry opened 5 years ago

kobejamescurry commented 5 years ago

can you give some advice, thanks a lot

ibwoo commented 5 years ago

@kobejamescurry my understanding may not be correct, but I see two immediate points of confusion that I think I can clarify.

Firstly, the rank % values Rm and Rn are put through the logistic function you pasted above, so a lower rank % returns a score closer to 1 on the 0-1 scale. A "higher affinity" value (I'm not sure this is still the correct term now that NetMHCpan v4.0 uses EL output by default) therefore does positively affect the final priority score.
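To make the direction of that transform concrete, here is a minimal Python sketch of a decreasing logistic function applied to rank % values. The steepness `k` and midpoint `x0` are assumed values chosen for illustration, not parameters confirmed in this thread; check the MuPeXI source for the ones it actually uses.

```python
import math


def logistic(rank, k=1.0, x0=2.0):
    """Map a rank % value onto the 0-1 scale with a decreasing logistic curve.

    k (steepness) and x0 (midpoint) are ASSUMED values for illustration only.
    """
    return 1.0 / (1.0 + math.exp(k * (rank - x0)))


if __name__ == "__main__":
    # Lower rank % (stronger predicted binder) gives a value closer to 1.
    for rank in (0.1, 0.5, 2.0, 5.0, 20.0):
        print(f"rank % = {rank:5.1f} -> L = {logistic(rank):.3f}")
```

Whatever the exact parameters are, the point is only that the curve is decreasing: a small rank % maps to a large L value.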

If I understand your point about the normal exact match penalty, then I believe you have that backwards as well. If an exact match to the mutant peptide exists elsewhere in the proteome, this suggests that T-cell tolerance mechanisms may have removed any TCRs that would recognise the mutant (and matching normal) peptide. Therefore, this peptide should be discarded, which the equation does by multiplying by 0.

I hope that clears up some of your points there, I'm sure @ambj can provide a much clearer answer.

ibwoo commented 5 years ago

@kobejamescurry I think you're very close; unfortunately, you've just got the Rn part backwards.

Have a closer look at the algorithm, perhaps try a little mock example in R or Python and see what changing the inputs does to the output priority score. I can tell you with 100% certainty that a higher L(Rn) value results in a lower priority score based on the above (implemented) MuPeXI algorithm. At the same time, a lower L(Rm) results in a lower priority score.
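The two monotonicity statements can also be checked on paper. The sketch below assumes the score has the simplified form $P = C \cdot L(R_m) \cdot (1 - 2^{-M} L(R_n))$, where $C \ge 0$ is a label introduced here for the remaining non-negative factors and $M \ge 0$ is assumed to be a mismatch count; neither symbol is defined in this thread.

```latex
% Assumed simplified form of the priority score:
%   P = C * L(R_m) * (1 - 2^{-M} L(R_n)),  with C >= 0, M >= 0, 0 < L < 1
\begin{align}
\frac{\partial P}{\partial L(R_n)} &= -\,C \, 2^{-M} \, L(R_m) \;\le\; 0, \\
\frac{\partial P}{\partial L(R_m)} &= C \left(1 - 2^{-M} L(R_n)\right) \;\ge\; 0 .
\end{align}
```

Under those assumptions the score falls as L(Rn) rises and falls as L(Rm) falls. Because L itself is decreasing in rank %, the score rises with Rn and falls with Rm.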

It seems you understand the overall concept, but if you haven't come across it, you could look into the "differential agretopicity index" (DAI). I'm not sure if this recent paper is the best reference, but a quick Google search turned it up and the authors seem to delve into the nature of the DAI. This paper here was the first in which I saw the DAI mentioned. It's an important concept to understand when considering current neoantigen prediction strategies.

I hope this helps.

(Edited to fix mistake - replace Rn and Rm with L(Rn) and L(Rm), respectively)

ibwoo commented 5 years ago

@kobejamescurry, I apologise: I did make a mistake in my previous reply, and I have edited the message to correct it.

Having re-read your messages, I'm not sure how else I can help. The reason I mentioned the DAI was to point out the reasoning behind selecting a GOOD-binding (low rank %) mutant peptide and a NON-binding (high rank %) normal/wild-type peptide as a neoantigen. This rationale is correctly reflected in the MuPeXI priority score calculation.

> Rn denominator, a higher Rn value results in a lower priority (there is a minus sign)

Your line above is incorrect: a higher Rn value (a higher rank %, which translates to a lower L(Rn) value) results in a higher priority score. You seem to understand why this is the desired result, but you can't see how the equation reflects it. Have you tried implementing the calculation somewhere, such as in R or Python (or maybe Excel)? I suggest that you try this and play around with the values.
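As a starting point for that kind of experiment, here is a minimal Python mock-up. It is a sketch under stated assumptions, not the MuPeXI implementation: the function name `mock_priority_score`, the logistic parameters `k` and `x0`, the `2 ** -mismatches` weighting, and the way expression and allele frequency enter as plain multipliers are all hypothetical simplifications. Only the features discussed in this thread are meant to be faithful: the rank % values go through a decreasing logistic function, L(Rn) enters with a minus sign, and an exact normal match multiplies the score by 0.

```python
import math


def logistic(rank, k=1.0, x0=2.0):
    """Decreasing logistic transform of rank %; k and x0 are ASSUMED values."""
    return 1.0 / (1.0 + math.exp(k * (rank - x0)))


def mock_priority_score(rm, rn, mismatches=1, expression=1.0,
                        allele_freq=1.0, exact_normal_match=False):
    """Hypothetical simplified priority score, for exploring monotonicity only.

    rm, rn: rank % of the mutant and normal peptides.
    mismatches, expression, allele_freq: ASSUMED extra factors.
    exact_normal_match: if True, the whole score is multiplied by 0.
    """
    match_factor = 0.0 if exact_normal_match else 1.0
    # The minus sign means a large L(Rn) (well-binding normal) lowers the score.
    normal_term = 1.0 - (2.0 ** -mismatches) * logistic(rn)
    return logistic(rm) * expression * allele_freq * normal_term * match_factor


if __name__ == "__main__":
    # Hold Rm fixed and raise Rn: the score should go up.
    for rn in (0.5, 2.0, 5.0, 20.0):
        print(f"Rm = 0.5, Rn = {rn:5.1f} -> score = {mock_priority_score(0.5, rn):.3f}")
    # Hold Rn fixed and raise Rm: the score should go down.
    for rm in (0.5, 2.0, 5.0, 20.0):
        print(f"Rm = {rm:5.1f}, Rn = 5.0 -> score = {mock_priority_score(rm, 5.0):.3f}")
```

Sweeping one input while holding the other fixed, as in the two loops, makes the direction of each effect visible without needing the real expression or allele-frequency values.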

I appreciate your offer to Skype but I'd prefer to keep the discussion public to help anybody else with the same issue, and so that @ambj can respond and confirm or dispute my explanation.