Open ypriverol opened 6 months ago
The LFQ dataset PXD001819 and the TMT dataset PXD007683 were tested with num_hits values of 1, 2 and 3.
LFQ results: As num_hits increased, the number of PSMs reported by the search engines increased, but the distribution of search engine scores showed no obvious change. Both target and decoy PSMs from Comet and MSGF increased significantly, but the additional PSMs mostly have worse PEP scores. So the final results do not improve when num_hits is increased; performance even dropped a little.
TMT results: consistent with the LFQ results.
If you are using multiple hits, you probably want some more sophisticated consensus scoring, e.g. PEPMatrix, which takes into account the similarities of the top hits across search engines and allows some kind of reweighting based on the number of times a sequence "scaffold" was identified across multiple engines. No guarantees that it gets better though 😁
Could also be used during feature linking but we do not have an algorithm for that yet. So no short-term improvements possible there.
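To make the reweighting idea concrete, here is a toy sketch. It is not PEPMatrix: it only counts exact sequence matches across engines instead of using sequence similarity, and the function name, input layout and scoring rule are all made up for illustration.

```python
from collections import defaultdict

def toy_consensus(hits_by_engine):
    """Rank the candidate peptides of ONE spectrum across engines.

    hits_by_engine: {engine_name: [(peptide, pep), ...]} (hypothetical layout).
    Score = mean PEP over all engines, where an engine that did not
    report the peptide counts as PEP 1.0 (an assumed penalty).
    Lower score is better.
    """
    support = defaultdict(list)
    for hits in hits_by_engine.values():
        # keep the best PEP per peptide within one engine
        best_in_engine = {}
        for peptide, pep in hits:
            best_in_engine[peptide] = min(pep, best_in_engine.get(peptide, 1.0))
        for peptide, pep in best_in_engine.items():
            support[peptide].append(pep)

    n_engines = len(hits_by_engine)
    scored = []
    for peptide, peps in support.items():
        missing = n_engines - len(peps)
        scored.append((peptide, (sum(peps) + missing * 1.0) / n_engines))
    return sorted(scored, key=lambda item: item[1])

# num_hits = 1: the engines disagree and neither candidate gets support
one_hit = {"comet": [("AAK", 0.05)], "msgf": [("BBK", 0.10)]}
# num_hits = 2: BBK is now supported by both engines
two_hits = {"comet": [("AAK", 0.05), ("BBK", 0.40)],
            "msgf": [("BBK", 0.10), ("CCK", 0.50)]}

ranked_one = toy_consensus(one_hit)
ranked_two = toy_consensus(two_hits)
```

In this made-up example the second hit from Comet moves BBK to the top, which is the only situation where extra hits can help at all. Whether that happens often enough on real data is exactly what is not guaranteed.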
One thing that I am a bit surprised about is that it gets worse. If we are only taking the best PSM per spectrum, nothing should change by adding second-best hits. So maybe we are somewhere using more than just the best hit. If you upload a very small experiment, I can check it when I find time.
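A minimal sketch of that expectation, with made-up PSMs and a hypothetical (spectrum_id, peptide, pep) layout rather than the actual pipeline code: if only the lowest-PEP hit per spectrum is kept, adding worse-scoring hits leaves the result unchanged.

```python
def best_hit_per_spectrum(psms):
    """Keep the PSM with the lowest PEP for each spectrum.

    psms: iterable of (spectrum_id, peptide, pep) tuples (hypothetical layout).
    """
    best = {}
    for spectrum_id, peptide, pep in psms:
        if spectrum_id not in best or pep < best[spectrum_id][1]:
            best[spectrum_id] = (peptide, pep)
    return best

# num_hits = 1: one hit per spectrum
top1 = [("s1", "PEPTIDEK", 0.01), ("s2", "ELVISK", 0.20)]
# num_hits = 3: same top hits plus lower-ranked ones with worse PEPs
top3 = top1 + [("s1", "PEPTLDEK", 0.30),
               ("s2", "LEVISK", 0.45),
               ("s2", "ELVLSK", 0.60)]

same_result = best_hit_per_spectrum(top1) == best_hit_per_spectrum(top3)
```

This only holds if the scores of the top hits stay the same. If PEPs are re-estimated on the enlarged PSM set, the values of the top hits may shift, which would be one place to look.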
Description of feature
It would be good to test the impact of the num_hits parameter on multiple datasets. The idea would be to see how this parameter affects the identification step and the quant results.