Rappsilber-Laboratory / build-xiview

GNU General Public License v3.0

Validation page Matches not sorted by score #33

Closed lutzfischer closed 4 years ago

lutzfischer commented 4 years ago

the matches are not sorted by score

colin-combe commented 4 years ago

was this an mzIdentML file?

lutzfischer commented 4 years ago

yes

lutzfischer commented 4 years ago

https://xiview.org/xi3/validate.php?upload=1719-83803-68653-40358-15307

colin-combe commented 4 years ago

> yes

yeah... I was waiting for this...

lutzfischer commented 4 years ago

then the wait is over ...

colin-combe commented 4 years ago

as you likely know there can be several different scores in mzIdentML

lutzfischer commented 4 years ago

I see your point. So the question is: what do we do?

As a first heuristic one could say: if there is only one score present, use that one. If its name matches `((.*:\s*)?[pq][\-\s]?(value|score)|.*FDR.*|.*PEP.*|.*probability.*)`, assume lower is better; otherwise assume higher is better.

I was thinking that if several scores are given, one could use a <Threshold> element, if present, as guidance. But that does not seem to work: the name of the threshold is somewhat decoupled from the actual score.
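The single-score heuristic above could be sketched roughly as follows in Python. This is only an illustration, not xiVIEW code: the function names and the match layout (a dict keyed by score name) are assumptions, and so is the choice to match the whole score name case-sensitively.

```python
import re

# Pattern quoted from the comment above. Score names matching it are
# treated as "lower is better" (p-/q-values, FDR, PEP, probabilities).
LOWER_IS_BETTER = re.compile(
    r"((.*:\s*)?[pq][\-\s]?(value|score)|.*FDR.*|.*PEP.*|.*probability.*)"
)


def lower_is_better(score_name: str) -> bool:
    """Guess the sort direction from the score name alone.

    Assumption: the whole name must match, case-sensitively.
    """
    return LOWER_IS_BETTER.fullmatch(score_name) is not None


def sort_matches(matches, score_name):
    """Return matches sorted best-first by the given score.

    `matches` is a hypothetical list of dicts keyed by score name.
    """
    reverse = not lower_is_better(score_name)  # higher is better -> descending
    return sorted(matches, key=lambda m: m[score_name], reverse=reverse)


if __name__ == "__main__":
    demo = [{"xi:score": 3.0}, {"xi:score": 9.5}, {"xi:score": 1.2}]
    print(sort_matches(demo, "xi:score"))
```

Note that a name such as "PSM-level q-value" would not match the first alternative, because the optional prefix requires a colon, so the pattern would likely need tuning against real files.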

colin-combe commented 4 years ago

actually, there was already a simple solution in it along the lines of what you suggest - assume there's only one

(the multiple score info does get stored in the db and does then get passed to the interface; at the last minute, in the interface code, a decision is made about which one to use)

if there are multiple scores it uses the first one it encountered, but it should be consistent in always using the same one (e.g. if one match is missing that particular score, you won't get a different score in its place)
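To illustrate the "always use the same score" behaviour described above, here is a minimal Python sketch. It is not the actual interface code: the `scores` dict on each match and both function names are hypothetical, and putting matches that lack the chosen score at the end is just one possible choice.

```python
def pick_score_name(matches):
    """Return the first score name encountered across all matches, or None."""
    for match in matches:
        for name in match["scores"]:
            return name
    return None


def sort_by_single_score(matches, higher_is_better=True):
    """Sort by one consistently chosen score.

    Matches lacking that score are never ranked by a different score;
    they are simply appended at the end in their original order.
    """
    name = pick_score_name(matches)
    if name is None:
        return list(matches)
    scored = [m for m in matches if name in m["scores"]]
    unscored = [m for m in matches if name not in m["scores"]]
    scored.sort(key=lambda m: m["scores"][name], reverse=higher_is_better)
    return scored + unscored


if __name__ == "__main__":
    demo = [
        {"id": 1, "scores": {"a": 2.0, "b": 0.1}},
        {"id": 2, "scores": {"b": 0.5}},  # missing score "a"
        {"id": 3, "scores": {"a": 5.0}},
    ]
    print([m["id"] for m in sort_by_single_score(demo)])
```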

i think there's some other bug causing things not to order.

lutzfischer commented 4 years ago

updated my last comment a bit - did not notice that some text got lost (e.g. stars)

lutzfischer commented 4 years ago

actually just looked into the obo. Some scores come with more info: we could consider only the children of MS:1002347 - PSM-level identification statistic or MS:1001143 - PSM-level search engine specific statistic, and prioritise scores that come with a has_order term (which can reference either MS:1002108 - higher score better or MS:1002109 - lower score better). But that would require a psiMS.obo parser. Not sure how complicated that would be.
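For a rough idea of the effort, a minimal OBO reader covering just this use case might look like the Python sketch below. It assumes the usual OBO stanza layout (`[Term]`, `id:`, `is_a:`, `relationship: has_order ...`). The `EX:` accessions in the sample are made up for illustration; only the four `MS:` accessions come from the comment above. All function names are hypothetical.

```python
PSM_PARENTS = {"MS:1002347", "MS:1001143"}  # parents named in the comment above
HIGHER, LOWER = "MS:1002108", "MS:1002109"  # has_order targets named above


def parse_obo_terms(text):
    """Parse [Term] stanzas into {accession: {"is_a": [...], "has_order": ...}}."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # new stanza; only [Term] is kept
            current = {"is_a": [], "has_order": None} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        if tag == "id":
            terms[value] = current
        elif tag == "is_a":
            current["is_a"].append(value.split("!")[0].strip())
        elif tag == "relationship" and value.startswith("has_order "):
            current["has_order"] = value.split()[1]
    return terms


def is_psm_score(acc, terms, seen=None):
    """True if the term descends (via is_a) from one of PSM_PARENTS."""
    seen = set() if seen is None else seen
    if acc in seen or acc not in terms:
        return False
    seen.add(acc)
    return any(p in PSM_PARENTS or is_psm_score(p, terms, seen)
               for p in terms[acc]["is_a"])


def score_direction(acc, terms):
    """Return "higher", "lower", or None if unknown or not a PSM-level score."""
    if not is_psm_score(acc, terms):
        return None
    return {HIGHER: "higher", LOWER: "lower"}.get(terms[acc]["has_order"])


# Hand-written sample; the EX: terms are NOT real CV entries.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example engine score
is_a: MS:1001143 ! PSM-level search engine specific statistic
relationship: has_order MS:1002108 ! higher score better

[Term]
id: EX:0000002
name: example q-value
is_a: MS:1002347 ! PSM-level identification statistic
relationship: has_order MS:1002109 ! lower score better

[Term]
id: EX:0000003
name: example protein score
is_a: EX:0000099 ! some unrelated parent
"""

if __name__ == "__main__":
    terms = parse_obo_terms(SAMPLE)
    for acc in terms:
        print(acc, score_direction(acc, terms))
```

Scores for which this returns None could still fall back to the name-based heuristic.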