HUPO-PSI / psi-ms-CV

HUPO-PSI mass spectrometry CV

Why is "Comet:expectation value" a "search engine specific score" when all other (Comet) scores are a "PSM-level search engine specific statistic"? #291

Closed julianu closed 1 month ago

julianu commented 2 months ago

Describe the question or discussion

I just came across a problem in my parser: basically all PSM scores from search engines are defined as "is_a: MS:1001143 ! PSM-level search engine specific statistic", which looks fine to me. Only the "Comet:expectation value" (MS:1002257), its Sequest equivalent, and some other scores are "is_a: MS:1001153 ! search engine specific score".

This discrepancy looks a bit odd to me, and for "Comet:expectation value" in particular it seems to be wrong. It also makes parsing the results harder (but not impossible).

I just wanted to bring this up so that it can perhaps be corrected. Some scores actually have both of the is_a relations.
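To illustrate the parsing workaround, here is a minimal sketch of a check that treats a term as a PSM score if it reaches either parent through is_a. Only the MS:1002257 entry comes from this issue; the `EX:` accessions and the `is_psm_score` helper are hypothetical placeholders, and a real parser would load the relations from the CV file instead of a hand-written dict.

```python
# The two parent terms discussed in this issue.
PSM_LEVEL_STATISTIC = "MS:1001143"  # PSM-level search engine specific statistic
SEARCH_ENGINE_SCORE = "MS:1001153"  # search engine specific score
SCORE_PARENTS = {PSM_LEVEL_STATISTIC, SEARCH_ENGINE_SCORE}

# Direct is_a parents per term. Only MS:1002257 is taken from the issue;
# the EX: accessions are hypothetical stand-ins for other terms.
IS_A = {
    "MS:1002257": {"MS:1001153"},                # Comet:expectation value
    "EX:0000001": {"MS:1001143"},                # hypothetical: PSM-level parent only
    "EX:0000002": {"MS:1001143", "MS:1001153"},  # hypothetical: both parents
    "EX:0000003": {"EX:9999999"},                # hypothetical: not a score at all
}


def is_psm_score(accession, is_a=IS_A):
    """Return True if the term reaches either score parent via is_a."""
    seen = set()
    stack = [accession]
    while stack:
        term = stack.pop()
        if term in SCORE_PARENTS:
            return True
        if term in seen:
            continue
        seen.add(term)
        # Follow every direct parent; unknown terms simply have none.
        stack.extend(is_a.get(term, ()))
    return False


for acc in sorted(IS_A):
    print(acc, is_psm_score(acc))
```

Accepting both parents keeps the parser working regardless of which relation a given score carries, at the cost of hard-coding two accessions instead of one.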

mobiusklein commented 2 months ago

PSM-level search engine specific statistic and search engine specific score are parts of different hierarchies with slightly different meanings. Based upon when these terms are referenced, search engine specific score is from mzIdentML 1.0, while PSM-level search engine specific statistic was added with mzIdentML 1.2. I agree it is confusing, but here's my digest of the terms and a question to help pick a solution:

[Term]
id: MS:1001143
name: PSM-level search engine specific statistic
def: "Search engine specific peptide spectrum match scores." [PSI:PI]
is_a: MS:1002347 ! PSM-level identification statistic

[Term]
id: MS:1002347
name: PSM-level identification statistic
def: "Identification confidence metric for a peptide spectrum match." [PSI:PI]
is_a: MS:1002345 ! PSM-level attribute

[Term]
id: MS:1002345
name: PSM-level attribute
def: "Attribute of a single peptide-spectrum match." [PSI:PI]
is_a: MS:1002694 ! single identification result attribute
relationship: part_of MS:1003301 ! peptide-spectrum match

This seems to cover the multitude of different metrics that some search engines produce, e.g. the SEQUEST family of a dozen features that are then fed through a post-processor like PeptideProphet or Percolator to produce a single confidence value.

[Term]
id: MS:1001153
name: search engine specific score
def: "Search engine specific scores." [PSI:PI]
is_a: MS:1002694 ! single identification result attribute

This seems to refer to the raw score that results might be sorted by, but its exact semantics aren't well specified unless the term also has MS:1002108|higher score better or the like added.

Is there a need for a single parent term, more specific than single identification result attribute, that captures both of these ideas? Or is this a semantics issue of what to do with one vs. the other?
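To make the relationship between the two hierarchies concrete, the following sketch parses a trimmed copy of the stanzas quoted above and lists the ancestors the two terms share. The `parse_is_a` and `ancestors` helpers are illustrative names, not part of any PSI tooling, and the parser only handles the `id:` and `is_a:` tags.

```python
# Trimmed copy of the stanzas quoted in this comment (def: lines omitted).
OBO_TEXT = """\
[Term]
id: MS:1001143
name: PSM-level search engine specific statistic
is_a: MS:1002347 ! PSM-level identification statistic

[Term]
id: MS:1002347
name: PSM-level identification statistic
is_a: MS:1002345 ! PSM-level attribute

[Term]
id: MS:1002345
name: PSM-level attribute
is_a: MS:1002694 ! single identification result attribute

[Term]
id: MS:1001153
name: search engine specific score
is_a: MS:1002694 ! single identification result attribute
"""


def parse_is_a(obo_text):
    """Map each term id to the list of its direct is_a parents."""
    parents = {}
    current = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current = None
        elif line.startswith("id: "):
            current = line[len("id: "):]
            parents.setdefault(current, [])
        elif line.startswith("is_a: ") and current is not None:
            # Drop the trailing "! name" comment, keep the accession.
            parents[current].append(line[len("is_a: "):].split("!")[0].strip())
    return parents


def ancestors(term, parents):
    """Return all ancestors of a term, nearest first (breadth-first)."""
    found = []
    queue = list(parents.get(term, []))
    while queue:
        candidate = queue.pop(0)
        if candidate not in found:
            found.append(candidate)
            queue.extend(parents.get(candidate, []))
    return found


PARENTS = parse_is_a(OBO_TEXT)
stat_chain = ancestors("MS:1001143", PARENTS)
score_chain = ancestors("MS:1001153", PARENTS)
shared = [t for t in stat_chain if t in score_chain]
print(stat_chain)
print(score_chain)
print(shared)
```

Within these four stanzas, the only ancestor the two terms share is MS:1002694, which is why nothing more specific than single identification result attribute currently groups them.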

julianu commented 2 months ago

What makes it a bit confusing, in my opinion, is that only SEQUEST's and Comet's expectation values are search engine specific scores, while, for example, X!Tandem's and Mascot's expectation values are PSM-level search engine specific statistics.

mobiusklein commented 2 months ago

That's probably a combination of term age and oversight when mzIdentML 1.2 was created, followed by maintainers not knowing enough about the authors' intent/ambiguity at the tool level.

If you have a set of terms where you think one or both of the classes is applied incorrectly (absent but appropriate or inappropriate but present), I'd be happy to review it and make the necessary changes to the CV.

edeutsch commented 1 month ago

addressed by #312