HUPO-PSI / psi-ms-CV

HUPO-PSI mass spectrometry CV

contaminant spectra percentage #209

Closed mwalzer closed 9 months ago

mwalzer commented 1 year ago

What is the QC term's name?

contaminant spectra percentage

Briefly describe the QC term.

The percentage of spectra identified with a contaminant database after user defined acceptance criteria are applied. The type of criterion applied should be noted in the metadata or analysis methods section of the recording file for the respective run or set.

What is the QC term's unit?

UO:0000187

Value type

MS:4000003 ! single value

Describe any additional information.

Like so, I imagine:

[Term]
id: MS:4000177
name: contaminant spectra percentage
def: "The percentage of spectra identified with a contaminant database after user defined acceptance criteria are applied. The type of criterion applied should be noted in the metadata or analysis methods section of the recording file for the respective run or set." [PSI:MS]
is_a: MS:4000003 ! single value
relationship: has_metric_category MS:4000008 ! ID based metric
relationship: has_value_type xsd:float ! The allowed value-type for this CV term
relationship: has_value_concept UO:0000187 ! percent

mwalzer commented 1 year ago

This should work for spectral library searches, too. @cbielow @bittremieux Do we need a separate metric for when a combined target and contaminant search is conducted?

bittremieux commented 1 year ago

Ah, I would assume that you'd always combine your contaminants with your regular sequences, instead of only searching the contaminants. Otherwise you might get incorrect matches to the contaminants because the correct proteins/peptides are not in the search space.
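
A minimal sketch of the combined search space described above, assuming plain FASTA input. The function names, the `CONT_` label prefix, and the toy sequences are hypothetical and only illustrate how contaminant entries might stay recognisable after merging; they are not taken from any particular search engine.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def combine_databases(target_fasta, contaminant_fasta, prefix="CONT_"):
    """Merge targets and contaminants into one search space.

    Contaminant headers get a (hypothetical) label prefix so that hits
    against them can be counted separately after the search.
    """
    combined = read_fasta(target_fasta)
    combined += [(prefix + h, s) for h, s in read_fasta(contaminant_fasta)]
    return combined


# Toy, made-up entries for illustration only.
targets = ">P1 sample protein\nMKTAYIAK\nQRQISFVK\n"
contaminants = ">K1 keratin\nMSRQFSSR\n"
database = combine_databases(targets, contaminants)
```

Searching `database` rather than the contaminant entries alone keeps the correct target peptides in the search space, which is the point made above.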

cbielow commented 1 year ago

> Ah, I would assume that you'd always combine your contaminants with your regular sequences, instead of only searching the contaminants. Otherwise you might get incorrect matches to the contaminants because the correct proteins/peptides are not in the search space.

I second that :)

Also, I'm not sure how to interpret the wording "The percentage of spectra identified with a contaminant database". Do you want the fraction of all identified peptides which are assigned to a contaminant DB (this is what I would want)? Or do you want the fraction of all spectra (identified or not) which were identified as belonging to a contaminant DB (not sure what this would tell me)? Maybe make the wording less ambiguous (or is it just me?) :)
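
To make the two readings concrete, here is a small sketch that computes both from per-spectrum records. The record layout (`q_value`, `is_contaminant`), the q-value cutoff standing in for the "user defined acceptance criteria", and the toy data are all assumptions for illustration.

```python
def contaminant_percentages(spectra, max_q_value=0.01):
    """Return (percent of identified spectra, percent of all spectra).

    A spectrum counts as identified if it has a q-value at or below the
    cutoff; the cutoff is a stand-in for any user-defined criterion.
    """
    accepted = [
        s for s in spectra
        if s["q_value"] is not None and s["q_value"] <= max_q_value
    ]
    n_contaminant = sum(1 for s in accepted if s["is_contaminant"])
    # Reading 1: denominator is the identified spectra only.
    of_identified = 100.0 * n_contaminant / len(accepted) if accepted else None
    # Reading 2: denominator is every spectrum searched.
    of_all = 100.0 * n_contaminant / len(spectra) if spectra else None
    return of_identified, of_all


# Toy data: 10 spectra, 5 pass the cutoff, 1 of those is a contaminant.
spectra = (
    [{"q_value": 0.001, "is_contaminant": False}] * 4
    + [{"q_value": 0.005, "is_contaminant": True}]
    + [{"q_value": 0.2, "is_contaminant": True}]    # fails the cutoff
    + [{"q_value": None, "is_contaminant": False}] * 4  # no match at all
)
of_identified, of_all = contaminant_percentages(spectra)
print(of_identified, of_all)  # 20.0 10.0
```

The same run gives 20 % under the first reading and 10 % under the second, so the definition has to name its denominator.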

mwalzer commented 1 year ago

I think this addresses both points made.

[Term]
id: MS:4000177
name: contaminant spectra percentage
def: "The percentage of identified spectra labelled contaminants after identification with a sequence database that included annotated contaminant sequences, after user defined acceptance criteria are applied. The type of criterion applied should be noted in the metadata or analysis methods section of the recording file for the respective run or set." [PSI:MS]
is_a: MS:4000003 ! single value
relationship: has_metric_category MS:4000008 ! ID based metric
relationship: has_value_type xsd:float ! The allowed value-type for this CV term
relationship: has_value_concept UO:0000187 ! percent

But I don't know whether a percentage of identified spectra alone is as informative as it could be. I suppose you'd always also want to know what the percentage or number of identified spectra was. Should we recommend using it in combination with the percentage of identified spectra among all spectra searched (no such term yet)?

bittremieux commented 1 year ago

We have "number of MS2 spectra" and "count of identified spectra" terms. So if this term were an absolute number as well, instead of a percentage, you could calculate the ratio of contaminant identifications relative to either all spectra or all identified spectra from those three terms.
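
A sketch of that arithmetic, assuming the three values are available as plain integers. The function and key names are hypothetical, and the contaminant count is the absolute-number term suggested here, which does not exist in the CV yet.

```python
def ratio_percent(numerator, denominator):
    """Percentage of numerator in denominator, or None if undefined."""
    if denominator <= 0:
        return None
    return 100.0 * numerator / denominator


def contaminant_ratios(n_ms2, n_identified, n_contaminant):
    """Derive the percentages discussed in the thread from three counts."""
    if not 0 <= n_contaminant <= n_identified <= n_ms2:
        raise ValueError("expected 0 <= contaminant <= identified <= MS2 count")
    return {
        # Contaminant share among identified spectra.
        "contaminant_percent_of_identified": ratio_percent(n_contaminant, n_identified),
        # Contaminant share among all MS2 spectra.
        "contaminant_percent_of_ms2": ratio_percent(n_contaminant, n_ms2),
        # Identification rate, useful context for the two values above.
        "identified_percent_of_ms2": ratio_percent(n_identified, n_ms2),
    }


# Made-up counts for illustration.
ratios = contaminant_ratios(n_ms2=20000, n_identified=8000, n_contaminant=400)
print(ratios["contaminant_percent_of_identified"])  # 5.0
print(ratios["contaminant_percent_of_ms2"])         # 2.0
print(ratios["identified_percent_of_ms2"])          # 40.0
```

Storing counts rather than a single percentage leaves the choice of denominator to whoever reads the file.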

I would also make the definition a bit more general than "identified with a sequence database", so that it's also valid for spectral library searching or de novo sequencing (which could be interesting for very unexpected contaminants).

edeutsch commented 9 months ago

@mwalzer still want to address these?

bittremieux commented 9 months ago

We no longer need this term at the moment, so we'll close the issue for now and follow up in the future if it becomes necessary.