HUPO-PSI / mzIdentML

Repository for mzIdentML and the corresponding examples

current correct way to encode the ion series searched for #128

Open colin-combe opened 2 years ago

colin-combe commented 2 years ago

I think it’s unclear which CV terms are currently the correct ones to use for the ion series searched for (including losses). See p. 28 of the spec.

There are multiple deprecated / obsolete terms, and the ones that appear current aren’t members of the parent term that the specification allows in AdditionalSearchParams.

Try looking up “ion series considered in search” in OLS and navigating to something that is marked neither deprecated nor obsolete.

In other words, what is the current correct replacement for the following:

<AdditionalSearchParams>
    <cvParam cvRef="PSI-MS" accession="MS:1001118" name="param: b ion"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001262" name="param: y ion"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001149" name="param: b ion-NH3"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001150" name="param: b ion-H2O"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001151" name="param: y ion-NH3"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001152" name="param: y ion-H2O"/>
</AdditionalSearchParams>
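
For anyone who wants to check which of these terms their own files use, here is a minimal Python sketch that pulls the accession and name out of each cvParam in the fragment above. It assumes the fragment is parsed on its own, without a namespace; in a complete mzIdentML file the elements are namespace-qualified, so the tag match would need adjusting. The function name list_search_params is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The fragment quoted above, parsed standalone (no namespace declared).
SNIPPET = """
<AdditionalSearchParams>
    <cvParam cvRef="PSI-MS" accession="MS:1001118" name="param: b ion"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001262" name="param: y ion"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001149" name="param: b ion-NH3"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001150" name="param: b ion-H2O"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001151" name="param: y ion-NH3"/>
    <cvParam cvRef="PSI-MS" accession="MS:1001152" name="param: y ion-H2O"/>
</AdditionalSearchParams>
"""


def list_search_params(xml_text):
    """Return (accession, name) pairs for every cvParam in the fragment."""
    root = ET.fromstring(xml_text.strip())
    # Attribute order in the XML does not matter; attributes are read by name.
    return [(p.get("accession"), p.get("name")) for p in root.iter("cvParam")]


ion_series_params = list_search_params(SNIPPET)
for accession, name in ion_series_params:
    print(accession, name)
```

Each accession printed this way can then be looked up in OLS to see whether it is marked deprecated or obsolete.
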
javizca commented 1 year ago

At some point we tried to combine the ion series with the type of neutral loss. But I don't think this was properly implemented anywhere, or maybe I am wrong? In any case, the problem with this is that the size of the files increases dramatically, which I think is why this feature for ion annotations is used less and less.

vrkosk commented 1 year ago

Mascot supports ion series with a NL like b* ("b ion-NH3"). The ion series config is a search-level parameter, not specific to whether a peptide has a variable mod. We don't plan to remove this support. If an alternative encoding is specified, that's fine, but please don't remove the obsoleted terms without providing a replacement.

colin-combe commented 1 year ago

In any case, the problem with this is that the size of the files increases dramatically, which I think is why this feature for ion annotations is used less and less.

@javizca - are you possibly mixing this up with the IonType elements in SpectrumIdentificationItem/Fragmentation? As @vrkosk says, these are search-level parameters (in AdditionalSearchParams).

I think the main specification document and / or the CV terms need to be updated so it's clear what the current, correct way to do this is.

mobiusklein commented 1 year ago

We've got three parts that are hard to specifically address because they are named very similarly and refer to the same concept but were created at different times with seemingly different goals.

  1. MS:1001066 ions series considered in search (is-a MS:1001249 ! search input details)
  2. MS:1002473 ion series considered in search (is-a MS:1001249 ! search input details)
  3. MS:1002307 fragmentation ion type (is-a MS:1001221 ! product ion attribute)

MS:1001066 is the root of a term sub-tree which enumerates ~9 ion series names with neutral losses that are marked deprecated (param: b ion-H2O DEPRECATED).

MS:1002473 is the root of a term sub-tree which enumerates ~18 ion series names or neutral losses (param: b ion). Those that refer to neutral losses are marked obsolete.

MS:1002307 is the root of a term sub-tree which enumerates ~35 ion series + neutral losses (frag: b ion - H2O).

The mzIdentML schema explicitly says you may supply a term derived from MS:1001066 one or more times in the AdditionalSearchParams element (red dot), but it also supplies a term derived from MS:1002473 (blue dot). The validator's mapping file reflects this as legal (https://github.com/HUPO-PSI/mzidentml-validator/blob/main/src/main/resources/mzIdentML-mapping_1.2.0.xml#L142). [image]

The mzIdentML schema explicitly says you may supply a term derived from the parent of MS:1002307 in the IonType element. The IonType element is a component of the Fragmentation element, which is what makes the file size explode and is not widely used.

mzIdentML never explicitly refers to the MS:1002473 sub-tree, or to its parent (search input details, which is a root term in the CV). That it is legal to use is only inferred from the mapping file in the validator.
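
To make the three sub-trees concrete, here is a rough Python sketch that reports which of the three roots a given accession falls under. The IS_A table is a hand-written, hypothetical excerpt containing only the two example terms quoted in this comment, placed as described here and not verified against the CV; a real check would load the full PSI-MS OBO file. The function name subtree_root is also made up for illustration.

```python
# The three root terms discussed in this comment.
ROOTS = {
    "MS:1001066": "ions series considered in search",
    "MS:1002473": "ion series considered in search",
    "MS:1002307": "fragmentation ion type",
}

# Hypothetical excerpt of is_a links, taken from the description above.
# Only the two example terms are listed; everything else is unknown here.
IS_A = {
    "MS:1001150": "MS:1001066",  # param: b ion-H2O (marked deprecated)
    "MS:1001118": "MS:1002473",  # param: b ion
}


def subtree_root(accession, is_a=IS_A, roots=ROOTS):
    """Follow is_a links upward until one of the three roots is reached.

    Returns the root accession, or None if the term is not under any of them
    (or is simply missing from the excerpt).
    """
    seen = set()
    current = accession
    while current is not None and current not in seen:
        if current in roots:
            return current
        seen.add(current)  # guard against accidental cycles in the table
        current = is_a.get(current)
    return None


for acc in ("MS:1001118", "MS:1001150"):
    root = subtree_root(acc)
    print(acc, "->", root, ROOTS.get(root))
```

With the full CV loaded in place of the excerpt, the same walk would show whether a term in AdditionalSearchParams sits under the root the schema names (MS:1001066) or under the one that only the validator mapping file allows (MS:1002473).
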

@colin-combe wants to express that a search used a specific ion series with a neutral gain/loss without reporting the actual fragment ions matched. Since it's a neutral loss on an ion series, at first glance you might want to declare a new ion series + neutral loss term derived from ions series considered in search, but the neutral-loss terms in that sub-tree are marked deprecated. What's recommended instead is to add a fragment neutral loss term to the Modification element in question. However, that term is only allowed once per modification, so it might be necessary to relax the one-per-modification restriction, create a new term, or roll the neutral loss information into the CV providing the modification.

Relaxing the term repetition restriction means all the information is explicitly stated, but it could lead to huge files if there are many losses to enumerate. Rolling the losses into the CV means you can't express whether your tool used those losses or did something different. Finally, creating a new term means deciding whether you are stating "I looked for some neutral losses" in the search parameters or enumerating for each modified peptide which losses you searched for.

I think rolling this into the modification CV is a compromise that should keep maintainers happy: it breaks nothing, and it puts the loss rules somewhere a motivated individual might look (albeit each CV would have its own way of expressing this). That said, it doesn't let you express whether your search engine did something other than follow those rules. Creating a new CV term to say you searched for neutral losses without naming them is of token value. You could say that XML compresses well enough that it doesn't matter if we increase file sizes by a MB for a large file, but that's a matter of scale.

colin-combe commented 1 year ago

What's recommended instead is to add a fragment neutral loss term to the Modification element in question

This is about the case where the loss is independent of a specific modification.