HUPO-PSI / mzPAF

mzPAF Peak Annotation Format
Creative Commons Zero v1.0 Universal

Add chemical formulas for all reference molecules #12

Closed by douweschulte 2 months ago

douweschulte commented 3 months ago

Two questions left:

douweschulte commented 3 months ago
I updated all formulas to correspond to neutral masses. The alternative isotopes, as you called them, are listed by Unimod as shown in the table below. I picked the more generally representative one, UNIMOD:730. For iTRAQ this is close enough, as its channels are 1 Da apart.

| Accession # | PSI-MS name / Interim name | Description | Monoisotopic mass | Average mass | Composition |
|---|---|---|---|---|---|
| 731 | iTRAQ8plex:13C(6)15N(2) | Accurate mass for 115, 118, 119 & 121 | 304.199040 | 304.3081 | H(24) C(8) 13C(6) N(2) 15N(2) O(3) |
| 730 | iTRAQ8plex | Representative mass and accurate mass for 113, 114, 116 & 117 | 304.205360 | 304.3074 | H(24) C(7) 13C(7) N(3) 15N O(3) |
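To see how small the inaccuracy of using one composition for all channels is, the two Unimod compositions can be summed nuclide by nuclide. This is a minimal Python sketch; the helper `unimod_mass` and the approximate nuclide masses in `MONO` are assumptions for illustration, not part of mzPAF or any Unimod tooling.

```python
import re

# Approximate monoisotopic nuclide masses (u); only the nuclides needed here.
MONO = {
    "H": 1.00782503207,
    "C": 12.0,
    "13C": 13.0033548378,
    "N": 14.0030740048,
    "15N": 15.0001088982,
    "O": 15.99491461956,
}

# Unimod-style token: optional mass number, element symbol, optional "(count)".
TOKEN = re.compile(r"(\d*)([A-Z][a-z]?)(?:\((-?\d+)\))?")


def unimod_mass(composition: str) -> float:
    """Sum monoisotopic masses for a space-separated Unimod composition."""
    total = 0.0
    for part in composition.split():
        m = TOKEN.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognised token: {part!r}")
        isotope, element, count = m.groups()
        # A missing "(count)" means a single atom, as in "15N".
        total += MONO[isotope + element] * int(count or 1)
    return total


m731 = unimod_mass("H(24) C(8) 13C(6) N(2) 15N(2) O(3)")
m730 = unimod_mass("H(24) C(7) 13C(7) N(3) 15N O(3)")
print(f"{m731:.6f} {m730:.6f} {m730 - m731:.6f}")
```

With these nuclide values both results land within about 1e-6 Da of the Unimod monoisotopic masses, and the two compositions differ by roughly 0.0063 Da, which is tiny next to the 1 Da channel spacing.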
edeutsch commented 3 months ago

This looks great, thanks. My remaining question is about the notation. Your notation for two carbon-13 atoms is [13C2]; Unimod's notation for the same is 13C(2). One can imagine other notations, such as (13C2), (13C)2, [13C]2, and others.

It would be ideal if we all agreed on a single notation that works for these files, for in-line mzPAF, and for in-line ProForma, all of which potentially need it.
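The Unimod and ProForma-style notations map onto each other mechanically, which is part of why converging on one is feasible. A minimal Python sketch of rewriting a Unimod composition into the bracketed `[13C6]` form discussed here; `unimod_to_proforma` is a hypothetical helper, and writing an explicit count inside every isotope bracket is a conservative choice of this sketch rather than a stated requirement.

```python
import re

# Unimod-style token: optional mass number, element symbol, optional "(count)".
TOKEN = re.compile(r"(\d*)([A-Z][a-z]?)(?:\((-?\d+)\))?")


def unimod_to_proforma(composition: str) -> str:
    """Rewrite e.g. '13C(6)' as '[13C6]' and 'H(24)' as 'H24'."""
    out = []
    for part in composition.split():
        m = TOKEN.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognised token: {part!r}")
        isotope, element, count = m.groups()
        n = count if count is not None else "1"
        if isotope:
            # Bracketed isotope; the count is always written out here.
            out.append(f"[{isotope}{element}{n}]")
        else:
            # Plain element; a count of 1 is left implicit, as in "H2O".
            out.append(element if n == "1" else f"{element}{n}")
    return "".join(out)


print(unimod_to_proforma("H(24) C(8) 13C(6) N(2) 15N(2) O(3)"))  # H24C8[13C6]N2[15N2]O3
```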

douweschulte commented 3 months ago

Sorry, I misread your question, then. I followed the ProForma standard for writing isotopes. mzPAF does not support this yet, but I would propose using exactly the same notation to avoid adding yet another standard. One downside is that a formula in this notation can start with '[', which makes parsing neutral losses, and distinguishing a formula from a named compound, harder.

On the various notations: I have had to build parsers for five different variants just to fully support ProForma, as every modification database does it differently (and I have not gotten around to implementing RESID yet).

mobiusklein commented 3 months ago

ProForma specifies chemical formulas with isotopic labeling in Formula Rule 3, Section 4.2.7 of the specification. It would be simple enough to support the isotopic notation in mzPAF's f{...} series, but it will require more work after the regex match, since it supports nesting. Supporting it in neutral-loss formulae is trickier: if a formula starts with a stable isotope, the pattern cannot distinguish between a formula and a named reference molecule.
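One way around this ambiguity is to look past the opening '[' and test the whole neutral-loss term against a formula grammar before falling back to a named reference molecule. The sketch below assumes that reference molecules are written in square brackets (`[Hex]` is used only as an illustration) and that isotopes follow the ProForma `[13C2]` form; `classify_loss` and its patterns are hypothetical, not part of any mzPAF implementation.

```python
import re

# One isotope-labelled atom group in ProForma style, e.g. "[13C6]" or "[12C-2]".
ISOTOPE = r"\[\d+[A-Z][a-z]?-?\d*\]"
# One plain element with an optional (possibly negative) count, e.g. "H2", "O", "H-2".
ELEMENT = r"[A-Z][a-z]?-?\d*"
FORMULA = re.compile(rf"(?:{ISOTOPE}|{ELEMENT})+")
# A bracketed name such as "[Hex]" (assumed reference-molecule syntax).
NAMED = re.compile(r"\[[^\[\]]+\]")


def classify_loss(term: str) -> str:
    """Return 'formula' or 'reference' for the text after '-' in a neutral loss."""
    # Try the formula grammar first: "[13C2]H2O" starts with '[' but is a formula.
    if FORMULA.fullmatch(term):
        return "formula"
    # Anything else in a single pair of brackets is treated as a named molecule.
    if NAMED.fullmatch(term):
        return "reference"
    raise ValueError(f"cannot classify neutral loss {term!r}")


for term in ("H2O", "[13C2]H2O", "[Hex]"):
    print(term, classify_loss(term))  # formula, formula, reference
```

This only works because no isotope token can start with a letter after '['; a reference-molecule name shaped like a mass number plus an element symbol would remain ambiguous.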

I think we skipped implementing stable isotopes in mzPAF formulae because we didn't want to add a mechanism that conflicts with the 'i' isotope component.

douweschulte commented 2 months ago

Discussed in the weekly PSI-MS meeting. The specification will be updated to allow stable isotope annotations as in ProForma. For now, the slight inaccuracy for iTRAQ4/8plex is ignored; if this ever needs fixing, we would advise adding another term to the reference list using the same naming as Unimod (see UNIMOD:730 and UNIMOD:731).