HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results
https://hupo-psi.github.io/mzTab

Update adducts regex #211

Open Adafede opened 4 months ago

Adafede commented 4 months ago

Dear all,

I think the current regex cannot parse some complex types of adducts, namely mixes of isotopomers and adducts.

Part of what I would like to address is described in the last lines of https://skyline.ms/wiki/home/software/Skyline/page.view?name=adduct_descriptions ("Charge-Only Examples with Isotope Labels Described Only by Mass")

and it would be great to account for things like [13C3]C3H12O6 at the same time.

I think many of us have slightly different flavors of the regex internally, so let's discuss and eventually agree on a single one? 😄

I will start with an attempt: \\[(\\d*)M(?![a-z])(\\d*)([+-][\\w\\d].*)?.*\\](\\d*)([+-])?
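To make the attempt easier to discuss, here is a minimal sketch that compiles it and tries a few strings. It assumes the doubled backslashes above are Java string-literal escaping, so the line is pasted verbatim into a string literal. The class name, helper method, and sample strings are illustrative only and are not taken from jmzTab-m or the mzTab specification.

```java
import java.util.regex.Pattern;

final class AdductRegexSketch {
    // The attempt from this comment, pasted as-is into a Java string literal
    // (assumption: the doubled backslashes are string-literal escaping).
    static final Pattern ADDUCT = Pattern.compile(
        "\\[(\\d*)M(?![a-z])(\\d*)([+-][\\w\\d].*)?.*\\](\\d*)([+-])?");

    // True only if the whole string matches the pattern.
    static boolean matches(String adduct) {
        return ADDUCT.matcher(adduct).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample strings, chosen only to exercise the pattern.
        String[] samples = {
            "[M+H]1+",       // matches
            "[2M+Na]1+",     // matches: leading multiplier before M
            "[M+H-H2O]1+",   // matches: several modifications after M
            "[Mg+H]1+",      // no match: (?![a-z]) rejects a lowercase letter after M
            "[13C3]C3H12O6"  // no match: the pattern requires the M placeholder
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

As written, the pattern still rejects the labelled formula `[13C3]C3H12O6`, so that case would need separate handling.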

@nilshoffmann

nilshoffmann commented 4 months ago
Adafede commented 4 months ago

@nilshoffmann If you point me to where you expect the tests to be, I am happy to help write them (or at least a first version).

nilshoffmann commented 3 months ago

@Adafede The Regexp is part of the constants in the JAVA parser/writer implementation: https://github.com/lifs-tools/jmzTab-m/blob/master/api/src/main/java/uk/ac/ebi/pride/jmztab2/model/MZTabConstants.java#L75C1-L75C87

The corresponding test is here: https://github.com/lifs-tools/jmzTab-m/blob/master/io/src/test/java/org/lifstools/mztab2/ParsingPrimitivesTest.java#L41

Happy to take any pull requests. Please note that Java regexps need additional escaping to work, e.g. \\ instead of \, but the code for the pattern in MZTabConstants should contain the original regexp along with the Java version.
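The escaping note can be illustrated with a small sketch: each backslash in the original regexp has to be doubled when it is written inside a Java string literal. The class and helper names here are hypothetical and are not part of jmzTab-m.

```java
import java.util.regex.Pattern;

final class JavaRegexEscaping {
    // Turns an original regexp into the text to type between the quotes of a
    // Java string literal: backslashes are doubled and double quotes escaped.
    static String toJavaLiteral(String rawRegex) {
        return rawRegex.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Original regexp as it would appear in documentation: \[(\d*)M\]
        // In Java source, the same regexp is written with doubled backslashes.
        String raw = "\\[(\\d*)M\\]";
        Pattern p = Pattern.compile(raw);

        System.out.println("original regexp:   " + raw);
        System.out.println("Java literal form: " + toJavaLiteral(raw));
        System.out.println("matches [2M]:      " + p.matcher("[2M]").matches());
    }
}
```

Keeping the original regexp in a comment next to the Java constant makes it easier to compare against implementations in other languages.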