MassBank / RMassBank

Playground for experiments on the official http://bioconductor.org/packages/devel/bioc/html/RMassBank.html

Allow the Collision Energy from spectral Data to override CE from the INI file. #316

Open sneumann opened 1 year ago

sneumann commented 1 year ago

Hi, the collision energy (CE) information is used in two places in a MassBank record: the AC$MASS_SPECTROMETRY: COLLISION_ENERGY field and (optionally) the title. Hence, the INI file defines a long form, ce, for the former and a short form, ces, that can be used in the title generator. This requires that all CEs are the same across all the input files.

I think it would be great to use the collision energy information from the mzML or MSP input files. @achimmiri has some examples in MSP files.

One question is whether we should 1) use the CE from the spectral data to override the information in the INI file, or 2) use the CE from the spectral data by default and fall back to the INI file only if it is missing from the spectral data. IIRC the original reason for the CE (and resolution) info in the INI was that UFZ had quite nice, fixed instrument methods cycling through a few combinations, and resolution is certainly not included in the mzML.

Thoughts? The least invasive approach would be to add CE parsing to readMSP() and an if/else in the record creation that uses the CE information when it is present.
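A minimal sketch of the fallback logic in option 2, written in Python for illustration rather than as RMassBank code. The function names ce_from_msp and resolve_ce, the MSP field spellings the pattern accepts, and the sample records are all assumptions; real MSP exports may label the field differently.

```python
import re

# Assumed field spellings: "Collision_energy", "Collision energy", "COLLISIONENERGY".
CE_FIELD = re.compile(r"^\s*collision[ _]?energy\s*:(.*)$", re.IGNORECASE)


def ce_from_msp(record_text):
    """Return the raw CE string from one MSP record, or None if absent or empty."""
    for line in record_text.splitlines():
        match = CE_FIELD.match(line)
        if match:
            value = match.group(1).strip()
            return value or None
    return None


def resolve_ce(record_text, ini_ce):
    """Prefer the CE found in the spectral data; otherwise use the INI value."""
    ce = ce_from_msp(record_text)
    return ce if ce is not None else ini_ce


# Made-up records for illustration only.
with_ce = "Name: Caffeine\nCollision_energy: 35 eV\nNum Peaks: 1\n195.0877 999\n"
without_ce = "Name: Caffeine\nNum Peaks: 1\n195.0877 999\n"

print(resolve_ce(with_ce, "20 eV"))
print(resolve_ce(without_ce, "20 eV"))
```

Option 1 (the INI value wins unless the spectral data explicitly overrides it) would differ only in which source is consulted first, so the choice could be a single switch in the settings.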

Yours, Steffen

schymane commented 1 year ago

It would be great ... but it is not always represented accurately, IIRC, so the manual input option is still desirable to avoid errors and maintain accuracy. Originally it was not available at all, but now some information is available in mzML in some cases, I think. I am not sure what the current status is wrt CE and mzML, but last time I checked, ramping was still displayed incorrectly for Thermo, for instance (and was also represented in a misleading manner in the raw files).

Perhaps someone could look into this to see how far things can be automated (for which vendors / acquisition types), and which cases should be overruled manually? Not sure if @meowcat has a suitable range of files available to do this, or someone in Halle? I don't have any offhand (sorry).

meowcat commented 1 year ago

Probably something like the following should work:

Only a few cases wouldn't work with map: specifically, you can construct cases where different stepped-CE settings give the same average and so are indistinguishable in the mzML. I don't think it would be much of a problem in practice.
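To illustrate that ambiguity with made-up numbers, here is a small Python sketch. It assumes, as the point above implies, that only the average of the steps is recoverable from the file; the stepped settings are hypothetical and not taken from any instrument method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stepped-CE settings (illustrative values only).
stepped_settings = [(20, 35, 50), (30, 35, 40), (15, 30, 45)]

# Assumption: the file exposes only the average of the steps.
by_average = defaultdict(list)
for steps in stepped_settings:
    by_average[mean(steps)].append(steps)

# Averages that more than one setting maps to cannot be resolved from the file alone.
ambiguous = {avg: found for avg, found in by_average.items() if len(found) > 1}
print(ambiguous)
```

Here (20, 35, 50) and (30, 35, 40) both average to 35, so a lookup keyed on the reported value cannot tell them apart, while (15, 30, 45) remains unique.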

What I consider a more annoying issue is how to represent CEs in a machine-readable way. Do the CVs (controlled vocabularies) have provisions for stepped, ramped, etc. cases? How can we include those?