sneumann opened 1 year ago
It would be great ... but it is not always represented accurately, IIRC, hence the manual input option is still desirable to avoid errors and maintain accuracy. Originally the CE was not available at all, but I think some information is now available in mzML in some cases. While I am not sure what the current status is with respect to CE and mzML, the last time I checked, ramped CEs were still displayed incorrectly for Thermo, for instance (though they are also represented in a misleading manner in the raw files).
Perhaps someone could look into this to see how far things can be automated (for which vendors / acquisition types) and which cases should be overridden manually? Not sure if @meowcat has a suitable range of files available to do this, or someone in Halle? I don't have any offhand (sorry).
Probably something like the following should work:

- A `spectraListMode` (better name?) parameter with options:
  - `auto`: no CE list required; just automatically use the value from the mzML.
  - `map`: match the CE from the mzML with an entry from `spectraList`, and fail if there is no match. This will be the new default; it is safer than the existing option because it makes sure nothing is accidentally mismatched.
  - `manual`: ignore all data from the mzML and assume CEs are in the listed order. This is the fallback for cases where there is no useful info in the mzML.
- In `spectraList` entries, there is an optional `map` parameter which tells us what apparent CE this spectrum will have in the raw data (specify a tolerance?). Example:
```yaml
- mode: HCD
  ces: 10%, 30%, 60% (stepped)
  ce: 10%, 30%, 60% (stepped)
  res: 7500
  map: 33.3 # this tells us that the spectra for these CE settings have the value 33.3 in the mzML
```
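To make the proposed `map` and `manual` semantics concrete, here is a minimal Python sketch (RMassBank itself is written in R, so this is illustration only). `SpectrumSetting`, `resolve_settings`, the hypothetical second `cid` entry and the default `tolerance` of 0.5 are all assumptions made for the sketch, not existing RMassBank API. `auto` is left out because it needs no `spectraList` at all. In `map` mode, an observed CE with zero matches, or with more than one match within the tolerance, raises an error instead of being guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpectrumSetting:
    """One spectraList entry; field names mirror the YAML example (hypothetical class)."""
    mode: str
    ces: str
    ce: str
    res: int
    map: Optional[float] = None  # apparent CE reported in the mzML, if known


def resolve_settings(mode: str, spectra_list: List[SpectrumSetting],
                     observed_ces: List[float],
                     tolerance: float = 0.5) -> List[SpectrumSetting]:
    """Pick one spectraList entry per observed mzML CE.

    Handles the proposed "map" and "manual" modes; "auto" needs no list.
    """
    if mode == "manual":
        # Ignore the mzML values and assume spectra follow the listed order.
        if len(observed_ces) > len(spectra_list):
            raise ValueError("more spectra than spectraList entries")
        return spectra_list[:len(observed_ces)]
    if mode == "map":
        resolved = []
        for ce in observed_ces:
            hits = [s for s in spectra_list
                    if s.map is not None and abs(s.map - ce) <= tolerance]
            if len(hits) != 1:
                # Zero or several candidates: fail rather than guess.
                raise ValueError(f"CE {ce}: {len(hits)} matching spectraList entries")
            resolved.append(hits[0])
        return resolved
    raise ValueError(f"unsupported spectraListMode: {mode!r}")


hcd = SpectrumSetting("HCD", "10%, 30%, 60% (stepped)", "10%, 30%, 60% (stepped)", 7500, map=33.3)
cid = SpectrumSetting("CID", "35%", "35 % (nominal)", 7500, map=35.0)  # hypothetical entry
print([s.mode for s in resolve_settings("map", [hcd, cid], [35.0, 33.33])])
# prints ['CID', 'HCD']
```

Failing on ambiguous matches, rather than silently picking the closest entry, is what keeps the "nothing is accidentally mismatched" guarantee of `map`.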
- For `spectraListMode: auto`, use the `condition_hash` in the `ACCESSION` by default; this is a 4-character hash derived from `INSTRUMENT_TYPE`, `POLARITY`, `COLLISION_ENERGY`, etc.: https://github.com/MassBank/RMassBank/blob/73f172e051c56a464ee7fc25fc81853638d984b1/R/buildRecord.R#L464-L469
- Keep `ACCESSION` generation unchanged for backward "compatibility"? This can be discussed.

Only a few cases wouldn't work with `map`: specifically, you can construct cases where different stepped-CE settings give the same average (e.g. 10/30/60% and 20/30/50% both average to 33.3%) and are therefore indistinguishable in the mzML. I don't think this would be much of a problem in practice.
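As a rough illustration of how a short condition hash can tag acquisition conditions, the sketch below hashes sorted `KEY=value` pairs with MD5 and keeps the first four hex digits. This is an assumed scheme for illustration, not necessarily what `buildRecord.R` does at the linked lines; `condition_hash` here is a hypothetical Python function and the example values are made up. With only four hex digits (65,536 possible values), collisions between different conditions are possible, though unlikely within a single dataset.

```python
import hashlib
from typing import Mapping


def condition_hash(conditions: Mapping[str, str]) -> str:
    """Return a 4-character uppercase hex hash of the acquisition conditions.

    Keys are sorted first, so the result does not depend on insertion order.
    Assumed scheme for illustration only (MD5 over "KEY=value" pairs).
    """
    canonical = ";".join(f"{key}={conditions[key]}" for key in sorted(conditions))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()[:4].upper()


# Hypothetical example conditions for one spectrum.
example = {
    "INSTRUMENT_TYPE": "LC-ESI-ITFT",
    "POLARITY": "POSITIVE",
    "COLLISION_ENERGY": "10%, 30%, 60% (stepped)",
}
print(condition_hash(example))
```

Sorting the keys makes the hash depend only on the condition values, which is what you want if it is to identify "the same CE setting" across files.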
What I consider a more annoying issue is how to represent CEs in a machine-readable way. Do the CVs have provisions for stepped, ramped, etc. cases? How can we include those?
Hi, the collision energy information is used in two places in a MassBank record: the `AC$MASS_SPECTROMETRY: COLLISION_ENERGY` field and (optionally) the title. Hence, the INI file defines the `ce` long form for the former and a `ces` short form that can be used in the title generator. This requires that all CEs are the same across all the input files.

I think it would be great to use the collision energy information from the mzML or MSP input files. @achimmiri has some examples in MSP files.
A question is: should we 1) use the CE from the spectral data to override the information in the INI file, or 2) use the CE from the spectral data by default, and only fall back to the info in the INI file if it is missing in the spectral data? IIRC, the original reason for the CE (and resolution) info in the INI was that UFZ had quite nice, fixed instrument methods cycling through a few combinations, and resolution is certainly not included in the mzML.
Thoughts? The least invasive approach would be to add CE parsing to `readMSP()` and some `if`/`else` logic to the record creation for when that CE information is present.

Yours, Steffen
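The least invasive approach, combined with option 2 (spectral data first, INI as fallback), might look roughly like the Python sketch below. The real change would live in RMassBank's R code (`readMSP()` and the record creation); `parse_msp_ce`, `collision_energy_for_record` and `ce_record_line` are hypothetical names, and the accepted MSP key spellings (`Collision_energy`, `CollisionEnergy`, ...) are an assumption, since MSP exporters differ.

```python
from typing import Optional


def parse_msp_ce(record: str) -> Optional[str]:
    """Return the CE string from one MSP record's header lines, or None.

    Keys are compared case-insensitively with '_' and spaces removed, so
    'Collision_energy', 'CollisionEnergy' and 'Collision energy' all match
    (assumed spellings; real MSP files vary).
    """
    for line in record.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon (e.g. peaks) are not header fields
        norm = key.strip().lower().replace("_", "").replace(" ", "")
        if norm == "numpeaks":
            break  # the peak list follows; no more header fields
        if norm == "collisionenergy":
            return value.strip() or None
    return None


def collision_energy_for_record(spectrum_ce: Optional[str], ini_ce: str) -> str:
    """Option 2: prefer the CE from the spectral data, fall back to the INI `ce`."""
    return spectrum_ce if spectrum_ce else ini_ce


def ce_record_line(ce: str) -> str:
    """Format the CE as a MassBank record line."""
    return f"AC$MASS_SPECTROMETRY: COLLISION_ENERGY {ce}"


msp = "Name: Example\nCollision_energy: 20 eV\nNum Peaks: 1\n100.0 999\n"
print(ce_record_line(collision_energy_for_record(parse_msp_ce(msp), "35 % (nominal)")))
# prints: AC$MASS_SPECTROMETRY: COLLISION_ENERGY 20 eV
```

Option 1 would simply drop the fallback in `collision_energy_for_record`; the `if`/`else` is the only place where the two options differ.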