eclipse / chemclipse

ChemClipse Project
Eclipse Public License 1.0
39 stars 18 forks source link

IMassSpectrumNormalizable / IMassSpectrumCloneable are not defined consitently and are used in problematic ways. #148

Closed laeubi closed 2 years ago

laeubi commented 4 years ago

IMassSpectrumNormalizable is declared by scans and used all over the code in a problematic way:

We should change the code in the following way:

All calling code of makeDeepCopy and normalize must be checked and changed to conform to this new approach, makeDeepCopy could be replaced in most cases by a call to normalize, calls to normalize without taking the return value into account must be examined if they really want to modify the underlying data (e.g. mas spectrum filters) and then be replaced by appropriate calls.

eselmeister commented 4 years ago

That's a good idea. Some database import/export formats need normalized spectra (factor 1000), e.g. .msl or .msp. The normalization is also used (user setting) when running the mass spectrum subtract filter. The normalize() and deepCopy() functionality should be definitively refactored. The only thing we have to take care of, is to check that mass spectra are not modified unintentionally, e.g.:

One has a look at the scan of chromatogram. If an operation normalizes a few of the scans of a chromatogram without creating a deep copy, for example to create a sum mass spectrum over an retention time range, the original chromatogram scan data is modified and the chromatogram looks odd afterwards.

The initial idea of OpenChrom was also to store each ion (m/z - abundance) as an object. This works well for small chromatograms, but gets to its limits when trying to analyze e.g. GCxGC/TOF files with file sizes of > 1 GB. An array of m/z and abundance values could have been the better choice probably. But that's another story.

Am 03.12.19 um 07:57 schrieb Christoph Läubrich:

IMassSpectrumNormalizable is declared by scans and used all over the code in a problematic way:

  • IMassSpectrumNormalizable declares methods to normalize a spectrum and returns a scan, but the returned scan is ignored in almost all cases, most (if not all) callers of these methods assume that the scan itself is modified
  • to prevent this destructive operation another interface exits IMassSpectrumCloneable that allows to make a "deepcopy" but this is not part of IMassSpectrumNormalizable itself. But this deepCopy operation is not very well defined, it just says it makes a deep-copy of the spectrum, but does not mandates that an object of the same type is returned
  • even though there is a method "isnormalized" it is never called but spectras are instead cloned and normalized again and again all over the place even though they might be already normalized because of the previous described properties
  • it seems that normalize() is also nearly never called, but instead always with a specified factor, some use 100 some use 1000 as the normalization factor
  • This does not only hurt performance in several applications like mass-spectra comparisons, but also creates a high pressure on the GC as objects are created and thrown away afterwards

We should change the code in the following way:

  • throw away IMassSpectrumCloneable, only the caller could know what exact properties need to be copied (ions, library information, whatever, ...) and we better could provide static helper methods for example to copy the ions into a new spectrum.
  • IMassSpectrumNormalizable should get a generic paramter that specify the return value of normalize
  • IMassSpectrumNormalizable#normalize() method could be thrown away as it seems there is no real default for the base, we might define "common" values as constants in the interface (here 100 + 1000)
  • IMassSpectrumNormalizable#normalize(float) should either return the instance itself if it is already normalized with the given factor or a (probably cached) new instance of the class defined as the generic with the given factor and never modify the scan itself.
  • IMassSpectrumNormalizable#getNormalizationBase and isNormalized could be removed from the interface

All calling code of makeDeepCopy and normalize must be checked and changed to conform to this new approach, makeDeepCopy could be replaced in most cases by a call to normalize, calls to normalize without taking the return value into account must be examined if they really want to modify the underlying data (e.g. mas spectrum filters) and then be replaced by appropriate calls.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/eclipse/chemclipse/issues/148?email_source=notifications&email_token=AAFHUT6O47C2USGAUZUIH6DQWX7NJA5CNFSM4JUS75U2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4H5SENNA, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFHUTZGIZWIOCD2FYRSRUTQWX7NJANCNFSM4JUS75UQ.

--

OpenChrom - the open source alternative for chromatography / mass spectrometry
Dr. Philip Wenig » Founder » philip.wenig@openchrom.net » http://www.openchrom.net