eugenemel / maven

Maven GUI: Metabolomics Analysis and Visualization Engine
https://github.com/eugenemel/maven/releases
GNU General Public License v3.0

Enhance FragmentationMatchScore with additional metrics #547

Open PMSeitzer opened 2 years ago

PMSeitzer commented 2 years ago

Candidate metrics include, but are not limited to, the modified cosine score, the neutral loss match score, and the updated cosine score.

This can be used very generally, and updated in the GUI, but was originally devised as a part of #543, #546, and associated mass_spec case https://github.com/calico/mass_spec/issues/752

PMSeitzer commented 2 years ago

Return to this after starting mass_spec case https://github.com/calico/mass_spec/issues/752

PMSeitzer commented 2 years ago

Implement spectral entropy score

paper: https://www.nature.com/articles/s41592-021-01331-z

source code: https://github.com/YuanyueLi/SpectralEntropy/blob/master/spectral_entropy/spectral_entropy.py#L26-L28

scipy.stats.entropy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.entropy.html
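For reference, a minimal Python sketch of the unweighted entropy similarity, as I read the paper: `1 - (2*S_AB - S_A - S_B) / ln(4)`, where `S` is the Shannon entropy of a spectrum normalized to unit total intensity and `AB` is the 1:1 mix of the two spectra. The function names are hypothetical, the inputs are assumed to be intensity vectors already aligned peak-by-peak (0 where a spectrum has no peak), and the intensity re-weighting the paper applies to low-entropy spectra is left out.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of intensities normalized to sum to 1."""
    total = sum(intensities)
    if total <= 0:
        return 0.0
    # Zero-intensity entries contribute nothing to the entropy.
    probs = [x / total for x in intensities if x > 0]
    return -sum(p * math.log(p) for p in probs)


def entropy_similarity(aligned_a, aligned_b):
    """Unweighted entropy similarity of two pre-aligned intensity vectors.

    Assumption: aligned_a[i] and aligned_b[i] refer to the same m/z bin.
    Returns a value between 0 (no shared peaks) and 1 (identical spectra).
    """
    sum_a, sum_b = sum(aligned_a), sum(aligned_b)
    if sum_a <= 0 or sum_b <= 0:
        return 0.0
    pa = [x / sum_a for x in aligned_a]
    pb = [x / sum_b for x in aligned_b]
    # 1:1 mixture of the two normalized spectra.
    mixed = [(x + y) / 2.0 for x, y in zip(pa, pb)]
    s_ab = spectral_entropy(mixed)
    s_a = spectral_entropy(pa)
    s_b = spectral_entropy(pb)
    return 1.0 - (2.0 * s_ab - s_a - s_b) / math.log(4)
```

Identical spectra score 1 and spectra with no peaks in common score 0. Peak alignment within an m/z tolerance would have to happen before calling `entropy_similarity`.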

PMSeitzer commented 2 years ago

more information about various spectral scoring approaches: https://www.biorxiv.org/content/biorxiv/early/2022/06/02/2022.06.01.494370.full.pdf

PMSeitzer commented 2 years ago

Modified cosine score, as it is explained in the original manuscript (https://www.pnas.org/doi/abs/10.1073/pnas.1203689109):

Vector similarities are calculated for every possible pair of spectra with a minimum of six matching fragment ions (i.e., peaks) with similarity determined by using a modified cosine calculation that takes into account the relative intensities of the fragment ions as well as the precursor m/z difference between the paired spectra

This has come to mean something more specific:

Two peaks are considered a potential match if their m/z ratios lie within the given ‘tolerance’, or if their m/z ratios lie within the tolerance once a mass-shift is applied. The mass shift is simply the difference in precursor-m/z between the two spectra.

So, a peak may match another peak after a mass-shift is applied.

See this implementation: https://github.com/matchms/matchms/blob/master/matchms/similarity/ModifiedCosine.py#L109-L129

It looks like they are matching to both the m/z and NL m/z of an observed spectrum (where NL m/z = precursorMz - fragMz).
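A rough Python sketch of that matching rule, written from my reading of the description above and not copied from matchms. The function name, the greedy assignment by descending intensity product, and the default tolerance are assumptions for illustration. A fragment pair is accepted if the m/z values agree directly or agree after shifting by the precursor m/z difference, which is the same as requiring equal neutral losses.

```python
import math


def modified_cosine(peaks_a, precursor_a, peaks_b, precursor_b, tolerance=0.01):
    """Modified cosine sketch.

    peaks_a / peaks_b: lists of (mz, intensity) tuples.
    Returns (score, number_of_matched_peak_pairs).
    """
    shift = precursor_a - precursor_b
    candidates = []
    for i, (mz_a, int_a) in enumerate(peaks_a):
        for j, (mz_b, int_b) in enumerate(peaks_b):
            direct = abs(mz_a - mz_b) <= tolerance
            # Shifted match: equivalent to matching neutral losses,
            # precursor_a - mz_a ~= precursor_b - mz_b.
            shifted = abs(mz_a - (mz_b + shift)) <= tolerance
            if direct or shifted:
                candidates.append((int_a * int_b, i, j))

    # Greedy assignment (an assumption): take the largest intensity
    # products first, using each peak at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    num_matches = 0
    for product, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        dot += product
        num_matches += 1

    norm_a = math.sqrt(sum(x * x for _, x in peaks_a))
    norm_b = math.sqrt(sum(x * x for _, x in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    return dot / (norm_a * norm_b), num_matches


# Two spectra whose precursors differ by 14: the 150 and 164 fragments
# only match once the mass shift is applied.
spec_a = [(100.0, 1.0), (150.0, 0.5)]
spec_b = [(100.0, 1.0), (164.0, 0.5)]
score, n_matched = modified_cosine(spec_a, 200.0, spec_b, 214.0)
```

Returning the match count alongside the score makes it possible to apply a minimum-matches filter such as the six matching fragment ions mentioned in the original manuscript.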

PMSeitzer commented 4 months ago

Inspired by ASMS 2024, re-opening this case. Introduced now in modified cosine score, flash entropy, Kullback-Leibler divergence, Jensen-Shannon divergence, etc.
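For the two divergences, a minimal Python sketch using the textbook definitions, with spectra treated as probability distributions over aligned peaks. The function names are hypothetical and the inputs are assumed to be intensity vectors already aligned peak-by-peak (0 where a spectrum has no peak). This is not tied to any particular library's implementation.

```python
import math


def normalize(intensities):
    """Scale intensities so they sum to 1 (assumes a positive total)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [x / total for x in intensities]


def kl_divergence(aligned_p, aligned_q):
    """Kullback-Leibler divergence D(P || Q) in nats.

    Infinite when P has intensity where Q has none, which is common
    for real spectra and is one reason a symmetrized form is often preferred.
    """
    p, q = normalize(aligned_p), normalize(aligned_q)
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def js_divergence(aligned_p, aligned_q):
    """Jensen-Shannon divergence in nats: symmetric, bounded by ln(2)."""
    p, q = normalize(aligned_p), normalize(aligned_q)
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Because the Jensen-Shannon divergence compares each spectrum to their average, it stays finite even when the spectra share no peaks, in which case it reaches its maximum of ln(2).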

PMSeitzer commented 4 months ago

spectral entropy python: https://github.com/YuanyueLi/SpectralEntropy