bittremieux / spectrum_utils

Python package for efficient mass spectrometry data processing and visualization
https://spectrum-utils.readthedocs.io/
Apache License 2.0

Manual fragment annotation (in version 0.4.2) #56

Closed ch4perone closed 8 months ago

ch4perone commented 8 months ago

Dear developers,

I am using spectrum_utils for visualization and love the package. I recently upgraded to version 0.4.2 (I believe I was on 0.3.2 before).

I was wondering what the intended way is to annotate fragments by m/z, without having a peptide sequence or any other structural annotation. I am working with metabolomics data, which mostly lacks any form of annotation, and would like to highlight specific peaks. The MsmsSpectrum method _annotate_mzfragment seems to have been removed in the newer version. Before, I was adding a tag to the desired peaks, but never managed to highlight them by color. Is there an intended way of singling out peaks by their m/z and color-coding them manually?

Thanks for putting so much work into keeping up the package :)

bittremieux commented 8 months ago

In version 0.4.x we switched to the ProForma specification to annotate peaks to follow established community standards. However, as ProForma has been developed primarily to annotate spectra with peptide sequences, small molecule support is indeed a bit more cumbersome.

One way to annotate a spectrum is to use the ProForma X[+9.99] notation (where 9.99 can be any numeric value) to annotate peaks with specific mass differences, and to hijack the supported ion types:

import matplotlib.pyplot as plt
import seaborn as sns
import spectrum_utils.fragment_annotation as fa
import spectrum_utils.plot as sup
import spectrum_utils.spectrum as sus

# Hijack spectrum_utils to annotate known fragments.
def get_theoretical_fragments(
    proteoform, ion_types=None, max_ion_charge=None, neutral_losses=None
):
    fragments_masses = []
    for mod in proteoform.modifications:
        fragment = fa.FragmentAnnotation(ion_type="w", charge=1)
        mass = mod.source[0].mass
        fragments_masses.append((fragment, mass))
    return fragments_masses

# Use the custom function to annotate the fragments rather than the standard
# peptide-centric method.
fa.get_theoretical_fragments = get_theoretical_fragments
# Include your new custom fragment type.
fa._supported_ions += "w"
# Specify the color for your custom fragment type.
sup.colors["w"] = "#943fa6"

# Load the spectrum the normal way.
usi = "mzspec:MSV000085561:010c:scan:2829"
spec = sus.MsmsSpectrum.from_usi(usi)
# Annotate with ProForma using the `X` format.
spec.annotate_proforma("X[+60.0813]X[+85.0284]X[+144.1019]", 40, "ppm")

# Plot the spectrum.
fig, ax = plt.subplots()

sup.spectrum(spec, grid=False, ax=ax)

sns.despine(ax=ax)

plt.show()
plt.close()
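
As a rough illustration of what the 40 ppm tolerance passed to annotate_proforma means for the three masses above, here is a small plain-Python sketch. It does not use spectrum_utils: the helper names ppm_window and within_ppm are made up for this example, and the sketch assumes the tolerance is taken relative to the target m/z, which may not match exactly how spectrum_utils matches peaks internally.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed_mz, target_mz, tol_ppm):
    """Check whether an observed m/z lies within tol_ppm of the target m/z."""
    # Assumption: the deviation is expressed relative to the target m/z.
    return abs(observed_mz - target_mz) / target_mz * 1e6 <= tol_ppm


# Print the m/z window that 40 ppm corresponds to for each annotated mass.
for target in (60.0813, 85.0284, 144.1019):
    low, high = ppm_window(target, 40)
    print(f"{target:.4f}: {low:.4f} to {high:.4f}")
```

At these low masses a 40 ppm window is only a few thousandths of an m/z unit wide, so the values in the X[+...] string need to be specified to about four decimals to match the intended peaks.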

To also annotate the peak with some text, some more tweaking is needed:

# Custom peak annotation functionality.
def annot(annotation, ion_types="w"):
    if annotation.ion_type == "w":
        return "look at me"
    else:
        return ""

# Specify the custom annotation function when plotting the spectrum.
sup.spectrum(spec, grid=False, annot_fmt=annot, ax=ax)

This works, but it's not particularly user-friendly. 😳 I've encountered this issue myself as well (which is why I have the workaround), so I'll have to add a better solution in a future release. If you can let me know your use case, that could help me to understand which functionality would be relevant to add.
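
If several custom ion types need different labels, the annotation function above could be generalized along these lines. This is only a sketch: make_annot_fmt and FakeAnnotation are hypothetical names introduced here, FakeAnnotation stands in for the annotation object so the example runs without spectrum_utils, and it assumes the formatter receives an object with an ion_type attribute, as in the snippet above.

```python
from collections import namedtuple

# Hypothetical stand-in for the annotation object passed to the formatter;
# only the `ion_type` attribute used in the snippet above is modeled.
FakeAnnotation = namedtuple("FakeAnnotation", ["ion_type"])


def make_annot_fmt(labels):
    """Build a formatter that maps ion types to fixed label strings."""

    def annot(annotation, ion_types=None):
        # `ion_types` is kept only to mirror the signature used above.
        # Ion types without a configured label get an empty string.
        return labels.get(annotation.ion_type, "")

    return annot


annot = make_annot_fmt({"w": "look at me"})
print(repr(annot(FakeAnnotation("w"))))
print(repr(annot(FakeAnnotation("y"))))
```

The returned function could then be passed as annot_fmt in the same way as the hand-written annot function.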

ch4perone commented 8 months ago

Thank you very much. The solution works well. I sorted the specific m/z values in ascending order and used them directly, rather than taking the delta mass (which I couldn't work out). The plots look great :)

x_string = "".join([f"X[+{mz}]" for mz in sorted(mz_fragments)])
spectrum.annotate_proforma(x_string, 40, "ppm")
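
A slightly more defensive variant of the string construction above might look like the following sketch. The helper name build_x_proforma and its decimals parameter are made up for this example. It formats every value with a fixed number of decimals (so floats such as 85.02839999999999 do not end up in the string), drops duplicates, and keeps the ascending order used above.

```python
def build_x_proforma(mz_values, decimals=4):
    """Build an X[+mz] string with one X per unique m/z, in ascending order."""
    # Round first so that near-identical values collapse into one entry.
    unique = sorted({round(float(mz), decimals) for mz in mz_values})
    if any(mz <= 0 for mz in unique):
        raise ValueError("m/z values must be positive")
    # Fixed-point formatting avoids long float representations.
    return "".join(f"X[+{mz:.{decimals}f}]" for mz in unique)


x_string = build_x_proforma([144.1019, 60.0813, 85.0284, 85.0284])
print(x_string)
```

The resulting string can be passed to annotate_proforma exactly like x_string in the snippet above.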