Annotated sequence peptide string

jspaezp commented 2 years ago

[feature request]

I was wondering if there is any desire to implement "adding the annotated ions on the peptide sequence string". "P]E]P]TIDEP[I[N[K" kind of deal ...

like it is show in the following image

(in addition it would be great to have precursor ion annotations, let me know if you would like to make a new issue for it/try to implement it)

my_spectrum_top.annotate_peptide_fragments(0.5, 'Da', ion_types='p')

Thanks for the great package! Sebastian

EDIT: Adding precursor ions to annotation is already supported!

bittremieux commented 2 years ago

I've thought previously about trying to add such a peptide string with the fragments indicated, but it's not trivial with matplotlib. I'll have to see how hard it would be to actually implement it.

I'm currently working on some extra peak annotations (neutral losses, modifications via ProForma). Highlighting the precursor peak should definitely be possible, I'll try to add that functionality relatively shortly.

bittremieux commented 2 years ago

To be honest, I don't really have a good idea how to highlight the fragments in the peptide sequence using matplotlib. Suggestions on how to properly combine text with graphical elements are welcome.

pwilmart commented 2 years ago

I don’t know if this might have some useful information:

http://www.aosabook.org/en/matplotlib.html http://www.aosabook.org/en/matplotlib.html

Happy Holidays! -Phil

On Dec 21, 2021, at 3:44 PM, Wout Bittremieux @.***> wrote:

To be honest, I don't really have a good idea how to highlight the fragments in the peptide sequence using matplotlib. Suggestions on how to properly combine text with graphical elements are welcome.

— Reply to this email directly, view it on GitHub https://github.com/bittremieux/spectrum_utils/issues/24#issuecomment-999168960, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFZNQSUTQ7B5JEUGHWQZKNLUSEGOTANCNFSM5CNBUIZQ. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you are subscribed to this thread.

Seb-Leb commented 2 years ago

I have implemented something like this by generating an svg with drawSvg and sticking it on to the plot. Its not always the most elegant, but it works..

bittremieux commented 2 years ago

Thanks for the tip. A matplotlib option would be preferable, but I don't know how to properly do it that way. If drawSvg works it could be worth looking into.

hkmoon commented 2 years ago

In https://github.com/Rappsilber-Laboratory/xiSPEC_spectrumViewer/ project, there is a way to draw fragments. It requires XiAnnotator(https://github.com/Rappsilber-Laboratory/pyXiAnnotator) for drawing it additionally. This is based on d3.js in javascript.

e.g. https://github.com/Rappsilber-Laboratory/xiSPEC_spectrumViewer/blob/master/example_linear.html

VHTECcarbamidomethylCcarbamidomethylHGDLLECcarbamidomethylADDRADLAK_z=3xispec_Svg0

It might be useful how to draw peptide letters: https://github.com/Rappsilber-Laboratory/xiSPEC_spectrumViewer/blob/cceb202b13d9361fe5f208fbf7f7985d5c3a7277/src/FragmentationKeyView.js#L293

mobiusklein commented 1 year ago

I did not want to work on my own work and saw this thread and thought I hadn't abused matplotlib in a while. Here's a Python implementation that is scale invariant while using the spectrum's own coordinate system. I sparingly use ms_deisotope's functionality as that is what I implemented this with, but these should be easily converted to use the spectrum representation that spectrum_utils uses.

from typing import Optional, Dict, Any, NamedTuple, Sequence, List
from ms_deisotope import spectrum_graph

import numpy as np

from matplotlib import path as mpath, patches as mpatch, transforms as mtransform, text as mtext, axes as maxes

class Peak(NamedTuple):
    mz: float
    intensity: float
    charge: int

class PeakNode(NamedTuple):
    peak: Peak

class PeakPathEdge(NamedTuple):
    start: PeakNode
    end: PeakNode
    annotation: str

class PeakPath(Sequence[PeakPathEdge]):
    pass

def bbox_path(path):
    nodes = path.vertices
    xmin = nodes[:, 0].min()
    xmax = nodes[:, 0].max()
    ymin = nodes[:, 1].min()
    ymax = nodes[:, 1].max()
    return (xmin, ymin, xmax, ymax)

def shift(path, x=0, y=0):
    return path.transformed(mtransform.Affine2D().translate(x, y))

def draw_ladder(ax: maxes.Axes,
                peak_path: PeakPath,
                scan,
                peak_line_options: Optional[Dict[str, Any]]=None,
                seq_line_options: Optional[Dict[str, Any]]=None,
                text_prop: Optional[mtext.FontProperties]=None,
                vertical_shift: float=0.0):

    if peak_line_options is None:
        peak_line_options = {}
    if seq_line_options is None:
        seq_line_options = {}

    peak_line_options.setdefault('linewidth', 1)
    peak_line_options.setdefault('color', 'red')
    seq_line_options.setdefault('linewidth', 2)
    seq_line_options.setdefault('color', 'red')

    # Compute the maximum height of peaks in the region to be annotated
    # so that there is no overlap with existing peaks
    upper = (
        max(
            [
                p.intensity
                for p in scan.deconvoluted_peak_set.between(
                    peak_path[0].start.peak.mz, peak_path[-1].end.peak.mz, use_mz=True
                )
            ]
        )
        * 1.2
    ) + vertical_shift

    # Compute the x-axis dimension aspect
    xlim = (min(p.mz for p in scan.deconvoluted_peak_set),
            max(p.mz for p in scan.deconvoluted_peak_set))
    # Compute the y-axis dimension aspect
    ylim = (0, scan.base_peak.deconvoluted().intensity)

    # Create an baseline scaling transformation for the text
    base_trans = mtransform.Affine2D()
    base_trans.scale((xlim[1] - xlim[0]) / 75, (ylim[1] - ylim[0]) / 25)

    # Don't try to annotate a ladder that changes charge state that
    # would involve back-tracking on the x-axis
    start_charge = None
    for i, edge in enumerate(peak_path):
        if start_charge is None:
            start_charge = edge.start.peak.charge

        # We'll write the annotation glyph in the middle of the gap
        mid = (edge.start.peak.mz + edge.end.peak.mz) / 2

        # Create the glyph(s) for the peak pair annotation at unit-scale
        # and then scale it up using the base transformation, then center it
        # at 0 again.
        tpath = mtext.TextPath((0, 0), edge.annotation, 1, prop=text_prop)
        tpath = (tpath.transformed(base_trans))

        if (
            edge.start.peak.charge != start_charge
            or edge.end.peak.charge != start_charge
        ):
            continue

        # Move the annotation glyph(s) to the midpoint between the two peaks
        # at the sequence line.
        tpath = shift(tpath, mid, upper * 0.99)

        # Check whether our annotation glyph(s) is too wide for the gap between
        # the two peaks. If it is too large, draw it above the main line to avoid
        # over-plotting.
        xmin, ymin, xmax, ymax = bbox_path(tpath)
        shift_up = (xmax - xmin) / (edge.end.peak.mz - edge.start.peak.mz) > 0.3
        if shift_up:
            tpath = shift(tpath, 0, ymax - ymin)

        ax.add_patch(mpatch.PathPatch(tpath, color="black"))

        # If this is the first peak pair, draw the starting point vertical
        # peak line.
        if i == 0:
            ax.plot(
                [edge.start.peak.mz, edge.start.peak.mz],
                [upper, edge.start.peak.intensity + ylim[1] * 0.05],
                **peak_line_options
            )

        # Draw the next vertical peak line
        ax.plot(
            [edge.end.peak.mz, edge.end.peak.mz],
            [upper, edge.end.peak.intensity + ylim[1] * 0.05],
            **peak_line_options
        )

        # Draw the horizontal line between the two peaks. If the annotation
        # was shifted up, draw a single horizontal line connecting the two
        # peaks.
        if shift_up:
            ax.plot(
                [edge.start.peak.mz, edge.end.peak.mz],
                [upper, upper],
                **seq_line_options
            )
        else:
            # Otherwise, draw a line from the starting peak to the annotation glyph
            # with some padding.
            ax.plot(
                [edge.start.peak.mz, max(xmin - xlim[1] * 0.01, edge.start.peak.mz)],
                [upper, upper],
                **seq_line_options
            )
            # And then draw another line from the other side of the glyph with some
            # padding to the second peak.
            ax.plot(
                [min(xmax + xlim[1] * 0.01, edge.end.peak.mz), edge.end.peak.mz],
                [upper, upper],
                **seq_line_options
            )

Here are some examples. I have a match object that knows how to draw a (glyco)peptide spectrum match, and a list of PeakPath objects called paths which I'll draw a random selection from:

art = match.plot()
draw_ladder(
    art.ax,
    paths[0],
    match.scan,
    peak_line_options={"color": "blue", "linestyle": "--"},
    seq_line_options={"color": "blue"},
)

draw_ladder(
    art.ax,
    paths[7],
    match.scan,
    peak_line_options={"color": "red", "linestyle": "--"},
    seq_line_options={"color": "red"},
)

Another, different, spectrum to show it's not magic-numbered for the first one:

art = match2.plot()
draw_ladder(art.ax, paths2[0], match2.scan, peak_line_options={"linestyle": '--'})

And then normalizing the intensity scale for the second spectrum:

Some notes on the implementation:

I do not attempt to deal with paths which change charge states part way through. This type of annotation is simply un-suited to it.
I still use some pseudo-magic numbers when setting up the base scaling transform. Increase the x-portion of the transform to make the letters wider, but you probably won't want to do that because if they are too wide they don't fit in the space provided, especially the low mass amino acids like G and A.
I use an object model that needs to be told about peaks which doesn't quite match what the original request is. I do not think adding control characters to a string is the ideal way specify this when you might have anything in the peptide sequence specification.

Since I didn't directly implement this against spectrum_utils and we disagree over input signatures, does this motivate anyone to talk about how would be most convenient to signify the ladder path you want to draw so this could go from extra long comment to a PR?

bittremieux commented 1 year ago

Thanks Joshua! This is a great start to the discussion. I'll be out of town until the end of the month, but if you're up for it, we should discuss this in more detail in November.

mobiusklein commented 1 year ago

Sure. I may have refined this a bit by then for figure making.

bittremieux commented 1 year ago

I'm trying to come up with something and would like to get your opinions.

quickstart

I think this is more or less what was requested. (As a next step we can try to add the ladder as well, but first I want to get this functionality figured out.)

A few questions:

The sequence is the amino acids only, no modifications because those would lead to random residue widths. This could be considered a bit misleading/confusing though. Is this acceptable or do people have recommendations on how to optimize this?
Currently I only indicate fragments for singly-charged and canonical fragments. Should additional charge states, losses, etc., be considered as well and should that somehow be reflected in the annotated peptide sequence?
I suggest to only allow a single of a/b/c (downward facing) and x/y/z (upward facing) fragment indications, otherwise it would get pretty messy. Thoughts?

@jspaezp @mobiusklein @pwilmart @wfondrie @RalfG

RalfG commented 1 year ago

This looks great!

Modifications: Perhaps the amino acid letters could be colored to indicate a modification, potentially using different colors for different modifications?
Other ion types: I think ideally it should be an option for additional backbone ions to be considered. Not sure if a distinction should be made between simple singly charged ions and other ion types for visualization. Perhaps a dotted/dashed line instead of a solid line if only 'more exotic' ions are present?
Indications: If I understand correctly, you mean adding multiple lines between amino acids indicating different ion types (a/b/c or x/y/z)? Here I would opt for visual simplicity and 'summarize' the presence of multiple ion types for a single ion number (e.g. a2 and y2) into a single line between the amino acids. In the end, the goal of this visualization is to know for which amino acid ranges there is evidence in the spectrum, regardless of ion types.

jspaezp commented 1 year ago

The sequence is the amino acids only, no modifications because those would lead to random residue widths. This could be considered a bit misleading/confusing though. Is this acceptable or do people have recommendations on how to optimize this?

I agree that explicit mods might not be a first priority and I would agree with Ralf's suggestion of color-coding modified AAs would be a good way to convey mod information without changing the 'monospaced' nature of the annotation.

Currently I only indicate fragments for singly-charged and canonical fragments. Should additional charge states, losses, etc., be considered as well and should that somehow be reflected in the annotated peptide sequence?

I would argue that it should annotate higher charge states but I don't feel like there is the need to have them show separately. (It could use whatever the fragment charge state is set by max_ion_charge ref: https://spectrum-utils.readthedocs.io/en/latest/api.html#spectrum_utils.spectrum.MsmsSpectrum.annotate_proforma)

I suggest to only allow a single of a/b/c (downward facing) and x/y/z (upward facing) fragment indications, otherwise it would get pretty messy. Thoughts?

I think this would fit 95+% use cases, so I do not feel like it would be totally required in a first implementation. But just as a reference and inspiration but I have seen in the past offsetting the annotation in the 'y' axis and color coding it as a way to annotate multiple fragment series. (fig 8 https://pubs.acs.org/doi/pdf/10.1021/jasms.2c00214)

I think as it is right now looks great! good job!

mobiusklein commented 1 year ago

It's very easy to make these graphics too noisy trying to include all that information, and choosing what to draw and when to draw it is going to be application specific.

The sequence is the amino acids only, no modifications because those would lead to random residue widths. This could be considered a bit misleading/confusing though. Is this acceptable or do people have recommendations on how to optimize this?

I agree, coloring the modified amino acid is the usual thing done. Sometimes it's also written lowercase, but color-coding is usually more useful, especially if you keep it uniform across multiple plots.

Currently I only indicate fragments for singly-charged and canonical fragments. Should additional charge states, losses, etc., be considered as well and should that somehow be reflected in the annotated peptide sequence?

This gets into interfaces. If you provide sane defaults for your high level interface (e.g. all charge states, no neutral losses), that's reasonable, and if you want to expose a lower level interface, that's where you might either add a bunch of flag combinators or flat out just accept a predicate function for (peak, annotation) -> bool on whether an annotation adds a bar and/or a predicate to transform (annotations_at: list) -> str to control how the ion series label is drawn, e.g. so you can abuse unicode or LaTeX to decorate the fragment name. This type of inversion of control might clash with your existing interface and complicate your code undesirably.

I suggest to only allow a single of a/b/c (downward facing) and x/y/z (upward facing) fragment indications, otherwise it would get pretty messy. Thoughts?

That's reasonable. Again, agreeing with Ralf that what we care about is "this bond was observed to break". If you're doing fancy modification localization, then you might want to draw all the ion ladders separately. For my implementation, I stack labile modified and unmodified annotations using different color slash marks over the same bond, but it only goes two deep because more is not helpful and may get too messy unless scale is carefully controlled.

Some implementation questions are:

How does the sequence logo interact with the matplotlib.Axes your spectrum is plotted on. Is it the same Axes and you are exploiting transforms to keep scale under control, or did you use Figure.add_axes with a bounding box to carve out a sub-figure? Is that region a parameter?
Is this drawn with Text or TextPath? They interact with axis limits differently.
Does this depend upon a specific font family?
What happens to the sequence graphic if the sequence is long? 20 AA? 30 AA?
- What happens to the size per AA?
- What happens to the spacing around/between annotations? Do they overlap each other?

bittremieux commented 1 year ago

Very relevant questions, and I haven't settled on an optimal implementation yet.

Currently I provide additional parameters to the plot method to add the peptide sequence to an existing Axes/spectrum plot. Alternatively I could piggyback on the facet plot introduced in #46 to use an extra axis. I see advantages for both options. In the former situation, the peptide sequence can be moved everywhere in the figure, providing more customizable figures. In contrast, the latter situation might provide a cleaner API by extracting the peptide sequence plotting functionality as a separate method.

Then we get into specific implementation details. I think it will be pretty cumbersome or even impossible to find a "perfect" solution that works in all situations. It's probably much easier to have a good-enough implementation that works most of the time. And then try to provide relevant parameters to customize aspects of the plot so that advanced users can try to fix those edge cases themselves.

RalfG commented 12 months ago

Currently I provide additional parameters to the plot method to add the peptide sequence to an existing Axes/spectrum plot. Alternatively I could piggyback on the facet plot introduced in https://github.com/bittremieux/spectrum_utils/pull/46 to use an extra axis.

I guess the current method also allows it to be used in a facet if it receives an empty axis to plot onto? That would seem the most flexible approach to me; where it can plot on top of an existing axis, potentially with a spectrum/mirror plot, or onto an empty exis, which could be part of a larger facet plot.

bittremieux / spectrum_utils

Annotated sequence peptide string #24