dzhang32 / ggtranscript

Visualizing transcript structure and annotation using ggplot2
https://dzhang32.github.io/ggtranscript/

labeling of exons (junctions?) #3

Closed gpertea closed 2 years ago

gpertea commented 2 years ago

For transcripts with many exons it would be useful to have the option to display the exon order numbers inside the exon (or above/below when the exon height is variable or too small?).

Perhaps a dedicated boolean option to just enable/disable the automatic drawing of exon order numbers for each transcript, with another option for its placement?

A more generic solution would be mapping such exon labels to some GTF exon attribute, like cov or exon_number as found in StringTie output -- maybe a label option can be added to geom_range() or its aesthetics. However, in many cases the exon_number attribute is missing, so a helper function could be added to generate it automatically in that case.
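To make the helper idea concrete, here is a minimal base R sketch of how exon numbers could be generated when the attribute is missing. The function name `number_exons()`, its `group_var` argument and the toy data are hypothetical and chosen for illustration only. The sketch assumes the exons are in a data frame with `start`, `end`, `strand` and a transcript identifier column, and that exons should be numbered in 5' to 3' order.

```r
# Hypothetical helper, for illustration only: number exons 1..n within each
# transcript, in 5'->3' order (so minus-strand transcripts count downwards
# in genomic coordinates).
number_exons <- function(exons, group_var = "transcript_id") {
  stopifnot(all(c("start", "end", "strand", group_var) %in% names(exons)))
  exons$exon_number <- NA_integer_
  # Iterate over the row indices belonging to each transcript.
  for (idx in split(seq_len(nrow(exons)), exons[[group_var]])) {
    ord <- order(exons$start[idx])
    # On the minus strand the 5'-most exon has the largest coordinate.
    if (all(exons$strand[idx] == "-")) ord <- rev(ord)
    exons$exon_number[idx[ord]] <- seq_along(idx)
  }
  exons
}

# Toy data (made up): one plus-strand and one minus-strand transcript.
exons <- data.frame(
  transcript_id = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  strand = c("+", "+", "+", "-", "-"),
  start = c(500, 100, 300, 100, 400),
  end = c(600, 200, 400, 200, 500),
  stringsAsFactors = FALSE
)
numbered <- number_exons(exons)
print(numbered)
```

The resulting `exon_number` column can then be mapped to a label aesthetic like any other column.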

As for labeling junctions, I suppose a labeling option could be added to geom_junction() to enable showing the numeric coverage values (supporting reads) for each junction, above the junction curve for top curves, or below for bottom ones.

dzhang32 commented 2 years ago

I really like the idea of labelling geom_ranges. Not only would this be useful for exons, I can imagine it coming in handy for other use cases e.g. labelling to_diff() (or one day, to_jdiff()) outputs with their width to see if transcript changes remain in frame. With this in mind, I find the generic solution more appealing, with a helper function to facilitate the exon number use case.
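As a rough illustration of the in-frame use case, the sketch below computes the width of each differing region and checks whether it is a multiple of three. The coordinates are made up, the column names `start` and `end` are assumptions, and widths are computed with inclusive (GTF-style) coordinates. It does not call to_diff() itself.

```r
# Made-up regions standing in for the output of a transcript comparison.
diffs <- data.frame(
  start = c(101, 250, 400),
  end = c(190, 262, 401)
)

# Width under inclusive coordinates (both start and end are part of the range).
diffs$width <- diffs$end - diffs$start + 1

# A region keeps the reading frame only if its width is divisible by 3.
diffs$in_frame <- diffs$width %% 3 == 0

# A text label that could be mapped to a label aesthetic.
diffs$label <- paste0(diffs$width, " bp")
print(diffs)
```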

Thinking this through, I think such a label could currently be achieved by adding something like ggplot2::geom_label(aes(x = (start + end) / 2, label = exon_number)). I wonder if it is worth adding a label parameter to geom_range() or, instead, showing the combined use of geom_label and geom_range in the vignette/examples? I personally favour the latter, as it gives users more flexibility and is arguably more ggplot2-esque - would appreciate your thoughts.
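A minimal sketch of the combined geom_range and geom_label approach might look like the following. The toy exon table is made up, and the plotting step is only run if both ggplot2 and ggtranscript are installed. Note that the midpoint is (start + end) / 2, with parentheses around the sum.

```r
# Toy exons for a single transcript (coordinates are made up).
exons <- data.frame(
  transcript_name = "tx1",
  start = c(100, 300, 500),
  end = c(200, 400, 600),
  exon_number = 1:3
)

# Midpoint of each exon, used as the x position of its label.
exons$mid <- (exons$start + exons$end) / 2

# Plot only if the required packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggtranscript", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    exons,
    ggplot2::aes(xstart = start, xend = end, y = transcript_name)
  ) +
    ggtranscript::geom_range() +
    # The label layer inherits y and adds its own x and label mappings.
    ggplot2::geom_label(ggplot2::aes(x = mid, label = exon_number))
  print(p)
}
```

Because the label is an ordinary ggplot2 layer, its size, colour and position can be adjusted with the usual geom_label arguments.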

For the junction case, I do think a label parameter makes sense. The difference from geom_range is that users currently cannot easily add a label to the centre of the junction line, as this requires knowing the points of the curve (in particular the y values). Implementation-wise, I think I would need to rewrite GeomJunction to inherit from GeomPath rather than GeomCurve to allow manipulation of the curve prior to creation of the grob - I will give this a go.

Thank you for your feedback, super helpful @gpertea!

gpertea commented 2 years ago

Thank you -- using geom_label (with geom_range) sounds like a good solution for labeling exons, I did not realize it could be that simple. I guess in that case the only thing left for exons would be to make sure the exon_number column can be generated with a helper function if the attribute is missing.

dzhang32 commented 2 years ago

I've added both the exon number helper function and a method for labelling junctions.

For the junction label, I went for a separate label geom that inherits from ggrepel::geom_label_repel. The reason I chose this option over a label parameter inside geom_junction is that I think this approach gives users more flexibility in deciding the label aesthetics. The downside of this approach is that it costs more computationally (as we have to generate the junction curves twice: once for the junctions, and again for the junction labels), but there is scope to optimise my implementation if speed does become a bottleneck.

One thing I was considering is whether to provide a helper function (pretty much what is used internally by geom_junction_label_repel) to obtain the midpoints of junction curves. This would enable users to use e.g. ggplot2::geom_text or ggplot2::geom_label instead of ggrepel::geom_label_repel if they desired. For simplicity, I've held back on this for now, with the idea of returning to it if any users request it - would you find this helper useful?
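To show what such a midpoint helper could return, here is a simplified sketch. It is not the package's internal implementation. It assumes each junction is drawn as a quadratic Bezier curve from (start, y) to (end, y), with the control point placed so that the apex sits `height` above the transcript. The names `junction_midpoints()`, `bezier_point()`, `height` and the `mean_count` column are hypothetical.

```r
# Point on a quadratic Bezier curve at parameter t (vectorised over p0, p1, p2).
bezier_point <- function(p0, p1, p2, t) {
  (1 - t)^2 * p0 + 2 * (1 - t) * t * p1 + t^2 * p2
}

# Hypothetical helper: label positions at the midpoint (t = 0.5) of each curve.
junction_midpoints <- function(junctions, y = 1, height = 0.4) {
  # Control point above the centre of the junction; placing it at
  # y + 2 * height puts the curve's apex exactly at y + height.
  ctrl_x <- (junctions$start + junctions$end) / 2
  ctrl_y <- y + 2 * height
  data.frame(
    x = bezier_point(junctions$start, ctrl_x, junctions$end, 0.5),
    y = bezier_point(y, ctrl_y, y, 0.5),
    label = junctions$mean_count
  )
}

# Made-up junctions with a read-support column.
junctions <- data.frame(
  start = c(200, 400),
  end = c(300, 500),
  mean_count = c(12, 3.5)
)
mids <- junction_midpoints(junctions)
print(mids)
```

A data frame like `mids` could then be passed to ggplot2::geom_text or ggplot2::geom_label with `aes(x = x, y = y, label = label)`, provided the y scale is numeric.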

Let me know if you have any additional thoughts regarding the above - thank you!

gpertea commented 2 years ago

Thank you for the detailed work on this and the documentation, the examples are great! A lot of work, really appreciated.