Closed gpertea closed 2 years ago
I really like the idea of labelling geom_ranges. Not only would this be useful for exons, I can imagine it coming in handy for other use cases, e.g. labelling to_diff() (or one day, to_jdiff()) outputs with their width to see if transcript changes remain in frame. With this in mind, I find the generic solution more appealing, with a helper function to facilitate the exon number use case.
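To make the in-frame idea concrete, here is a minimal base R sketch. It assumes a data frame of differing regions with 1-based, inclusive start and end columns (as in GTF); the column names and the toy coordinates are assumptions for illustration, not actual to_diff() output. It also assumes that a region whose width is a multiple of 3 leaves the reading frame unchanged.

```r
# Toy table of differing regions; coordinates are made up for illustration.
diffs <- data.frame(
  start = c(201, 501),
  end   = c(290, 600)
)

# Width of each region, assuming 1-based inclusive coordinates.
diffs$width <- diffs$end - diffs$start + 1

# A width divisible by 3 is taken to mean the change stays in frame.
diffs$in_frame <- diffs$width %% 3 == 0

# Text that could be mapped to a label aesthetic.
diffs$label <- paste0(diffs$width, " bp")
```

The label column could then be passed to a label geom in the same way as an exon number.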
Thinking this through, I think such a label could currently be achieved by adding something like ggplot2::geom_label(aes(x = (start + end) / 2, label = exon_number)). I wonder if it is worth adding a label parameter to geom_range() or instead showing the combined use of geom_label and geom_range in the vignette/examples? Personally, I favour the latter, as it gives users more flexibility and is maybe more ggplot2-esque. I would appreciate your thoughts.
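A minimal sketch of the combined approach might look like the following. The toy exons and their column names are assumptions for illustration. The plotting step is guarded so that it only runs when both ggplot2 and ggtranscript are installed.

```r
# Toy exon annotation; values and column names are made up for this sketch.
exons <- data.frame(
  transcript_name = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  start = c(100, 400, 700, 100, 700),
  end = c(200, 500, 900, 200, 900),
  exon_number = c(1, 2, 3, 1, 2)
)

# Midpoint of each exon; the sum must be parenthesised before dividing.
exons$mid <- (exons$start + exons$end) / 2

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggtranscript", quietly = TRUE)) {
  library(ggplot2)
  # geom_range() draws the exons; geom_label() places the number at the midpoint.
  p <- ggplot(exons, aes(xstart = start, xend = end, y = transcript_name)) +
    ggtranscript::geom_range() +
    geom_label(aes(x = mid, label = exon_number))
}
```

Because the label is an ordinary ggplot2 layer, its size, colour and position can be tuned independently of the exon rectangles.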
For the junction case, I do think a label parameter makes sense. The difference from geom_range is that users currently cannot easily add a label to the centre of the junction line, as this requires knowing the points of the curve (in particular the y values). Implementation-wise, I think I would need to rewrite GeomJunction to inherit from GeomPath rather than GeomCurve to allow manipulation of the curve prior to creation of the grob. I will give this a go.
Thank you for your feedback, super helpful @gpertea!
Thank you -- using geom_label (with geom_range) sounds like a good solution for labeling exons; I did not realize it could be that simple. I guess in that case the only thing left for exons would be to make sure the exon_number column can be generated with a helper function if the attribute is missing.
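As a rough illustration of what such a helper could do, here is a base R sketch. The function name number_exons and the column names (start, strand, transcript_name) are hypothetical, and this is not necessarily how the package implements it. It numbers exons within each transcript in 5' to 3' order, so minus-strand transcripts are numbered from the highest start coordinate downwards.

```r
# Hypothetical helper: add an exon_number column per transcript.
number_exons <- function(exons, group_var = "transcript_name") {
  exons$exon_number <- NA_integer_
  # Row indices of the exons belonging to each transcript.
  groups <- split(seq_len(nrow(exons)), exons[[group_var]])
  for (idx in groups) {
    # Minus-strand transcripts are numbered from the largest start downwards.
    minus <- all(exons$strand[idx] == "-")
    ord <- order(exons$start[idx], decreasing = minus)
    exons$exon_number[idx[ord]] <- seq_along(idx)
  }
  exons
}

# Toy input without an exon_number attribute.
toy <- data.frame(
  transcript_name = c("A", "A", "A", "B", "B"),
  strand = c("+", "+", "+", "-", "-"),
  start = c(100, 300, 200, 100, 300),
  end = c(150, 350, 250, 150, 350)
)
numbered <- number_exons(toy)
```

The input row order is preserved; only the new column is added.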
I've added both the exon number helper function and method for labelling junctions.
For the junction label, I went for a separate label geom that inherits from ggrepel::geom_label_repel. The reason I chose this option over a label parameter inside geom_junction is that I think this approach gives users more flexibility in deciding the label aesthetics. The downside is that it costs more computationally (as we have to generate the junction curves twice: once for the junctions, once for the junction labels), but there is scope to optimise my implementation if speed does become a bottleneck.
One thing I was considering is whether to provide a helper function (pretty much what is used internally by geom_junction_label_repel) to obtain the midpoints of junction curves. This would enable users to e.g. use ggplot2::geom_text or ggplot2::geom_label instead of ggrepel::geom_label_repel if they desired. For simplicity, I've held back on this for now, with the idea of returning to it if any users request it. Would you find this helper useful?
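To give a feel for what a midpoint helper would return, here is a purely illustrative base R sketch. It is not the internal implementation: it assumes each junction is drawn as a symmetric quadratic Bezier arc above a baseline, and the function names junction_curve and junction_midpoint, as well as the height parameter, are hypothetical.

```r
# Hypothetical: points along a symmetric quadratic Bezier arc for one junction.
junction_curve <- function(start, end, y = 0, height = 0.5, n = 51) {
  t <- seq(0, 1, length.out = n)
  # Control point above the centre; the arc peaks at y + height.
  ctrl_x <- (start + end) / 2
  ctrl_y <- y + 2 * height
  data.frame(
    x = (1 - t)^2 * start + 2 * (1 - t) * t * ctrl_x + t^2 * end,
    y = (1 - t)^2 * y + 2 * (1 - t) * t * ctrl_y + t^2 * y
  )
}

# Hypothetical: the middle point of the arc, where a label would sit.
junction_midpoint <- function(start, end, y = 0, height = 0.5) {
  pts <- junction_curve(start, end, y, height, n = 51)
  pts[(nrow(pts) + 1) / 2, ]
}

mid <- junction_midpoint(start = 100, end = 300, y = 1, height = 0.5)
```

A data frame of such midpoints could be handed to ggplot2::geom_text or ggplot2::geom_label with x, y and label mapped as usual.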
Let me know if you have any additional thoughts regarding the above - thank you!
Thank you for the detailed work on this and the documentation, the examples are great! A lot of work, really appreciated.
For transcripts with many exons it would be useful to have the option to display the exon order numbers inside the exon (or above/below when the exon height is variable or too small?).
Perhaps a dedicated boolean option to just enable/disable the automatic drawing of exon order numbers for each transcript, with another option for its placement?
A more generic solution would be mapping such exon labels to some GTF exon attribute, like cov or exon_number as found in StringTie output -- maybe a label option can be added to geom_range() or its aesthetics. However, in many cases the exon_number attribute is missing, so a helper function could be added to generate that automatically in that case.
As for labeling junctions, I suppose a labeling option could be added to geom_junction() to enable showing the numeric coverage values (supporting reads) for each junction, above the junction curve for top curves, or below for bottom ones.