Create curved text paths in ggplot2
Keywords for placement of label #34

teunbrand commented 2 years ago

We were discussing in #27 that it might be convenient to have labels placed at some position. In particular, we were discussing hjust = "auto" for placing the label at the flattest part of the curve, but that got me thinking about other placement rules. I think the following keywords for the hjust parameter make sense:

This is not an exhaustive list, but these came to mind.

AllanCameron commented 2 years ago

This might be a nice touch. Though "flat" isn't the opposite of "steep" in this context. It's the opposite of "curved", so for example, in your economics example, the peaks are steep but they are also the "flattest" regions for text to be placed.

teunbrand commented 2 years ago

You're right, I was thinking about horizontal and vertical instead, which also might be neat options, but it is indeed not necessarily the flattest part. Your comment about using the curvature for placing the label on the flattest part makes way more sense to me now.

AllanCameron commented 2 years ago

I've made some progress with the automatic label placement on the flattest areas of a plot, using a modified rolling mean of curvature which finds the least curved section and sets the hjust as the proportion of the arclength at that point. I have created a little function that processes the data frame from inside geom_textpath, but this could maybe be called from inside textpathGrob and made more efficient (since it uses the split-apply-bind method). It's really just a proof of concept at the moment, but it seems to work pretty well:

library(geomtextpath)
#> Loading required package: ggplot2

df <- data.frame(x = 1:100, y = cos(seq(0, 2 * pi, length.out = 100)),
                 label = "A text label of moderate length.")

ggplot(df, aes(x, y, label = label)) + geom_textpath()

ggplot(df, aes(x, y, label = label)) + geom_textpath(hjust = "auto")


df <- data.frame(x = rnorm(100), y = rnorm(100))

ggplot(df, aes(x, y)) + geom_labeldensity2d()

ggplot(df, aes(x, y)) + geom_labeldensity2d(hjust = "auto")

It even finds a place for your label in the difficult economics example:

p <- ggplot(economics, aes(date, unemploy)) +
  geom_path(colour = "grey")
p + geom_textpath(
    aes(label = "Decline", group = 1),
    hjust = "auto", size = 5, include_line = FALSE)
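
The idea described in this comment (smooth the curvature along the path, pick the least curved stretch, and express that point as a proportion of the arc length) can be sketched in base R roughly as follows. This is only an illustration, not the package's code: `central_diff()`, `path_curvature()`, `flattest_hjust()` and the `window` argument are hypothetical names, and a plain centred rolling mean stands in for the "modified" rolling mean mentioned above.

```r
# Hypothetical helpers for illustration; not part of geomtextpath.

# Central finite differences, padded so the result has the same length as v
central_diff <- function(v) {
  n <- length(v)
  c(v[2] - v[1], (v[3:n] - v[1:(n - 2)]) / 2, v[n] - v[n - 1])
}

# Unsigned curvature of a polyline: |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)
path_curvature <- function(x, y) {
  dx  <- central_diff(x)
  dy  <- central_diff(y)
  ddx <- central_diff(dx)
  ddy <- central_diff(dy)
  abs(dx * ddy - dy * ddx) / (dx^2 + dy^2)^1.5
}

# Proportion of arc length (0 to 1) at the centre of the least curved window
flattest_hjust <- function(x, y, window = 11) {
  curv <- path_curvature(x, y)
  # Centred rolling mean; positions without a full window become NA
  smoothed <- as.numeric(stats::filter(curv, rep(1 / window, window), sides = 2))
  i <- which.min(smoothed)  # which.min() skips the NA positions
  arclen <- c(0, cumsum(sqrt(diff(x)^2 + diff(y)^2)))
  arclen[i] / arclen[length(arclen)]
}

# The cosine example from above: the flattest parts are the two inflection
# points, roughly a quarter and three quarters of the way along the path
x <- 1:100
y <- cos(seq(0, 2 * pi, length.out = 100))
h <- flattest_hjust(x, y)
```

Note that this sketch measures curvature in data units, so the result depends on how the axes are scaled relative to each other.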

teunbrand commented 2 years ago

Yes, that does seem to work pretty well! I don't really worry about efficiency outside of the makeContent code, as it doesn't need to run every time the user resizes their window (but all else being equal, more efficiency is better than less). The only reason I can see to run this from within the makeContent code is that we would then know the exact text width, so we could choose an optimal window for calculating the running mean and get the appropriate curvature.

I tried testing whether the point of minimum curvature is stable under aspect ratio deformation, but this appears not to be the case.


# Random walk
x <- cumsum(rnorm(200))
y <- cumsum(rnorm(200))
plot(x, y, type = 'l')

# Aspect ratios to test
asp <- seq(1, 5, length.out = 100)

# Calculate curvature for every ratio
curv <- vapply(asp, function(mult) {
  geomtextpath:::.get_curvature(x * mult, y)
}, numeric(length(x)))

# Visualise curvature
image(
  list(y = asp, x = 1:200, z = curv),
  useRaster = TRUE, col = hcl.colors(255, "YlOrRd", rev = TRUE)
)

# The minima are not always at the same point
min_curv <- apply(curv, 2, which.min)
all(min_curv == min_curv[1])
#> [1] FALSE

However, there aren't many distinct minima in the example above (just 3), and if you use set.seed(0) there is only a single one, so my guess is that the minimum is relatively stable under deformation? (Update: I tested 100 seeds, and in 37 of them there was only 1 minimum.)

AllanCameron commented 2 years ago

No, curvature isn't stable under aspect ratio changes. A circle has constant curvature all the way round, but if you change the aspect ratio you get an ellipse, whose curvature is higher at the ends of its long axis than at the ends of its short axis.
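
The circle-versus-ellipse point can be checked numerically. The sketch below is self-contained and uses hypothetical finite-difference helpers (`central_diff()` and `path_curvature()` are illustration names, not package functions); the stretch factor of 3 is an arbitrary choice.

```r
# Hypothetical helpers for illustration; not part of geomtextpath.

# Central finite differences, padded so the result has the same length as v
central_diff <- function(v) {
  n <- length(v)
  c(v[2] - v[1], (v[3:n] - v[1:(n - 2)]) / 2, v[n] - v[n - 1])
}

# Unsigned curvature of a polyline: |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)
path_curvature <- function(x, y) {
  dx  <- central_diff(x)
  dy  <- central_diff(y)
  ddx <- central_diff(dx)
  ddy <- central_diff(dy)
  abs(dx * ddy - dy * ddx) / (dx^2 + dy^2)^1.5
}

t <- seq(0, 2 * pi, length.out = 400)
circle  <- path_curvature(cos(t), sin(t))      # unit circle
ellipse <- path_curvature(3 * cos(t), sin(t))  # same path with x stretched by 3

# Drop two points at each end, where one-sided differences are less accurate
keep <- 3:(length(t) - 2)

range(circle[keep])   # close to 1 everywhere
range(ellipse[keep])  # roughly from 1/9 (top and bottom) up to about 3 (far left)
```

Because the minimum of the curvature moves when only one axis is rescaled, a label position chosen from curvature in data units is not guaranteed to stay the flattest spot once the plot is drawn at a different aspect ratio.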

AllanCameron commented 2 years ago

I've moved the auto hjust inside the makeContent mechanism (it's now inside the anchor points function). It seems to work pretty well:

library(geomtextpath)
#> Loading required package: ggplot2

df <- data.frame(x = rep(sin(seq(0, 2*pi, length.out = 100)), 2),
                 y = rep(cos(seq(0, 2*pi, length.out = 100)), 2),
                 z = rep(c("A", "B"), each = 100),
                 label = "I think this is the flattest part of the curve")

p <- ggplot(df, aes(x, y, group = z, label = label)) + 
       geom_textpath(vjust = 1.2, size = 6, hjust = "auto")

p + facet_grid(z~.)

p + facet_grid(.~z)

byteit101 commented 2 years ago

"xmin"/"xmax"/"xmid" for placement at the leftmost/rightmost or middle horizontal position on the curve.

I like these, as they are stable under aspect ratio changes. Could it be generalized for all xpos/ypos? I know right now I have some plots that I have to adjust hjust whenever I resize them, either directly, or indirectly via adding or removing legends, titles, etc. Such an option would be very useful for them.

This is not an exhaustive list, but these came to mind.

A probably tricky-to-implement idea: avoid the other textpaths from the other groups/colors. Something like that would be great for the plot that I used when asking the original question.

AllanCameron commented 2 years ago

I have implemented the positions mentioned above (though "auto" is just "flattest"). I will leave this issue open until we have had a play and done some testing. The "check overlap" that @byteit101 mentions is probably a separate issue.

library(geomtextpath)
#> Loading required package: ggplot2

p <- ggplot(iris, aes(x = Sepal.Length, group = 1))

p + geom_textpath(aes(label = "Default"), stat = "density", size = 6)

p + geom_textpath(aes(label = "auto"), stat = "density", size = 6, 
                  hjust = "auto")

p + geom_textpath(aes(label = "xmin"), stat = "density", size = 6, 
                  hjust = "xmin")

p + geom_textpath(aes(label = "xmid"), stat = "density", size = 6, 
                  hjust = "xmid")

p + geom_textpath(aes(label = "xmax"), stat = "density", size = 6, 
                  hjust = "xmax")

p + geom_textpath(aes(label = "ymin"), stat = "density", size = 6, 
                  hjust = "ymin")

p + geom_textpath(aes(label = "ymid"), stat = "density", size = 6, 
                  hjust = "ymid")

p + geom_textpath(aes(label = "ymax"), stat = "density", size = 6, 
                  hjust = "ymax")

The "ymax" setting is actually pretty useful:

 ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
   geom_textpath(aes(label = Species), stat = "density",
                 size = 6, fontface = 2, hjust = "ymax", vjust = -0.2)


teunbrand commented 2 years ago

This looks great! Out of curiosity, in the ymid case, is the left/right choice arbitrary or determined by something? I thought you might like this thread:

AllanCameron commented 2 years ago

Ah...I had noticed that the repo's stars had more than doubled in 24h but couldn't figure out why. Now I know!

The ymid literally finds the point on the path nearest the mean y value.
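
As a rough illustration of how such keywords could be turned into a numeric hjust, here is a minimal base R sketch. It is not the package's implementation: `keyword_hjust()` is a hypothetical name, "ymid" follows the nearest-to-mean rule described above, and treating "xmid" as the point nearest the middle of the x range is an assumption.

```r
# Hypothetical helper for illustration; not part of geomtextpath.
# Maps a placement keyword to a proportion of arc length between 0 and 1.
keyword_hjust <- function(x, y, keyword) {
  i <- switch(keyword,
    xmin = which.min(x),
    xmax = which.max(x),
    # Assumption: "middle horizontal position" means the middle of the x range
    xmid = which.min(abs(x - mean(range(x)))),
    ymin = which.min(y),
    ymax = which.max(y),
    # Point on the path nearest the mean y value, as described above
    ymid = which.min(abs(y - mean(y))),
    stop("Unknown keyword: ", keyword)
  )
  # Cumulative arc length along the path, starting at 0
  arclen <- c(0, cumsum(sqrt(diff(x)^2 + diff(y)^2)))
  arclen[i] / arclen[length(arclen)]
}

# A symmetric single-peaked path: the peak sits halfway along the arc length
x <- seq(0, 1, length.out = 101)
y <- -(x - 0.5)^2
h_peak <- keyword_hjust(x, y, "ymax")
```

In this sketch `which.min()` and `which.max()` return the first matching index, so when several points are exactly tied the earlier one along the path is chosen.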

I can't figure out why the text isn't centered over the peaks with the "ymax" setting. I'll have a look at this and refactor the code (it's unnecessarily repetitive), plus write some tests before closing this issue.

AllanCameron commented 2 years ago

The text wasn't centered over the peak because the default halign was "left". With any vjust below 0.5, the text was pushed so that it lined up with the first letter of a string nicely centered on the peak at a vjust of 0.5, rather than being centered itself. I have switched the default to "center", since I am guessing that positioning single-line labels is a more common task than using multi-line labels, and in any case the user can change the halign if printing multi-line text. It seems unreasonable to expect the casual user to know that they should change the halign to correctly position single-line text.

AllanCameron commented 2 years ago

I have added tests for this and we're back at 100% code coverage. The results look as expected on all 3 geoms, so I'll close this issue for now.