Open rdstern opened 4 years ago
@rdstern what layer parameters should we have for stat_cor() and for stat_regline_equation()?
Just like stat_peaks and stat_valleys, I assume. The aesthetics are (at least initially) just x and y. Unless they are changed (and they usually are not), they don't appear in the code. The rest, again as for stat_peaks and stat_valleys, are in the geom parameters.
The geom parameters, which are the same for both stats:
a) geom is just label or text. The default is text.
b) x-label gives the x position of the label (label.x = number). The field just allows a number to be typed (not an up-down control, but numeric only; negative numbers allowed).
c) The same for y-label (label.y).
d) x-position gives label.x.npc. Can this be a combo box into which you can type? You can give either a number from 0 to 1 or one of the texts "left", "centre", "right".
e) y-position is the same for label.y.npc, with "top", "centre", "bottom".
f) position needs all the options of the text geom.
g) size is as for text.
h) colour is as for text.
i) alpha is as for text.
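To make the list above concrete, here is a rough sketch of how the dialogue values might be checked and passed on to stat_cor. The helper is_valid_npc and the list cor_params are hypothetical names made up for this illustration, and mtcars only stands in for real data. The stat_cor arguments used (geom, label.x.npc, label.y.npc) are from the ggpubr help; size, colour and alpha are assumed to pass through to the text or label geom.

```r
# Hypothetical helper (not part of ggpubr or R-Instat): checks whether a value
# typed into the x-position or y-position combo box is acceptable.
is_valid_npc <- function(value, axis = c("x", "y")) {
  axis <- match.arg(axis)
  words <- if (axis == "x") {
    c("left", "centre", "right")
  } else {
    c("top", "centre", "bottom")
  }
  if (is.numeric(value)) {
    # A single number between 0 and 1 is allowed.
    return(length(value) == 1 && !is.na(value) && value >= 0 && value <= 1)
  }
  # Otherwise it must be one of the texts for this axis.
  is.character(value) && length(value) == 1 && value %in% words
}

# Parameter values as they might come from the dialogue (illustrative only).
cor_params <- list(
  geom = "label",
  label.x.npc = "left",
  label.y.npc = 0.9,
  size = 4,
  colour = "blue",
  alpha = 0.8
)
stopifnot(
  is_valid_npc(cor_params$label.x.npc, "x"),
  is_valid_npc(cor_params$label.y.npc, "y")
)

# Build the layer only when the packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggpubr", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point() +
    do.call(ggpubr::stat_cor, cor_params)
  # print(p) would draw the graph.
}
```

The same list could be passed to ggpubr::stat_regline_equation, since the parameters are shared by both stats.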
This should take the facilities on another step. Then the aspect that is a bit beyond me is how to adjust what is presented. In particular, for the regression equation the object has more components that we could present. They are an aesthetic, but from the object produced, and not from the data. I hope that @dannyparsons can advise on what we could do. I'll check with him. I am also not sure where the formula can fit within our system!
The code below is from the ggpmisc package example and runs in the script window. I have suggested above the equivalent dialogue from ggpubr, but I am thinking perhaps we should use stat_poly_eq from ggpmisc instead. (We will still need the correlation stat from ggpubr.)
Here is the code that is from the guide and runs fine in R-Instat:
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x, y = y, group = c("A", "B"), y2 = y * c(0.5, 2), w = sqrt(x))
# give a name to a formula
formula <- y ~ poly(x, 3, raw = TRUE)
# user specified label
ggplot(my.data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = formula) +
  ggpmisc::stat_poly_eq(
    aes(label = paste(stat(eq.label), stat(adj.rr.label), sep = "~~~~")),
    formula = formula, parse = TRUE)
This gives the following graph:
The 2 aspects I don't know how to include in our menu system are the function above and the aesthetic where you can choose the precise label that will be presented.
@rdstern two questions:
ggpmisc::stat_poly_eq function?
Edit: I have had a look at the ggpmisc function and put details under issue #6739.
The 2 stats for stat_cor and stat_regline_equation have now been added. They are from the ggpubr package. I did ask a question there, on how to give the correlation information without the p value. This is here. It was answered, and then closed. I wonder if we can add that feature to our implementation of the stat. I tried, but failed, to do that with the script. That needs someone with better R than I have.
I note there are also some options there that give an error. For example label.x and label.y only allow null. They are numeric, and NULL gives an error. The default geom is text, and it would be good to add label.
@lilyclements we now have stat_cor and stat_regline_equation in the list of geoms, and they seem to work. Is it possible to add to the parameters?
Currently y~x is the only option for the formula. Could we add more examples? How exactly would this work?
In stat_cor there also seems to be flexibility in what is presented. Can this be included, at least partially, in these options? Or are there possibilities to "tweak" commands using the script window?
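On the flexibility in stat_cor, one way that might work from the script window is to map the label aesthetic to just one of the computed variables. The ggpubr help lists r.label among the variables computed by stat_cor; the sketch below assumes a ggplot2 version that has after_stat(), and uses mtcars only as stand-in data. The base R lines at the top are just a check of the value the label should show.

```r
# Base R check of the correlation the label should show (mtcars is stand-in
# data only; it is not the dataset from this issue).
r_value <- cor(mtcars$wt, mtcars$mpg, method = "pearson")
r_text <- sprintf("R = %.2f", r_value)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggpubr", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point() +
    # r.label is one of the variables computed by stat_cor; mapping only it
    # to the label leaves the p-value out.
    ggpubr::stat_cor(ggplot2::aes(label = ggplot2::after_stat(r.label)))
  # print(p) would draw the graph.
}
```

If this behaves as expected, a dialogue option could switch between the default label and r.label without the user needing to edit the script.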
And should we add any further stats from the ggpubr package? And can we add some from ggpmisc in the same way?
Adding more Parameters
In the help file for both stat_cor and stat_regline_equation it says that we can add additional parameters:
"other arguments to pass to geom_text or geom_label."
These are:
parse = FALSE,
nudge_x = 0,
nudge_y = 0,
label.padding = unit(0.25, "lines"),
label.r = unit(0.15, "lines"),
label.size = 0.25,
check_overlap = FALSE,
However, when I try them out, I get a warning message for both functions stating that it is ignoring these parameters. E.g.:
"Ignoring unknown parameters: label.size".
I suggest we ignore this for the time being?
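One possible explanation for the warning, which I have not confirmed against the ggpubr source, is that some of these parameters belong to only one of the two geoms: check_overlap is documented for geom_text, while label.padding, label.r and label.size are documented for geom_label. With the default geom = "text", the label-only ones would then be reported as unknown. Also, nudge_x and nudge_y are arguments of geom_text() and geom_label() themselves, so for a stat a position adjustment may be needed instead. The sketch below uses a hypothetical lookup (geom_only_params) and helper (params_for_geom) to filter the parameters by geom.

```r
# Assumption: these groupings follow the ggplot2 help for geom_text and
# geom_label; they are not taken from the ggpubr source.
geom_only_params <- list(
  text  = c("check_overlap"),
  label = c("label.padding", "label.r", "label.size")
)

# Hypothetical helper: keep only the requested parameters that suit the geom.
params_for_geom <- function(requested, geom) {
  other <- setdiff(names(geom_only_params), geom)
  blocked <- unlist(geom_only_params[other], use.names = FALSE)
  setdiff(requested, blocked)
}

requested <- c("label.padding", "label.r", "check_overlap")
text_params <- params_for_geom(requested, "text")
label_params <- params_for_geom(requested, "label")

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggpubr", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point() +
    ggpubr::stat_regline_equation(
      geom = "label",
      label.padding = grid::unit(0.4, "lines"),
      # nudge_x / nudge_y belong to geom_text() and geom_label() themselves,
      # so a position adjustment is used here instead.
      position = ggplot2::position_nudge(y = -1)
    )
  # print(p) would draw the graph.
}
```

If this is right, the dialogue would only need to offer the label-specific parameters once the geom has been changed to label.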
Formula options
From the help file, we can add polynomial equations through the poly function. For a third-order polynomial we would have: y ~ poly(x, degree = 3, raw = TRUE).
This would give, for example:
# Load ggpubr (this also attaches ggplot2)
library(ggpubr)
# Set up data (the seed makes the simulated values reproducible)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, y, group = c("A", "B"), y2 = y * c(0.5, 2), block = c("a", "a", "b", "b"))
# Fit polynomial regression line and add labels
formula <- y ~ poly(x, 3, raw = TRUE)
p <- ggplot(my.data, aes(x, y2, color = group)) +
  geom_point() +
  stat_smooth(aes(fill = group, color = group), method = "lm", formula = formula) +
  stat_regline_equation(
    aes(label = paste(..eq.label.., ..adj.rr.label.., sep = "~~~~")),
    formula = formula) +
  theme_bw()
# Apply the "jco" colour palette and draw the graph
ggpar(p, palette = "jco")
So perhaps having poly(x, 2, raw = TRUE) and poly(x, 3, raw = TRUE) could be two additions? It should be reasonably straightforward to add in.
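As a rough sketch of how the extra formula choices might be held and used, the list below maps a name to each formula. The list formula_options and its names are hypothetical, made up for this illustration. The base R lm() fits are only there to check that each formula is valid; the plot uses the formula argument of geom_smooth and stat_regline_equation as in the example above.

```r
# Hypothetical list of formula choices for the dialogue; names are illustrative.
formula_options <- list(
  linear    = y ~ x,
  quadratic = y ~ poly(x, 2, raw = TRUE),
  cubic     = y ~ poly(x, 3, raw = TRUE)
)

# Simulated data, as in the help file example.
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x, y = y)

# Number of fitted coefficients per option: intercept plus one per power of x.
n_coef <- sapply(
  formula_options,
  function(f) length(coef(lm(f, data = my.data)))
)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggpubr", quietly = TRUE)) {
  chosen <- formula_options$quadratic
  p <- ggplot2::ggplot(my.data, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_smooth(method = "lm", formula = chosen) +
    ggpubr::stat_regline_equation(formula = chosen)
  # print(p) would draw the graph.
}
```

Passing the same formula to both the smoother and the equation stat keeps the fitted line and the printed equation consistent.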
Further stats from ggpubr
Looking here we can see the functions in ggpubr.
The bits I think we can avoid
ggboxplot(), ggdensity()), or that we offer if you do multiple geoms and build up the levels (e.g., ggpaireddata()), but others might be useful to look into.
stat_compare_means(), stat_anova_pvalues(), etc. But I feel this might be putting the emphasis on the wrong thing (i.e., just on the p-value).
gt for our tables.
The bits I think we can consider!
I assume this may be relatively easy for either Ivan or Wyclife to add? Here is an example of a graph with summaries added. This is something we have needed for a long time.
This uses the ggpubr package that is already installed. It uses stat_cor for the correlation and stat_regline_equation for the regression equation. They can now be added simply to the script for a scatterplot (or line plot) as I did below, to get the graph above.
last_graph <- ggplot2::ggplot(data = Sadore, mapping = ggplot2::aes(x = Tmx, y = Tmn)) + ggplot2::geom_point() + ggplot2::theme_grey() + ggpubr::stat_cor() + ggpubr::stat_regline_equation(label.y = 30)
We already have 4 stats in the list when layers are added to a ggplot, so this is adding 2 more. A simple example for testing could be the survey dataset from the Introductory guide (in the library), plotting yield against fert. With facets this could be plotted for each village or each variety.
a) For future reference, I note that the documentation for the regression line equation says it was inspired by a function in the ggpmisc package (which is also installed). Later there will be other stats and geoms to add from there, particularly in relation to the broom package. But for now these 2 stats should be simple to add and would be very useful.
b) In the figure above it should be possible to adapt the displays once the stats have been implemented. In particular, I would like to be able to give the correlation without the significance level. And with the regression line it should be possible to add further summaries, but I failed when adding them in the script file. This can be investigated once the stats are installed.