IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0
38 stars 102 forks

Add 2 more stats to the list of stats and geoms for ggplot2 #5744

Open rdstern opened 4 years ago

rdstern commented 4 years ago

I assume this may be relatively easy for either Ivan or Wyclife to add? Here is an example of a graph with summaries added. This is something we have needed for a long time.

image

This uses the ggpubr package that is already installed. It uses stat_cor for the correlation and stat_regline_equation for the regression equation.

They can now be added simply to the script for a scatterplot (or line plot), as I did below, to get the graph above.

last_graph <- ggplot2::ggplot(data = Sadore, mapping = ggplot2::aes(x = Tmx, y = Tmn)) +
  ggplot2::geom_point() +
  ggplot2::theme_grey() +
  ggpubr::stat_cor() +
  ggpubr::stat_regline_equation(label.y = 30)

We already have 4 stats in the list when layers are added to a ggplot, so this is adding 2 more. A simple example for testing could be the survey dataset from the Introductory guide (in the library), plotting yield against fert. With facets this could be plotted for each village or each variety.

a) For future reference, I note that the documentation for the regression line equation says it was inspired by a function in the ggpmisc package (which is also installed). Later there will be other stats and geoms to add from there, particularly in relation to the broom package. But for now these 2 stats should be simple to add and would be very useful.

b) In the figure above it should be possible to adapt the displays once the stats have been implemented. In particular, I would like to be able to give the correlation without the significance level. And, with the regression line, it should be possible to add further summaries, but I failed when adding them in the script file. This can be investigated once the stats are installed.

Ivanluv commented 4 years ago

@rdstern what layer parameters should we have for stat_cor() and for stat_regline_equation()?

rdstern commented 4 years ago

Just like stat_peaks and stat_valleys, I assume. The aesthetics are (at least initially) just the x and y. Unless they are changed (and they usually are not), they don't appear in the code. The rest, again like stat_peaks and stat_valleys, are in the geom parameters.

rdstern commented 4 years ago

The geom parameters:

a) The geom is just label or text. Default is text.
b) x-label gives the x position, i.e. label.x = number. The field just allows a number to be typed (not an up-down, but numeric only; negative numbers allowed).
c) Same for y-label (label.y).
d) x-position gives label.x.npc. Can this be a combo box into which you can type? Values are "left", "centre", "right", or a number from 0 to 1 (e.g. 0, 0.5, 1).
e) y-position is the same for label.y.npc, with "top", "centre", "bottom".
f) Then the position needs all the options of the text geom.
g) Size is as for text.
h) Colour is as for text.
i) Alpha is as for text.

These are for both of them.
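As a rough illustration of the position parameters listed above, the sketch below sets label.x.npc and label.y.npc on stat_cor, and label.x and label.y on stat_regline_equation. It uses the built-in mtcars data as a stand-in for an R-Instat data frame, and the particular values and object names (p_pos, eq_layer) are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggpubr)

# mtcars is only a stand-in dataset for illustration
p_pos <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # position in normalised plot coordinates: keywords or numbers in [0, 1]
  stat_cor(label.x.npc = "left", label.y.npc = "bottom",
           colour = "blue", size = 4) +
  # position in data units: any number on the axis scale
  stat_regline_equation(label.x = 4, label.y = 30, geom = "label")

# the built layer data shows where the equation label will be drawn
eq_layer <- layer_data(p_pos, 3)
```

Because label.x and label.y are in data units, a plain numeric field in the dialog would be enough for them, while the npc versions need to accept either text or a number.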

This should take the facilities on another step. Then the aspect that is a bit beyond me is how to adjust what is presented. In particular, for the regression equation the object has more components that we could present. They are an aesthetic, but from the object produced, and not from the data. I hope that @dannyparsons can advise on what we could do. I'll check with him. I am also not sure where the formula can fit within our system!

rdstern commented 4 years ago

The code below is from the ggpmisc package example and runs in the script window. I have suggested above the equivalent dialogue from ggpubr, but I am thinking perhaps we should use stat_poly_eq from ggpmisc instead. (We will still need the correlation stat from ggpubr.)

Here is the code that is from the guide and runs fine in R-Instat:

library(ggplot2)

# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x, y = y, group = c("A", "B"), y2 = y * c(0.5, 2), w = sqrt(x))

# give a name to a formula
formula <- y ~ poly(x, 3, raw = TRUE)

# user specified label: equation and adjusted R-squared
# (after_stat() replaces the older stat() and needs ggplot2 >= 3.3.0)
ggplot(my.data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = formula) +
  ggpmisc::stat_poly_eq(
    aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label), sep = "~~~~")),
    formula = formula, parse = TRUE
  )

This gives the following graph:

image

The 2 aspects I don't know how to include in our menu system are the function above and also the aesthetic where you can choose the precise label that will be presented.

lilyclements commented 3 years ago

> The 2 aspects I don't know how to include in our menu system are the function above and also the aesthetic where you can choose the precise label that will be presented.

@rdstern two questions:

  1. By "the function above", do you mean the ggpmisc::stat_poly_eq function?
  2. Is this something you would want to be added into the correlation bits, as a new dialog, or elsewhere?

Edit: I have had a look at the ggpmisc function and put details under issue #6739

rdstern commented 2 years ago

The 2 stats, stat_cor and stat_regline_equation, have now been added. They are from the ggpubr package. I did ask a question there on how to give the correlation information without the p value (linked here). It was answered, and then closed. I wonder if we can add that feature to our implementation of the stat. I tried, but failed, to do that with the script. That needs someone with better R than I have.
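A minimal sketch of one way to show the correlation without the p value in the script window, assuming ggplot2 3.3.0 or later (for after_stat()) and using the built-in mtcars data as a stand-in. stat_cor computes several label variables, and mapping the label aesthetic to r.label alone should leave out the significance level. The object names p_cor and cor_labels are illustrative only.

```r
library(ggplot2)
library(ggpubr)

# mtcars is only a stand-in dataset for illustration
p_cor <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # map the label to the computed r.label only, leaving out p.label
  stat_cor(aes(label = after_stat(r.label)))

# the built layer data holds the text that will be drawn
cor_labels <- layer_data(p_cor, 2)$label
```

If this works in the script window, a dialog option could perhaps offer a choice between the full label, r.label and rr.label.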

I note there are also some options there that give an error. For example label.x and label.y only allow null. They are numeric, and NULL gives an error. The default geom is text, and it would be good to add label.

rdstern commented 2 years ago

@lilyclements we now have stat_cor and stat_regline_equation in the list of geoms, and they seem to work. Is it possible to add to the parameters?

image

Currently y~x is the only option for the formula. Could we add more examples? How exactly would this work?
In stat_cor there also seems to be flexibility in what is presented. Can this be included, at least partially, in these options? Or are there possibilities to "tweak" commands using the script window?

And should we add any further stats from the ggpubr package? And can we add some from ggpmisc in the same way?

lilyclements commented 1 year ago

Adding more Parameters

In the help file for both stat_cor and stat_regline_equation it says that we can add additional parameters:

"other arguments to pass to geom_text or geom_label."

These are:

  parse = FALSE,
  nudge_x = 0,
  nudge_y = 0,
  label.padding = unit(0.25, "lines"),
  label.r = unit(0.15, "lines"),
  label.size = 0.25,
  check_overlap = FALSE,

However, when I try them out, I get a warning message for both functions stating that it is ignoring these parameters, e.g. "Ignoring unknown parameters: label.size".

I suggest we ignore this for the time being?

Formula options

From the help file, we can add polynomial equations through the poly function. For a third-order polynomial we would have: y ~ poly(x, degree = 3, raw = TRUE).

This would give, for example:

library(ggplot2)
library(ggpubr)

# Set up data (seed fixed so the example is reproducible)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, y, group = c("A", "B"), y2 = y * c(0.5, 2), block = c("a", "a", "b", "b"))

# Fit polynomial regression line and add labels
# (after_stat() replaces the older ..name.. notation and needs ggplot2 >= 3.3.0)
formula <- y ~ poly(x, 3, raw = TRUE)
p <- ggplot(my.data, aes(x, y2, color = group)) +
  geom_point() +
  stat_smooth(aes(fill = group, color = group), method = "lm", formula = formula) +
  stat_regline_equation(
    aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label), sep = "~~~~")),
    formula = formula) +
  theme_bw()
ggpar(p, palette = "jco")

So perhaps having poly(x, 2, raw = TRUE) and poly(x, 3, raw = TRUE) could be two additions? It should be reasonably straightforward to add in.
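For comparison, a second-order version might look like the sketch below. It assumes ggplot2 3.3.0 or later for after_stat() and uses made-up data. The names quad_data, quad_formula, p_quad and quad_label are illustrative and not part of R-Instat.

```r
library(ggplot2)
library(ggpubr)

# made-up data with a quadratic trend, for illustration only
set.seed(1)
quad_data <- data.frame(x = 1:50)
quad_data$y <- 2 + 0.5 * quad_data$x + 0.1 * quad_data$x^2 + rnorm(50, sd = 5)

# the same formula is passed to both the smoother and the equation stat
quad_formula <- y ~ poly(x, 2, raw = TRUE)

p_quad <- ggplot(quad_data, aes(x, y)) +
  geom_point() +
  stat_smooth(method = "lm", formula = quad_formula) +
  stat_regline_equation(
    aes(label = after_stat(eq.label)),
    formula = quad_formula
  )

# the equation text that will be drawn on the plot
quad_label <- layer_data(p_quad, 3)$label
```

In a dialog, the formula could then be a short list of fixed choices (y ~ x, and the degree 2 and degree 3 forms), with the same choice applied to the smoother so that the line and the equation agree.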

Further stats from ggpubr

Looking here we can see the functions in ggpubr.

The bits I think we can avoid

The bits I think we can consider!