custom correlation plot and wrap error

ggobi / ggally

R package that extends ggplot2

http://ggobi.github.io/ggally/

588 stars 118 forks source link

custom correlation plot and wrap error #139

Closed schloerke closed 8 years ago

schloerke commented 8 years ago

Question received....

1) Sometimes, the R shows:

> ggpairs(ar,upper=list(params=c(size=10)),lower=list(continuous='smooth'))
Error in display_param_error() : 
  'params' is a deprecated argument.  Please 'wrap' the function to supply arguments. help("wrap", package = "GGally")

Why?

2) I used the following codes to create a correlation matrix for figure 1.

panel.plot <- function(x, y) {
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    ct <- cor.test(x,y)
    sig <- symnum(ct$p.value, corr = FALSE, na = FALSE,
                  cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                  symbols = c("***", "**", "*", ".", " "))
    r <- ct$estimate
    rt <- format(r, digits=2)[1]
    cex <- 0.5/strwidth(rt)

    text(.5, .5, rt, cex=cex *r)
    text(.8,.8,sig,cex=cex)
}
panel.smooth <- function(x,y) {
  points(x,y,col='blue')
  abline(lm(y~x))
}
ar <- read.table('clipboard',T)
attach(ar)
pairs(ar,upper.panel=panel.plot,lower.panel=panel.smooth)

unnamed

And I used the following ggpairs codes to create figure 2.

ggpairs(ar,lower=list(continuous='smooth'))

unnamed-1

As you can see the font size varies with the size of the correlation coefficient in the 1st figure. I would like to produce something similar with ggpairs. I would also like to remove 'Corr:' and add an indicator of significance behind the coefficient (just like the 3rd figure below).

unnamed-2

Would you please give me some suggestions or codes if possible?

schloerke commented 8 years ago

To remove the warning, please upgrade your code from...

library(ggplot2)
library(GGally)
# using iris, as I do not have the 'ar' dataset

ggpairs(
  iris[,1:4],
  upper = list(params = c(size = 10)),
  lower = list(continuous = 'smooth')
)
#Error in display_param_error() : 
#  'params' is a deprecated argument.  Please 'wrap' the function to supply arguments. help("wrap", package = "GGally")

to...

ggpairs(
  iris[,1:4], 
  upper = list(continuous = wrap(ggally_cor, size = 10)), 
  lower = list(continuous = 'smooth')
)

screen shot 2016-01-28 at 11 34 22 am

To move the axes into the diagonal, use the axisLabels parameter...

ggpairs(
  iris[,1:4], 
  upper = list(continuous = wrap(ggally_cor, size = 10)), 
  lower = list(continuous = 'smooth'), 
  axisLabels = "internal"
)

screen shot 2016-01-28 at 11 35 10 am

Next, we make a custom correlation ggplot2 plot that uses data, mapping, and ... as parameters...

my_custom_cor <- function(data, mapping, color = I("grey50"), sizeRange = c(1, 5), ...) {

  # get the x and y data to use the other code
  x <- eval(mapping$x, data)
  y <- eval(mapping$y, data)

  ct <- cor.test(x,y)
  sig <- symnum(
    ct$p.value, corr = FALSE, na = FALSE,
    cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
    symbols = c("***", "**", "*", ".", " ")
  )

  r <- unname(ct$estimate)
  rt <- format(r, digits=2)[1]

  # since we can't print it to get the strsize, just use the max size range
  cex <- max(sizeRange)

  # helper function to calculate a useable size
  percent_of_range <- function(percent, range) {
    percent * diff(range) + min(range, na.rm = TRUE)
  }

  # plot the cor value
  ggally_text(
    label = as.character(rt), 
    mapping = aes(),
    xP = 0.5, yP = 0.5, 
    size = I(percent_of_range(cex * abs(r), sizeRange)),
    color = color,
    ...
  ) + 
    # add the sig stars
    geom_text(
      aes_string(
        x = 0.8,
        y = 0.8
      ),
      label = sig, 
      size = I(cex),
      color = color,
      ...
    ) + 
    # remove all the background stuff and wrap it with a dashed line
    theme_classic() + 
    theme(
      panel.background = element_rect(
        color = color, 
        linetype = "longdash"
      ), 
      axis.line = element_blank(), 
      axis.ticks = element_blank(), 
      axis.text.y = element_blank(), 
      axis.text.x = element_blank()
    )
}
my_custom_cor(iris, aes(Sepal.Length, Sepal.Width))

screen shot 2016-01-28 at 11 35 36 am

Next, we make a custom 'smooth' ggplot2 plot that uses data, mapping, and ... as parameters...

my_custom_smooth <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(color = I("blue")) + 
    geom_smooth(method = "lm", color = I("black"), ...)
}
my_custom_smooth(iris, aes(Sepal.Length, Sepal.Width))

screen shot 2016-01-28 at 11 36 09 am

Custom final example

ggpairs(
  iris[,1:4], 
  upper = list(continuous = my_custom_cor), 
  lower = list(continuous = my_custom_smooth), 
  axisLabels = "internal"
)

screen shot 2016-01-28 at 11 36 57 am

# slightly smaller labels
ggpairs(
  iris[,1:4], 
  upper = list(continuous = wrap(my_custom_cor, sizeRange = c(1,3))), 
  lower = list(continuous = my_custom_smooth), 
  axisLabels = "internal"
)

screen shot 2016-01-28 at 11 37 24 am

Hope this helps!

Feel free to adjust the code to your needs. Now that custom plots may be inserted, you have full control over what goes in the plot space.

Barret

yzh402 commented 8 years ago

Could the size of the axis text or labels on the diag panel also be changed? Any external codes or have to rewrite the function package?

schloerke commented 8 years ago

Normally, I would say you should wrap the function. But when axisLabels are "internal", (currently) ggpairs is stomping anything that is given. ... So I'll work on that.

In the mean time...

Continuing from the example above...

# final example from above
pm <- ggpairs(
  iris[,1:4], 
  upper = list(continuous = my_custom_cor), 
  lower = list(continuous = my_custom_smooth), 
  axisLabels = "internal"
)
pm

screenshot 2016-02-04 09 43 03

The pm object is a ggmatrix object.

str(pm)
# 
# Custom str.ggmatrix output: 
# To view original object use 'str(pm, raw = TRUE)'
# 
# List of 15
#  $ data               :'data.frame':  150 obs. of  4 variables:
#   ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#   ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#   ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#   ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ plots              :List of 16
#   ..$ : chr "PM; aes: c(x = Sepal.Length); fn: {wrap; fn: 'ggally_diagAxis'; with params: c(label = Sepal.Length)}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Sepal.Width, y = Sepal.Length); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Length, y = Sepal.Length); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Width, y = Sepal.Length); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Sepal.Length, y = Sepal.Width); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Sepal.Width); fn: {wrap; fn: 'ggally_diagAxis'; with params: c(label = Sepal.Width)}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Length, y = Sepal.Width); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Width, y = Sepal.Width); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Sepal.Length, y = Petal.Length); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Sepal.Width, y = Petal.Length); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Length); fn: {wrap; fn: 'ggally_diagAxis'; with params: c(label = Petal.Length)}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Width, y = Petal.Length); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Sepal.Length, y = Petal.Width); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Sepal.Width, y = Petal.Width); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Length, y = Petal.Width); fn: {wrap; fn: 'custom_function'; with params: c()}; gg: FALSE"
#   ..$ : chr "PM; aes: c(x = Petal.Width); fn: {wrap; fn: 'ggally_diagAxis'; with params: c(label = Petal.Width)}; gg: FALSE"
#  $ title              : chr ""
#  $ verbose            : logi FALSE
#  $ printInfo          : logi FALSE
#  $ showStrips         : NULL
#  $ xAxisLabels        : NULL
#  $ yAxisLabels        : NULL
#  $ showXAxisPlotLabels: logi FALSE
#  $ showYAxisPlotLabels: logi FALSE
#  $ legends            : logi FALSE
#  $ gg                 : NULL
#  $ nrow               : int 4
#  $ ncol               : int 4
#  $ byrow              : logi TRUE
#  - attr(*, "_class")= chr [1:2] "gg" "ggmatrix"

From the str output above, we can see that it's using the ggmatrix_diagAxis function. (Currently) ggpairs is stomping the diag functions if we set axisLabels to "internal", so let's set the axisLabels to "show" (default), and adjust after ggpairs makes the plot matrix.

pm <- ggpairs(
  iris[,1:4], 
  upper = list(continuous = my_custom_cor), 
  lower = list(continuous = my_custom_smooth), 
  diag = list(continuous = wrap('diagAxis', labelSize = 3)),
  axisLabels = "show"
)
pm

screenshot 2016-02-04 09 55 47

Set the display of plot labels to false and remove all text labels

pm$showXAxisPlotLabels <- FALSE
pm$showYAxisPlotLabels <- FALSE
pm$xAxisLabels <- NULL
pm$yAxisLabels <- NULL
pm

screenshot 2016-02-04 09 56 00

Hope this helps!

Barret

akhst7 commented 8 years ago

It is possible get rid of the diag panel ? I tried diag=Null but did not work. Also Is it possible to remove all the background from each panel ?

Thanks.

schloerke commented 8 years ago

Use:

diag = "blank"

I'll see why diag = NULL does not work.

For the background, I would use the theme_bw() command. There is more explanation here about themes in general: http://docs.ggplot2.org/0.9.2.1/theme.html

pm <- ggpairs(iris)
pm <- pm + theme_bw()
pm

screen shot 2016-03-01 at 12 05 17 pm

library(ggthemes)
pm <- ggpairs(iris) + theme_tufte()
pm

screen shot 2016-03-01 at 12 05 31 pm

akhst7 commented 8 years ago

Another question. I got the output that I need and am wondering if it is possible to make a border of the particular plot that have the correlation p<0.05 read as shown in the example figure.

rplot06

schloerke commented 8 years ago

Inside your a custom 'continuous' function, you'll need to calculate the p-value. If that p-value is in the range you approve of, you'll add a theme to the plot.

library(GGally)
library(ggplot2)

my_custom_smooth <- function(data, mapping, ...) {
  p <- ggplot(data = data, mapping = mapping) +
    geom_point(color = I("blue")) + 
    geom_smooth(method = "lm", color = I("black"), ...)

  lmModel <- eval(substitute(lm(y ~ x, data = data), mapping))
  fs <- summary(lmModel)$fstatistic
  pValue <- pf(fs[1], fs[2], fs[3], lower.tail = FALSE)

  if (pValue < 0.05) {
    p <- p + theme(
      panel.border = element_rect(
        color = "red", 
        size = 3,
        linetype = "solid",
        fill = "transparent"
      )
    )
  }

  p
}

ggpairs(iris[,1:4], lower = list(continuous = my_custom_smooth), upper = "blank", diag = "blank")

screenshot 2016-03-03 19 57 11

ScottStetkiewicz commented 8 years ago

This thread has been wonderfully useful, especially the last point - I've tweaked it in a very clumsy way to work for the upper custom_cor, as I want to color-code that like in corrplot (but still want the rest of what ggpairs offers). In particular, I had a whale of a time trying to clear the background grey from each fo the plots...

my_custom_cor <- function(data, mapping, color = I("black"), sizeRange = c(1, 5), ...) {

  # get the x and y data to use the other code
  x <- eval(mapping$x, data)
  y <- eval(mapping$y, data)

  ct <- cor.test(x,y)

  r <- unname(ct$estimate)
  rt <- format(r, digits=2)[1]
  tt <- as.character(rt)

  # plot the cor value
  p <- ggally_text(
   label = tt, 
   mapping = aes(),
   xP = 0.5, yP = 0.5, 
   size = 6,
   color=color,
   ...
  ) +

theme(panel.background=element_rect(fill="white"),
        panel.background = element_rect(color = "black", linetype = "dashed"),
        panel.grid.minor=element_blank(),
        panel.grid.major=element_blank()) 

if (tt >= -0.6 && tt <= -0.8) {p=p+theme(panel.background = element_rect(fill="#ADDD8E"))}
else if 
   (tt >= -0.8 && tt <= -1.0) {p=p+theme(panel.background = element_rect(fill = "#FFFFE5"))} 
else if
   (tt >= 0.6 && tt <= 0.8) {p=p+theme(panel.background=element_rect(fill="#41AB5D"))}
else if
   (tt >= 0.8 && tt <= 1.0) {p=p+theme(panel.background=element_rect(fill="#005A32"))}

 p
}

and

my_custom_smooth <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(aes(color = I("black"))) + 
    scale_color_brewer(palette="Accent") +
    geom_smooth(method = "lm", color = I("black"), ...) +
theme(panel.background=element_blank(),
        panel.grid.minor=element_blank(),
        panel.grid.major=element_line(color="gray"))
}

then

data(mtcars)
ggpairs(mtcars,columns=2:10,upper=list(continuous=wrap(my_custom_cor)),
diag=list(continuous="density"),lower=list(continuous=wrap(my_custom_smooth), combo =
wrap("dot", color=Class)), axisLabels = "none")

However, as you can see below, the negative correlation values aren't being recognized as discrete from the positive ones. This leads to -0.7 and 0.7 being the same color, while I have (or so I thought) lines to make the negative values yellower, and positive values greener. I'm assuming the problem lies in what I'm defining as "tt" for the ifelse function.

mtcars plot

I'm brand new to to R, so I'd imagine I've made some stupidly trivial mistake, but for the life of me I can't see what it is. Do you have any suggestions? Thanks again for making such a useful post!

schloerke commented 8 years ago

The if statements were checking against a character object, rather than a numeric object. I changed all tt to r

Also, you have diverging information (correlation) from a neutral point (0). I'd really recommend going towards two different colors. Yellow and green are on the same color area, so it makes it difficult to interpret which is 'up' or 'down'. RColorBrewer has already solved the color spectrum suggestions. I've incorporated it into the correlation function.

library(RColorBrewer)
RColorBrewer::display.brewer.all()

screen shot 2016-03-11 at 11 44 18 am Palette RdYlGn looks pretty good, as we associate red and greed with positive and negative naturally.

# take the inner 5 colors of 7 colors so that the extreme color is not so intense
corColors <- RColorBrewer::brewer.pal(n = 7, name = "RdYlGn")[2:6]
corColors
# [1] "#FC8D59" "#FEE08B" "#FFFFBF" "#D9EF8B" "#91CF60"

my_custom_cor_color <- function(data, mapping, color = I("black"), sizeRange = c(1, 5), ...) {

  # get the x and y data to use the other code
  x <- eval(mapping$x, data)
  y <- eval(mapping$y, data)

  ct <- cor.test(x,y)

  r <- unname(ct$estimate)
  rt <- format(r, digits=2)[1]
  tt <- as.character(rt)

  # plot the cor value
  p <- ggally_text(
   label = tt, 
   mapping = aes(),
   xP = 0.5, yP = 0.5, 
   size = 6,
   color=color,
   ...
  ) +

  theme(
    panel.background=element_rect(fill="white"),
    panel.background = element_rect(color = "black", linetype = "dashed"),
    panel.grid.minor=element_blank(),
    panel.grid.major=element_blank()
  ) 

  corColors <- RColorBrewer::brewer.pal(n = 7, name = "RdYlGn")[2:6]

  if (r <= -0.8) {
    corCol <- corColors[1]
  } else if (r <= -0.6) {
    corCol <- corColors[2]
  } else if (r < 0.6) {
    corCol <- corColors[3]
  } else if (r < 0.8) {
    corCol <- corColors[4]
  } else {
    corCol <- corColors[5]
  }
  p <- p + theme(
    panel.background = element_rect(fill= corCol)
  )

  p
}

# no need to wrap functions if there are no parameters
# lower$combo had a wrap to the object Class which I couldn't find, so I removed it.
ggpairs(
  mtcars,
  columns = 2:10,
  upper = list(continuous = my_custom_cor_color),
  diag = list(continuous = "density"),
  lower = list(
    continuous = my_custom_smooth, 
    combo = "dot"
  ), 
  axisLabels = "none"
)

screen shot 2016-03-11 at 11 15 08 am

schloerke commented 8 years ago

I'm going to lock this conversation.

If there are any more ggpairs (or any GGally) unexpected behavior or issues, please submit a new github issue.
If there are any plot implementation questions, I'd like to divert the conversation to the gitter group for GGally: https://gitter.im/ggobi/ggally

schloerke commented 6 years ago

For data evaluations to work with ggplot2 version >= 2.2.2, all calls similar to eval(mapping$x, data) should be switched to GGally::eval_data_col(data, mapping$x) to handle ggplot2's new aes() format.