Open lilyclements opened 3 years ago
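Another feature of the `correlate` function in this package that I have come across is how it deals with missing values. The usual `cor` function returns `NA` for any correlation involving a variable with a missing value. However, the `correlate` function still calculates a value, using only the rows where both variables are observed (its default is `use = "pairwise.complete.obs"`). With a single missing value, this matches the "complete case" result for that variable: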
```r
library(tidyverse)
library(corrr)
data(mtcars)
mtcars[5, 5] <- NA # set a value as missing in the "drat" variable
cor(mtcars$drat, mtcars$wt) # correlation is NA
cor(mtcars) # all correlations for the "drat" variable are NA
correlate(mtcars$drat, mtcars$wt) # correlation is -0.715
correlate(mtcars) # correlation is given for all variables despite NAs
mtcars.complete <- mtcars %>% filter(complete.cases(mtcars)) # find the complete case data
cor(mtcars.complete$drat, mtcars.complete$wt) # gives the same value as the correlate function
```
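As a rough check of how far the "complete case" description holds, the sketch below uses only base R's `cor`. It assumes, as I understand from the corrr documentation, that `correlate` passes `use = "pairwise.complete.obs"` to `cor` by default. Under that assumption the two approaches agree for pairs involving `drat`, but can differ for other pairs once a whole-data-frame correlation is requested.

```r
# Base R only; mirrors the example above with one missing value in "drat".
data(mtcars)
mtcars[5, 5] <- NA

# Pairwise deletion: each pair of variables uses every row where both are observed.
pairwise <- cor(mtcars, use = "pairwise.complete.obs")

# Listwise ("complete case") deletion: rows with any NA are dropped for all pairs.
listwise <- cor(mtcars, use = "complete.obs")

# Pairs involving drat agree, because row 5 is excluded either way.
all.equal(pairwise["drat", "wt"], listwise["drat", "wt"])

# Pairs not involving drat can differ: pairwise keeps all 32 rows, listwise only 31.
pairwise["mpg", "wt"] - listwise["mpg", "wt"]
```

So filtering to complete cases first reproduces the `correlate` values only for variables that actually contain the missing values.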
I came across the new `corrr` package, which handles correlations in R. I thought I should summarise the main aspects of the package in case any of the features could fit into R-Instat.

## Main Correlation Function

The function to perform correlations is `correlate`. This function seemingly runs the same as the `cor` function we currently use; however, there are a few minor differences:

- The diagonal is set to missing (`NA`, whereas the `cor` function gives `1`).
- A message is displayed giving the `method` and `use` options chosen. This can be removed by the `quiet` parameter.
- The default for `use` is different.

## Plotting Functions

There are two new functions with respect to plotting.
`rplot` plots a correlation data frame using `ggplot2`. There are options to amend the plot (order variables alphabetically, add the correlation values, etc.); the standard plot is given by `rplot(correlate(mtcars))`.
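For reference, a minimal sketch of the two options mentioned above. The argument names `print_cor` and `.order` are taken from my reading of the `rplot` documentation and may depend on the installed version of corrr, so treat them as assumptions rather than a confirmed interface.

```r
library(corrr)

cors <- correlate(mtcars, quiet = TRUE)

# Default plot, as in the post.
p_default <- rplot(cors)

# Assumed argument names: print_cor adds the correlation values as text,
# .order = "alphabet" sorts the variables alphabetically.
p_labelled <- rplot(cors, print_cor = TRUE, .order = "alphabet")

print(p_labelled)
```

Since `rplot` returns a `ggplot2` object, further layers or themes could be added with `+` in the usual way.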
The other plot is `network_plot`. This is not plotting correctly on this laptop; however, I will try on another laptop. According to here it should look like this (they have plotted for only five variables: mpg-drat).

## Other Functions

There are a few functions to help "clear up" the correlation matrix which probably are not so relevant here. But I'll summarise a few of them:
In `focus`, you can state the variables you want to view as columns. E.g. `focus(correlate(mtcars), cyl, vs)` displays the correlations for all other variables against the `cyl` and `vs` variables.

```r
focus(correlate(mtcars), cyl, vs)
```
In `dice` you can state the variables you want to give correlations for. However, in R-Instat we already have a solution to this, which I assume is much more efficient, since this approach calculates the correlations for all variables and then selects the columns given.

```r
dice(correlate(mtcars), cyl, vs)
```
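The efficiency point can be illustrated in base R, without corrr. This is only a sketch of the two orders of operation, not of how R-Instat or `dice` are implemented: both routes give the same values, but the second never computes the correlations that are thrown away.

```r
vars <- c("cyl", "vs")

# Compute the full 11 x 11 matrix, then keep only the requested block.
full_then_select <- cor(mtcars)[vars, vars]

# Compute correlations for the requested variables only.
select_then_cor <- cor(mtcars[vars])

all.equal(full_then_select, select_then_cor)
```

For a data set as small as `mtcars` the difference is negligible; it would only matter with many variables.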
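`stretch`/`retract` act like `pivot_longer`/`pivot_wider`: `stretch` converts the correlation data frame to long format, and `retract` converts it back to wide format.

```r
stretch(correlate(mtcars))
```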
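To make the long/wide analogy concrete, here is a small round-trip sketch. It assumes that `stretch` returns the columns `x`, `y` and `r`, and that `retract` accepts those column names as arguments, which is my understanding of the current corrr interface.

```r
library(corrr)

cors <- correlate(mtcars, quiet = TRUE)

# Long format: one row per pair of variables, with columns x, y and r.
long <- stretch(cors)
head(long)

# Back to a wide data frame, one column per variable (assumed column names x, y, r).
wide <- retract(long, x, y, r)
```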
`shave` sets the repeated triangle to missing so that the correlation values aren't repeated.

`fashion` alters the output a bit further: you can specify how many decimal places, whether you want the leading `0` digit to be displayed (e.g. 0.79 vs .79), etc.

As a final sidenote, I noticed on dlgCorrelations that the "Options" button is at the end of the ucrSave. Since there is now the new "Position" button on a ucrSave, this "Options" could be confused with options for the saving, rather than the dialog. I suggest it is made a bit smaller, and shifted left so that it aligns with the end of the comment box.