easystats / correlation

Methods for Correlation Analysis
https://easystats.github.io/correlation/

select columns in `correlation()` #146

Closed mattansb closed 3 years ago

mattansb commented 3 years ago

Instead of this:

correlation::correlation(
  data = mtcars[c("cyl", "wt")],
  data2 = mtcars[c("hp")]
)
#> # Correlation table (pearson-method)
#> 
#> Parameter1 | Parameter2 |    r |       95% CI | t(30) |         p
#> -----------------------------------------------------------------
#> cyl        |         hp | 0.83 | [0.68, 0.92] |  8.23 | < .001***
#> wt         |         hp | 0.66 | [0.40, 0.82] |  4.80 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 32

can we have something like this?

correlation::correlation(
  data = mtcars,
  select = c("cyl", "wt"),
  select2 = "hp"
)

This would work nicely with the pipe:

mtcars %>% 
  correlation::correlation(
    select = c("cyl", "wt"),
    select2 = "hp"
  )

IndrajeetPatil commented 3 years ago

Implementing this is trivial for me via rlang, but I need to learn the non-tidyeval way. On it.

strengejacke commented 3 years ago

Since `select` and `select2` take character vectors, there is no need for non-standard evaluation (NSE).
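
To illustrate the point about character vectors, here is a minimal base R sketch. `split_columns()` is a hypothetical helper written for this example, not the function used in the package; it only assumes that `select` and `select2` are character vectors of column names.

```r
# Hypothetical helper (not part of the correlation package): split a data
# frame into the two column sets named by `select` and `select2`.
split_columns <- function(data, select = NULL, select2 = NULL) {
  # Character vectors are matched against column names directly,
  # so no quoting or tidy evaluation is involved.
  wanted <- c(select, select2)
  missing_cols <- setdiff(wanted, names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found in `data`: ", paste(missing_cols, collapse = ", "))
  }

  list(
    # Without `select`, keep every column.
    data = if (is.null(select)) data else data[select],
    # Without `select2`, there is no second column set.
    data2 = if (is.null(select2)) NULL else data[select2]
  )
}

parts <- split_columns(mtcars, select = c("cyl", "wt"), select2 = "hp")
names(parts$data)
names(parts$data2)
```

Because the column names arrive as ordinary strings, plain `[` indexing is enough, and the same call works unchanged at the end of a pipe.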

strengejacke commented 3 years ago

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(correlation)

iris %>% 
  correlation(select = "Petal.Width", 
              select2 = c("Sepal.Length", "Sepal.Width"))
#> # Correlation Matrix (pearson-method)
#> 
#> Parameter1  |   Parameter2 |     r |         95% CI | t(148) |         p
#> ------------------------------------------------------------------------
#> Petal.Width | Sepal.Length |  0.82 | [ 0.76,  0.86] |  17.30 | < .001***
#> Petal.Width |  Sepal.Width | -0.37 | [-0.50, -0.22] |  -4.79 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 150

iris %>% 
  group_by(Species) %>% 
  correlation(select = "Petal.Width", 
              select2 = c("Sepal.Length", "Sepal.Width"))
#> # Correlation Matrix (pearson-method)
#> 
#> Group      |  Parameter1 |   Parameter2 |    r |        95% CI | t(48) |         p
#> ----------------------------------------------------------------------------------
#> setosa     | Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.01 | 0.101    
#> setosa     | Petal.Width |  Sepal.Width | 0.23 | [-0.05, 0.48] |  1.66 | 0.104    
#> versicolor | Petal.Width | Sepal.Length | 0.55 | [ 0.32, 0.72] |  4.52 | < .001***
#> versicolor | Petal.Width |  Sepal.Width | 0.66 | [ 0.47, 0.80] |  6.15 | < .001***
#> virginica  | Petal.Width | Sepal.Length | 0.28 | [ 0.00, 0.52] |  2.03 | < .05*   
#> virginica  | Petal.Width |  Sepal.Width | 0.54 | [ 0.31, 0.71] |  4.42 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 50

Created on 2021-03-25 by the reprex package (v1.0.0)

IndrajeetPatil commented 3 years ago

I was working on this and was just about to make a PR 😢

IndrajeetPatil commented 3 years ago

Lesson learned: will self-assign issues I would like to cover.

IndrajeetPatil commented 3 years ago

Just looked at your code, and the way you handle grouped data frames is much better than what I was doing, so the better solution has won 😅

mattansb commented 3 years ago

I made a change so that `select` can be used without `select2`, which preserves the behavior of `data` when `data2` is omitted:

library(magrittr)
library(correlation)
#> Registered S3 method overwritten by 'parameters':
#>   method     from      
#>   ci.blavaan bayestestR

These return the same result:

(rr1 <- mtcars %>% 
    correlation::correlation(
      select = c("cyl", "wt"),
      select2 = "hp"
    ))
#> # Correlation Matrix (pearson-method)
#> 
#> Parameter1 | Parameter2 |    r |       95% CI | t(30) |         p
#> -----------------------------------------------------------------
#> cyl        |         hp | 0.83 | [0.68, 0.92] |  8.23 | < .001***
#> wt         |         hp | 0.66 | [0.40, 0.82] |  4.80 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 32

(rr2 <- correlation::correlation(
  data = mtcars[c("cyl", "wt")],
  data2 = mtcars["hp"]
))
#> # Correlation Matrix (pearson-method)
#> 
#> Parameter1 | Parameter2 |    r |       95% CI | t(30) |         p
#> -----------------------------------------------------------------
#> cyl        |         hp | 0.83 | [0.68, 0.92] |  8.23 | < .001***
#> wt         |         hp | 0.66 | [0.40, 0.82] |  4.80 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 32

all.equal(rr1, rr2)
#> [1] TRUE

(rr1 <- mtcars %>% 
    dplyr::group_by(am) %>% 
    correlation::correlation(
      select = c("cyl", "wt"),
      select2 = "hp"
    ))
#> # Correlation Matrix (pearson-method)
#> 
#> Group | Parameter1 | Parameter2 |    r |       95% CI |    t | df |         p
#> -----------------------------------------------------------------------------
#> 0     |        cyl |         hp | 0.85 | [0.64, 0.94] | 6.53 | 17 | < .001***
#> 0     |         wt |         hp | 0.68 | [0.33, 0.87] | 3.82 | 17 | < .01**  
#> 1     |        cyl |         hp | 0.90 | [0.69, 0.97] | 6.87 | 11 | < .001***
#> 1     |         wt |         hp | 0.81 | [0.48, 0.94] | 4.66 | 11 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 13-19

(rr2 <- correlation::correlation(
  data = dplyr::group_by(mtcars, am) %>% dplyr::select(cyl, wt),
  data2 = dplyr::group_by(mtcars, am) %>% dplyr::select(hp)
))
#> Adding missing grouping variables: `am`
#> Adding missing grouping variables: `am`
#> # Correlation Matrix (pearson-method)
#> 
#> Group | Parameter1 | Parameter2 |    r |       95% CI |    t | df |         p
#> -----------------------------------------------------------------------------
#> 0     |        cyl |         hp | 0.85 | [0.64, 0.94] | 6.53 | 17 | < .001***
#> 0     |         wt |         hp | 0.68 | [0.33, 0.87] | 3.82 | 17 | < .01**  
#> 1     |        cyl |         hp | 0.90 | [0.69, 0.97] | 6.87 | 11 | < .001***
#> 1     |         wt |         hp | 0.81 | [0.48, 0.94] | 4.66 | 11 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 13-19
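
The grouped coefficients above can be cross-checked with base R alone. This is only a sketch: `grouped_cor()` is a hypothetical name, it returns raw Pearson coefficients without confidence intervals or p-value adjustment, and it is not how the package handles grouped data frames internally.

```r
# Base R sketch: correlate the `select` columns with the `select2` columns
# within each level of a grouping column.
# `grouped_cor()` is a hypothetical helper, not a package function.
grouped_cor <- function(data, group, select, select2) {
  # One data frame per level of the grouping column.
  pieces <- split(data, data[[group]])
  lapply(pieces, function(d) {
    # Rows of the result are `select` columns, columns are `select2` columns.
    stats::cor(d[select], d[select2], method = "pearson")
  })
}

res <- grouped_cor(
  mtcars,
  group = "am",
  select = c("cyl", "wt"),
  select2 = "hp"
)
round(res[["0"]], 2)
round(res[["1"]], 2)
```

The result is a named list with one small correlation matrix per group, which should line up with the `r` column of the grouped tables.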

Now these also return the same result:

(rr1 <- mtcars %>% 
    correlation::correlation(
      select = c("cyl", "wt")
    ))
#> # Correlation Matrix (pearson-method)
#> 
#> Parameter1 | Parameter2 |    r |       95% CI | t(30) |         p
#> -----------------------------------------------------------------
#> cyl        |         wt | 0.78 | [0.60, 0.89] |  6.88 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 32

(rr2 <- correlation::correlation(
  data = mtcars[c("cyl", "wt")]
))
#> # Correlation Matrix (pearson-method)
#> 
#> Parameter1 | Parameter2 |    r |       95% CI | t(30) |         p
#> -----------------------------------------------------------------
#> cyl        |         wt | 0.78 | [0.60, 0.89] |  6.88 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 32

all.equal(rr1, rr2)
#> [1] TRUE

(rr1 <- mtcars %>% 
    dplyr::group_by(am) %>% 
    correlation::correlation(
      select = c("cyl", "wt")
    ))
#> # Correlation Matrix (pearson-method)
#> 
#> Group | Parameter1 | Parameter2 |    r |       95% CI |    t | df |         p
#> -----------------------------------------------------------------------------
#> 0     |        cyl |         wt | 0.60 | [0.21, 0.83] | 3.12 | 17 | < .01**  
#> 1     |        cyl |         wt | 0.85 | [0.56, 0.95] | 5.28 | 11 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 13-19

(rr2 <- correlation::correlation(
  data = dplyr::group_by(mtcars, am) %>% dplyr::select(cyl, wt)
))
#> Adding missing grouping variables: `am`
#> # Correlation Matrix (pearson-method)
#> 
#> Group | Parameter1 | Parameter2 |    r |       95% CI |    t | df |         p
#> -----------------------------------------------------------------------------
#> 0     |        cyl |         wt | 0.60 | [0.21, 0.83] | 3.12 | 17 | < .01**  
#> 1     |        cyl |         wt | 0.85 | [0.56, 0.95] | 5.28 | 11 | < .001***
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 13-19

Created on 2021-03-25 by the reprex package (v1.0.0)