utterances-bot commented 3 years ago

Correlation coefficient and correlation test in R - Stats and R

Learn how to compute a correlation coefficient (Pearson and Spearman) and perform a correlation test in R

https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/

AntoineSoetewey commented 3 years ago

Comment written by YZurbuchen on October 21, 2020 08:03:49:

Hi Antoine,

Thanks a lot for your article. I have a short question regarding dichotomous variables. If I have a matrix consisting of continuous and dichotomous variables, would it still be correct to use the cor.test function in order to subsequently visualise it as a correlation plot? I was unsure how to handle this. According to this thread, https://stats.stackexchange.com/questions/226157/p-value-for-point-biserial-correlation-in-r, it should automatically compute the point-biserial correlation and would therefore be correct?

Thanks a lot for your help!

Comment written by Antoine Soetewey on October 21, 2020 08:48:24:

Dear YZurbuchen,

Thanks for your question.

I confirm that you can use cor.test() with one continuous and one dichotomous variable. It produces exactly the same result as computing the point-biserial correlation, as shown by the following example:

set.seed(42)
x <- rnorm(100)
y <- sample(c(0, 1), 100, replace = TRUE)
cor.test(x, y)
# Pearson's product-moment correlation
#
# data: x and y
# t = -0.32697, df = 98, p-value = 0.7444
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# -0.2279507 0.1644740
# sample estimates:
# cor
# -0.03301059

# Compare with the point-biserial correlation from the {ltm} package
# install.packages("ltm")
library(ltm)
biserial.cor(x, y, level = 2)
# [1] -0.03301059

In the StackExchange thread you mentioned, the user Sal Mangiafico suggested (on April 9, 2018) using the function biserial.cor.new(), which uses the population standard deviation. However, on April 17, 2018, the developer of the {ltm} package updated the biserial.cor() function to use the population standard deviation instead of the sample standard deviation (see this issue).

So now the two functions, biserial.cor() and cor.test(), produce the same correlation estimate.
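
To see why the two agree, here is a minimal base-R sketch (not taken from the {ltm} source code) that computes the point-biserial coefficient from its textbook formula, using the population standard deviation, and compares it with Pearson's r. The variable names (m1, m0, p, sd_pop) are illustrative assumptions, and the simulated data are the same as in the example above.

```r
set.seed(42)
x <- rnorm(100)
y <- sample(c(0, 1), 100, replace = TRUE)

n  <- length(x)
m1 <- mean(x[y == 1])   # mean of x in the group coded 1
m0 <- mean(x[y == 0])   # mean of x in the group coded 0
p  <- mean(y == 1)      # proportion of observations coded 1

# Population standard deviation of x (divides by n, not n - 1)
sd_pop <- sqrt(mean((x - mean(x))^2))

# Point-biserial correlation from the textbook formula
r_pb <- (m1 - m0) / sd_pop * sqrt(p * (1 - p))

# Pearson correlation, as reported by cor.test(x, y)
r_pearson <- cor(x, y)

# The two values should agree up to floating-point error
all.equal(r_pb, r_pearson)

# Same formula with the sample standard deviation, for comparison
r_pb_sample_sd <- (m1 - m0) / sd(x) * sqrt(p * (1 - p))
```

With the sample standard deviation, sd(x), the same formula gives a value that is smaller in absolute terms by a factor of sqrt((n - 1) / n), which is why the choice of standard deviation matters for the comparison with cor.test().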

Hope this helps.

Regards,
Antoine

Comment written by YZurbuchen on October 21, 2020 14:43:17:

Thank you very much for your super quick answer, that's great!

Best wishes,
Yves

Comment written by Antoine Soetewey on October 21, 2020 14:44:40:

You're welcome!

AntoineSoetewey commented 3 years ago

Comment written by madihah manggabarani on October 22, 2020 23:24:51:

Dear Antoine, I really appreciate your article. It is easy to understand and easy to follow along with in R!

Comment written by Antoine Soetewey on October 23, 2020 05:11:57:

Glad you find it useful Madihah!

AntoineSoetewey commented 3 years ago

Comment written by vijayarajamanickam on November 30, 2020 22:55:44:

Dear Antoine, even after installing the package with devtools::install_github("laresbernardo/lares"), I am not able to access the corr_cross function.

Could you please help me with that?

Comment written by Antoine Soetewey on December 01, 2020 12:50:39:

Hello,

I just tried on my side with the following code:

devtools::install_github("laresbernardo/lares")
library(lares)
corr_cross(mtcars)

and it works smoothly.

So here are a few things to check to make it work on your side:

Are you sure you executed devtools::install_github("laresbernardo/lares") and not devtools::installgithub("laresbernardo/lares")?
When R asks whether you want to update packages while installing the {lares} package, choose option 3: None (so don't update packages).
Make sure to load the package by running library(lares) after installing it.
Try again with a basic dataset.
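
The checks above can be sketched in R as follows. This is only an illustration: the object names (lares_installed, corr_cross_found) are made up for the example, and passing upgrade = "never" is my assumption of the non-interactive counterpart of answering "3: None" at the update prompt.

```r
# Install without updating dependencies.
# Uncomment to run; it needs {devtools} and an internet connection.
# devtools::install_github("laresbernardo/lares", upgrade = "never")

# Check 1: is the package installed at all?
lares_installed <- requireNamespace("lares", quietly = TRUE)

if (lares_installed) {
  # Check 2: load the package, then confirm that corr_cross() is exported
  library(lares)
  corr_cross_found <- "corr_cross" %in% getNamespaceExports("lares")
  message("corr_cross() available: ", corr_cross_found)

  # Check 3: try with a basic built-in dataset
  # corr_cross(mtcars)
} else {
  message("{lares} is not installed; check the install_github() call for typos.")
}
```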

Let me know if this does not fix your issue.

Regards,
Antoine

Comment written by vijayarajamanickam on December 02, 2020 15:26:04:

Dear Antoine,

Now it works fine.
Thanks a lot for your response.

kind regards
vijay

Comment written by Antoine Soetewey on December 02, 2020 15:59:28:

You're welcome, glad it worked!

parksonpurity commented 3 years ago

Hi Antoine;

Just a simple question: if I have one continuous and three dichotomous variables, do you know how to calculate/test their correlation?

Cheers - Tao

AntoineSoetewey commented 3 years ago

Dear Tao,

Assuming your three dichotomous variables are qualitative nominal variables, you cannot compute a correlation (even if your other variable is a quantitative continuous variable).

Remember that a correlation can only be computed between quantitative or qualitative ordinal variables.

What you can do:

Consider your continuous variable as the dependent variable and one of your dichotomous variables as the independent variable (only one at a time). You then have two groups, which you can compare with a Student's t-test or a Wilcoxon test (depending on the number of observations and on the normality of your data).
If you need to use two dichotomous variables, you can test their independence with a Chi-square test of independence (see how to do this test in R or by hand).
Last but not least, consider your continuous variable as the dependent variable and your dichotomous variables as independent variables, and perform a linear regression. With only one dichotomous variable it is a simple linear regression, whereas with two or more independent variables it is a multiple linear regression.
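
As a rough illustration of these three options, here is a short sketch using the built-in mtcars dataset, which is my choice for the example and not Tao's data: mpg plays the role of the continuous variable, while am and vs are two dichotomous (0/1) variables. The object names (dat, tt, wt, ct, fit) are hypothetical.

```r
# Illustrative data: mpg is continuous; am and vs are dichotomous (0/1)
dat <- mtcars[, c("mpg", "am", "vs")]

# Option 1: one dichotomous variable at a time, comparing the two groups.
# t.test() runs Welch's test by default; add var.equal = TRUE for Student's t-test.
tt <- t.test(mpg ~ am, data = dat)
# exact = FALSE requests the normal approximation, since mpg contains ties
wt <- wilcox.test(mpg ~ am, data = dat, exact = FALSE)

# Option 2: two dichotomous variables, Chi-square test of independence
# (for a 2 x 2 table, chisq.test() applies Yates' continuity correction by default)
ct <- chisq.test(table(dat$am, dat$vs))

# Option 3: continuous outcome with several dichotomous predictors
fit <- lm(mpg ~ am + vs, data = dat)

# Inspect the results
tt$p.value
wt$p.value
ct$p.value
summary(fit)
```

With a third dichotomous variable, it would simply be added as another term on the right-hand side of the lm() formula.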

Hope that helps.

Regards, Antoine

rafabelokurows commented 3 years ago

Thanks for sharing all that, I learned a lot from this post. I'll use some methods from package "correlation" in my master's thesis :)

AntoineSoetewey commented 3 years ago

Thanks for your feedback Rafael, and good luck with your thesis!

AntoineSoetewey / statsandr

blog/correlation-coefficient-and-correlation-test-in-r/ #45
