CrumpLab / psyc3400

Website for Psyc 3400 Statistics @ Brooklyn College (Taught by Matt Crump)

Week 3 Quiz #11

Closed AlbertAini closed 6 years ago

AlbertAini commented 6 years ago

I might be incorrect, but I believe one of the quiz questions says to find the covariance of a data set using N, while the answer calls for N-1.

Compute the correlation between the x and y scores. Report Pearson's r to two decimal places (e.g., .14, or .10). Use the formula from chapter 3, and make sure you divide by N for all calculations, not N-1. Round your answer up if need be (e.g., .139 should be rounded to .14).

x | y
5 | 4
2 | 1
8 | 5
1 | 2
7 | 9

Selected Answer: 0.8
Correct Answer: 0.8
Answer range: +/- 0.01 (0.79 - 0.81)
Response Feedback: The correlation between the x and y scores is 0.8

CrumpLab commented 6 years ago

Hi Albert,

I can see why there might be some confusion about this. Up to this point in class we have been dividing by N when computing the covariance and the standard deviation.

The covariance and standard deviation can also be computed with N-1 in the denominator. We will discuss the choice of N vs. N-1 next week.

In the question that you posted, the problem was to compute the correlation. The question states that you should divide by N (when finding the covariance and the standard deviations for X and Y). The correct answer for these numbers is r = .80. It turns out that if you divide by N-1 for both the covariance and the standard deviations, you get the same answer, r = .80, because the denominator appears in both the top and the bottom of the correlation formula and cancels.
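A short sketch of why the choice of denominator drops out of the correlation. Here k is just my shorthand for whichever denominator is used (N or N-1), not notation from the textbook. The covariance contributes a factor of 1/k on top, and the two standard deviations each contribute a factor of 1/sqrt(k) on the bottom, so the k's cancel:

```latex
\[
r = \frac{\frac{1}{k}\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\frac{1}{k}\sum_{i}(x_i-\bar{x})^2}\,
          \sqrt{\frac{1}{k}\sum_{i}(y_i-\bar{y})^2}}
  = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i}(x_i-\bar{x})^2}\,
          \sqrt{\sum_{i}(y_i-\bar{y})^2}}
  = \frac{30.4}{\sqrt{37.2 \times 38.8}} \approx 0.80
\]
```

The numbers in the last step come from the quiz data, where the mean of x is 4.6 and the mean of y is 4.2.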

Here are a few different ways to compute the correlation for these two sets of numbers in R. Notice that they all give the same answer, regardless of whether you divide by N or N-1.

x <- c(5, 2, 8, 1, 7)
y <- c(4, 1, 5, 2, 9)

# using R's cor function
cor(x,y)

# using R's cov and sd functions (both divide by N-1)
cov(x,y)/(sd(x)*sd(y))

# Long form, dividing by N for the covariance and both SDs
N <- length(x)  # number of scores (5 here)
covariance <- sum((x - mean(x)) * (y - mean(y))) / N
SD_x <- sqrt(sum((x - mean(x))^2) / N)
SD_y <- sqrt(sum((y - mean(y))^2) / N)
covariance / (SD_x * SD_y)

# Long form, dividing by N-1 for the covariance and both SDs
covariance <- sum((x - mean(x)) * (y - mean(y))) / (N - 1)
SD_x <- sqrt(sum((x - mean(x))^2) / (N - 1))
SD_y <- sqrt(sum((y - mean(y))^2) / (N - 1))
covariance / (SD_x * SD_y)
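To connect this back to the original question about the covariance: the covariance by itself does change depending on whether you divide by N or N-1, even though the correlation does not. Below is a minimal sketch that checks both points. The helper `pearson_r()` and its `denom` argument are names made up for this illustration; they are not from the textbook or from base R.

```r
# Hypothetical helper (illustration only): Pearson's r where the caller
# chooses the denominator used for the covariance and both SDs.
pearson_r <- function(x, y, denom) {
  dx <- x - mean(x)                 # deviations of x from its mean
  dy <- y - mean(y)                 # deviations of y from its mean
  covariance <- sum(dx * dy) / denom
  sd_x <- sqrt(sum(dx^2) / denom)
  sd_y <- sqrt(sum(dy^2) / denom)
  covariance / (sd_x * sd_y)
}

x <- c(5, 2, 8, 1, 7)
y <- c(4, 1, 5, 2, 9)
N <- length(x)

# The covariance depends on the denominator
cov_N  <- sum((x - mean(x)) * (y - mean(y))) / N        # 6.08
cov_N1 <- sum((x - mean(x)) * (y - mean(y))) / (N - 1)  # 7.6, same as cov(x, y)

# The correlation does not
r_N  <- pearson_r(x, y, N)
r_N1 <- pearson_r(x, y, N - 1)

all.equal(r_N, r_N1)       # TRUE (up to floating-point tolerance)
all.equal(r_N, cor(x, y))  # TRUE
round(r_N, 2)              # 0.8
```

So a question that asks for the covariance alone needs the denominator specified, while a question that asks for r gives the same answer either way, as long as the same denominator is used for the covariance and both standard deviations.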