DS4PS / cpp-523-fall-2020

http://ds4ps.org/cpp-523-fall-2020/

Correlation Matrix #5

Open karinalungo opened 4 years ago

karinalungo commented 4 years ago

I don't yet understand the correlation matrix panel from R (image below), or the reason for the curved lines. For the first quadrant, heart.rate and caffeine, the 0.39 is their r, which is considered highly correlated. Why does the plot have a curved line instead of a straight slope, though? For the four squares in the middle, caffeine and stress.index, if I am not mistaken the 0.99 is the positive / almost perfect correlation between caffeine and stress.index and we see a straight line with small standard errors. If I understand correctly, the r for heart.rate and stress.index is similarly 0.40. And yet there is a curved line... why?

image

Schlinkert commented 4 years ago

Your statement, "For the four squares in the middle, caffeine and stress.index, if I am not mistaken the 0.99 is the positive / almost perfect correlation between caffeine and stress.index and we see a straight line with small standard errors." is correct. The number in each square is the correlation between the two variables, measured on a scale from -1 to 1. As for your other two questions about the curved lines: the strength of the correlation can differ across parts of your sample (lecture 1 shows how we can look at covariance in different segments of a sample), so a single r can summarize a relationship that is not a straight line. We will go over this more in week five when we discuss non-linear regression and quadratic functions, which can help us better explain variance and correlation along the entire continuum of our data when the relationship is non-linear.
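
To make the point about segments concrete, here is a minimal sketch in Python (not the course's R code). The data are hypothetical and are not the caffeine data from the image; `pearson_r` is a helper written for this illustration. It assumes a relationship that rises and then falls, which is one way a plot can show a curved line alongside a modest overall r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: y rises until x = 11, then falls again.
x = list(range(21))
y = [-(xi - 11) ** 2 for xi in x]

r_all = pearson_r(x, y)              # whole sample
r_low = pearson_r(x[:11], y[:11])    # segment with x from 0 to 10
r_high = pearson_r(x[11:], y[11:])   # segment with x from 11 to 20

print(f"whole sample: r = {r_all:.2f}")
print(f"lower half:   r = {r_low:.2f}")
print(f"upper half:   r = {r_high:.2f}")
```

In this made-up sample the overall r is modest (about 0.35), while the lower segment is strongly positive (about 0.97) and the upper segment strongly negative (about -0.96). A single correlation averages over those segments, which is why a curved fitted line and a middling r can appear together.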

lecy commented 4 years ago

At the end of Lab 02 there is some additional guidance on reading the correlation plots as well:

https://ds4ps.org/cpp-523-fall-2020/labs/lab-02-class-size-confidence-intervals.html

image

We can use information from a correlation table to create a Ballentine Venn diagram, which will be a useful tool for developing intuition about how adding a specific control variable will impact a model. Pairs plots and Ballentine Venn diagrams are both ways of representing correlations and shared variance.

image
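
As a small illustration of "shared variance": the overlap between two circles in a Ballentine diagram corresponds to the squared correlation, r², between the two variables. The sketch below is in Python rather than R and uses only the correlations quoted earlier in this thread; the dictionary names are made up for the example.

```python
# Correlations quoted earlier in this thread.
correlations = {
    ("heart.rate", "caffeine"): 0.39,
    ("caffeine", "stress.index"): 0.99,
    ("heart.rate", "stress.index"): 0.40,
}

# Shared variance between each pair is the squared correlation.
shared_variance = {pair: r ** 2 for pair, r in correlations.items()}

for (a, b), r2 in shared_variance.items():
    r = correlations[(a, b)]
    print(f"{a} ~ {b}: r = {r:.2f}, shared variance = {r2:.2f}")
```

So an r of 0.39 means the two variables share roughly 15% of their variance, while an r of 0.99 means they share about 98%.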

karinalungo commented 4 years ago

Thank you. So I guess my question is just to confirm that this correlation diagram represents only bivariate relationships, ignoring the presence of other variables. If I look at the correlation between test and ses being 0.60, that disregards the existence of csize and tqual per the diagram above. Is this correct? And for the explanation of the curved lines, I will just wait for our study of non-linear regression :)

lecy commented 4 years ago

Correct!
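
For contrast, here is a rough Python sketch of what "not ignoring" a third variable could look like, using the standard first-order partial correlation formula. Only the 0.60 between test and ses comes from this thread; the two correlations involving tqual are hypothetical values chosen for illustration, and `partial_r` is a helper written for this example.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_test_ses = 0.60     # bivariate correlation quoted in this thread
r_test_tqual = 0.50   # hypothetical value, for illustration only
r_ses_tqual = 0.50    # hypothetical value, for illustration only

r_partial = partial_r(r_test_ses, r_test_tqual, r_ses_tqual)

print(f"bivariate r(test, ses):         {r_test_ses:.2f}")
print(f"partial r(test, ses | tqual):   {r_partial:.2f}")
```

With these assumed values the partial correlation is smaller than 0.60, because part of what test and ses share is also shared with tqual. The pairs plot reports only the bivariate number.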