Watts-College / cpp-523-fall-2021

https://watts-college.github.io/cpp-523-fall-2021/

Resource for reading multivariate regression model comparison chart #9

Open dholford opened 2 years ago

dholford commented 2 years ago

Hello,

I'm wondering if anyone has found or knows of a resource for reading these:

image

I went back and rewatched the explanation during the Week 2 review session, but I'm still having a hard time piecing together how the actual models are associated with each variable. This week especially, the model that seems to be associated with teacher quality and class size doesn't seem right based on the R output, so I think I'm reading it wrong.

Schlinkert commented 2 years ago

Good question. Think of the correlation matrix as a way to understand the correlation between variables. You won’t necessarily see these values in the regression table output, but what you will be able to do with the values in the correlation matrix is understand what will happen to your model if you add in or leave a control variable out of your model. This helps you describe what your regression model is and is not telling you.

Here is an example of how to read this matrix. Let's say I want to know what the relationship is between test score and class size. I would find those two variables in the matrix and look at their corresponding graph and coefficient. For class size and test score, I see a negative correlation of -0.60 (and I can also see this represented graphically). The same idea can be used to see the relationships between all of the variables. So, now what? The power behind knowing how these variables are correlated is that you can talk about what would or would not happen if you were to add these variables to, or leave them out of, your regression model. To interpret what would happen in your regression model based on the variables that are or are not in it, please refer to this week's review guide, which covers the correlation between control variables and the policy and dependent variables.
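To make the "two variables at a time" reading concrete, here is a minimal sketch in plain Python (the course itself uses R). The numbers are made up for illustration and are not the course dataset, so the printed correlations will not match the -0.60 in the chart; the variable names test, csize, and tqual are borrowed from the labels in this thread, and the pearson helper is a hypothetical name.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT the course dataset.
data = {
    "test": [620, 650, 700, 680, 640, 710],   # test score
    "csize": [30, 27, 20, 22, 28, 19],        # class size
    "tqual": [3, 5, 4, 6, 2, 5],              # teacher quality
}

# One correlation per PAIR of variables: each pair is one square of the chart.
pairs = {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}

for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With three variables there are exactly three pairs, so the matrix has three distinct correlations to read. None of them is a regression coefficient from the full model.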

Please let me know if you have any more questions on this.

BrettMFoster commented 2 years ago

I believe this is a correlation matrix built using ggplot in R. You can google that and search by images for even more examples.

dholford commented 2 years ago

I think the part that's throwing me the most is conceptualizing how the graphs correspond to the variables. I've added text to the matrix to label the graphs with how I think they correspond to the variables. Is this right?

R matrix

Schlinkert commented 2 years ago

This matrix is comparing two variables at a time. So, for the most part you are right. For the three squares where you have more than two variables listed, you will only want to compare two at a time. For example, right next to csize and tqual, you also have test labeled under the chart. This chart is only showing the relationship between csize and tqual, and the correlation is -0.057. This matrix is not your full regression model. It's just telling you how each pair of variables is correlated, so that you will know what will happen to your model if you decide to add or leave out a variable.
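As a rough sketch of why the pairwise correlations tell you what happens when a variable is added or left out, the plain-Python example below compares the class-size slope with and without teacher quality in the model. It relies on the standard omitted-variable identity for ordinary least squares: the slope from the short model equals the slope from the full model plus the control's coefficient times cov(csize, tqual) / var(csize). The data are made up for illustration (not the course dataset), and the function names are hypothetical.

```python
def cov(x, y):
    """Sample covariance of two equal-length lists (cov(x, x) is the variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def slope_short(y, x):
    """OLS slope of y on x alone (the model that leaves the control out)."""
    return cov(x, y) / cov(x, x)


def slopes_long(y, x, z):
    """OLS slopes of y on x and z together (the model with the control)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


# Made-up illustrative values, NOT the course dataset.
test = [620, 650, 700, 680, 640, 710]   # test score
csize = [30, 27, 20, 22, 28, 19]        # class size
tqual = [3, 5, 4, 6, 2, 5]              # teacher quality

b_short = slope_short(test, csize)               # test ~ csize
b_csize, b_tqual = slopes_long(test, csize, tqual)  # test ~ csize + tqual

# How much the csize slope moves when tqual is left out of the model.
shift = b_tqual * cov(csize, tqual) / cov(csize, csize)

print(f"csize slope without tqual: {b_short:.3f}")
print(f"csize slope with tqual:    {b_csize:.3f}")
print(f"shift from omitting tqual: {shift:.3f}")
```

The shift is driven by how strongly the control is related to both class size and test score. When the correlation between csize and tqual is close to zero, as with the -0.057 in the chart, that term is small and the class-size slope should change little whether or not tqual is included.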

Here is further reading on interpreting a correlation matrix: https://www.statology.org/how-to-read-a-correlation-matrix/

ebossert commented 2 years ago

Thank you so much! I was really struggling with interpreting these and this link is tremendously helpful!

dholford commented 2 years ago

Ah, I get it! Knowing each square is two variables unlocked it for me. I can see how they link up now and will definitely check out the additional explanation in the link!

Thanks, Dylan