Lab 04- Q2a

DS4PS / cpp-525-sum-2021

Course shell for CPP 525 Advanced Regression Analysis


library( stargazer )

n1.model <- lm( HealthStatus ~ PublicHousing +
                  Race_1 + Race_2 + Race_3 + Race_4 +
                  Education + Age +
                  MaritalStatus_1 + MaritalStatus_2 + MaritalStatus_3 + MaritalStatus_4,
                data = data )

stargazer( n1.model,
           type = "text",
           dep.var.labels = "Health Status",
           column.labels = "",
           covariate.labels = c( "Public Housing", "White", "Black", "Hispanic", "Other",
                                 "Education", "Age", "Single", "Married", "Widow", "Divorced" ),
           digits = 2 )

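The problem is that you can never include a dummy for every level of a categorical variable in a regression model that also has an intercept. The dummies for one variable always sum to 1, so together they are perfectly collinear with the intercept. One level will always be omitted.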

m <- lm( y ~ x + male_dummy )   # ok
m <- lm( y ~ x + female_dummy )   # ok
m <- lm( y ~ x + male_dummy + female_dummy )   # not ok
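A minimal sketch of the three models above, using simulated data (the values of `x`, `y`, and the dummies are made up for illustration). It shows that when both dummies are included, `lm()` cannot estimate one of them and returns `NA` for it, leaving the rest of the fit identical to the model with only `male_dummy`.

```r
# Simulated data, for illustration only
set.seed( 1 )
n <- 100
x <- rnorm( n )
male_dummy   <- rbinom( n, size = 1, prob = 0.5 )
female_dummy <- 1 - male_dummy     # male_dummy + female_dummy = 1 for every row
y <- 2 + 0.5 * x + 1.5 * male_dummy + rnorm( n )

m.ok  <- lm( y ~ x + male_dummy )
m.bad <- lm( y ~ x + male_dummy + female_dummy )

# female_dummy equals the intercept column minus male_dummy, so lm() cannot
# estimate it separately and returns NA for that coefficient
coef( m.bad )

# The remaining coefficients match the model that omitted female_dummy
coef( m.ok )
```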

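Since you did not choose which one to omit, R did it for you: `lm()` reports `NA` for one dummy in each set, and `summary()` notes that those coefficients are "not defined because of singularities".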
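To see which dummy R dropped, you can look for the `NA` coefficients. The sketch below assumes four mutually exclusive and exhaustive race dummies like `Race_1` to `Race_4` in the lab model; the data are simulated and hypothetical. Because `lm()` processes columns in the order they appear in the formula, the dummy that gets dropped is the last one in the set.

```r
# Hypothetical data: four race dummies that are mutually exclusive and exhaustive
set.seed( 2 )
n <- 200
race <- sample( c( "White", "Black", "Hispanic", "Other" ), n, replace = TRUE )
Race_1 <- as.numeric( race == "White" )
Race_2 <- as.numeric( race == "Black" )
Race_3 <- as.numeric( race == "Hispanic" )
Race_4 <- as.numeric( race == "Other" )
HealthStatus <- rnorm( n, mean = 5 )
d <- data.frame( HealthStatus, Race_1, Race_2, Race_3, Race_4 )

m <- lm( HealthStatus ~ Race_1 + Race_2 + Race_3 + Race_4, data = d )

# Coefficients that R could not estimate are returned as NA
dropped <- names( coef( m ) )[ is.na( coef( m ) ) ]
dropped
```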

It is better to choose the omitted categories yourself. Remember that whichever group you omit becomes the reference group, and the intercept b0 represents that group (its expected outcome when the other predictors are zero).

It's best to put the "typical" group in the intercept so that it serves as the point of comparison for the other categories.
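Two ways to pick the reference group yourself, sketched with a simulated, hypothetical `marital` variable and married people as the reference: set the reference level of a factor with `relevel()`, or build the dummies by hand and leave the reference group out of the formula. In a model with only these dummies, the intercept in both versions equals the mean outcome of the married group.

```r
# Hypothetical data: a marital-status factor with four levels
set.seed( 3 )
n <- 200
marital <- factor( sample( c( "Single", "Married", "Widow", "Divorced" ), n, replace = TRUE ) )
HealthStatus <- rnorm( n, mean = 5 )
d <- data.frame( HealthStatus, marital )

# Option 1: set the reference level of the factor; lm() then creates
# dummies for the other three levels only
d$marital <- relevel( d$marital, ref = "Married" )
m.factor <- lm( HealthStatus ~ marital, data = d )

# Option 2: build the dummies yourself and leave the reference group out
d$Single   <- as.numeric( d$marital == "Single" )
d$Widow    <- as.numeric( d$marital == "Widow" )
d$Divorced <- as.numeric( d$marital == "Divorced" )
m.dummy <- lm( HealthStatus ~ Single + Widow + Divorced, data = d )

# Both intercepts equal the mean health status of the married group
coef( m.factor )[ "(Intercept)" ]
coef( m.dummy )[ "(Intercept)" ]
mean( d$HealthStatus[ d$marital == "Married" ] )
```

With a factor, R names the coefficients by pasting the variable name and level together (for example `maritalDivorced`), while the hand-built dummies keep the names you gave them.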

Also recall that every coefficient test for these dummies reported in the table is a comparison with the omitted category. So if you want to compare married and divorced people, omit one of those two groups. If you omit widowed instead, every group is compared to widowed, and the table will not tell you whether the married and divorced groups differ from each other.
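A sketch of the married vs. divorced point with simulated, hypothetical data. With widowed omitted, the married-divorced gap is only the difference of two coefficients and has no test of its own in the table. With married omitted, the `Divorced` coefficient is that same gap, and its standard error, t value, and p-value test it directly.

```r
# Hypothetical data with four marital-status groups
set.seed( 4 )
n <- 300
marital <- sample( c( "Single", "Married", "Widow", "Divorced" ), n, replace = TRUE )
d <- data.frame( HealthStatus = rnorm( n, mean = 5 ) )
d$Single   <- as.numeric( marital == "Single" )
d$Married  <- as.numeric( marital == "Married" )
d$Widow    <- as.numeric( marital == "Widow" )
d$Divorced <- as.numeric( marital == "Divorced" )

# Widow omitted: every coefficient is a comparison with the widowed group
m.widow <- lm( HealthStatus ~ Single + Married + Divorced, data = d )

# Married omitted: the Divorced coefficient now compares divorced people
# directly with married people
m.married <- lm( HealthStatus ~ Single + Widow + Divorced, data = d )

# Estimate, standard error, t value, and p-value for divorced vs. married
summary( m.married )$coefficients[ "Divorced", ]

# The same gap, recovered from the widow-reference model (no test reported for it)
coef( m.widow )[ "Divorced" ] - coef( m.widow )[ "Married" ]
```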


Lab 04- Q2a #7