IALSA / ialsa-2016-amsterdam

Multi-study and multivariate evaluation of healthy life expectancy (HLE): An IALSA workshop on multistate modeling using R
GNU General Public License v2.0

The logic of dummy variables #38

Open andkov opened 7 years ago

andkov commented 7 years ago

by @ivacukic

I'm about to add new results to the table using two dummy variables for education, but I got confused and could use some clarification!

The table states that two dummy variables should compare low vs medium; and low vs high education.

If I wrote it down correctly, during the workshop we agreed that codes should be as follows:

| reference variable | dummy1 | dummy2 |
|--------------------|--------|--------|
| -1 (low)           | 0      | 0      |
| 0 (med)            | 1      | 0      |
| 1 (high)           | 0      | 1      |

I am not very used to using dummy variables, but my reading of this is that dummy1 as described here would compare both low and high (now zeros) to medium (now 1), rather than only low vs medium. Similarly, dummy2 would compare both low and medium (now zeros) to high (now 1). That is different from what the table says (low vs med, and low vs high). I got a bit confused, so I would appreciate any help here!

andkov commented 7 years ago

by @emielhoogendijk

No, this is correct. If you put both dummies in your model, low education is the only reference group (the third dummy, the one not included in the model, is always the reference). I am not able to give you a mathematical explanation :-)
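
For what it's worth, here is a rough sketch of the mathematical explanation, assuming an ordinary linear model with an intercept and both dummies entered together. The symbols $y$, $\beta_0$, $\beta_1$ and $\beta_2$ are notation introduced here for illustration, not taken from the workshop materials. With $y = \beta_0 + \beta_1\,\text{dummy1} + \beta_2\,\text{dummy2} + \varepsilon$, plugging in the codes for each group gives:

```latex
\begin{aligned}
E[y \mid \text{low}]  &= \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 = \beta_0 \\
E[y \mid \text{med}]  &= \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 = \beta_0 + \beta_1 \\
E[y \mid \text{high}] &= \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 = \beta_0 + \beta_2 \\[4pt]
\beta_1 &= E[y \mid \text{med}] - E[y \mid \text{low}] \\
\beta_2 &= E[y \mid \text{high}] - E[y \mid \text{low}]
\end{aligned}
```

So once dummy2 is also in the model, the zeros in dummy1 no longer lump low and high together: the high group is absorbed by $\beta_2$, and $\beta_1$ is left as the medium vs low contrast only. The same reasoning should carry over to the linear predictor of other regression-type models, although there the coefficients are contrasts on that model's own scale rather than differences in means.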

andkov commented 7 years ago

by @annierobi

The low education category is known as the reference group because it has all zeros. The two dummy variables, called educH and educM, contain all of the information needed, which is why we only need two, not three, dummy-coded variables. If educH contains a 1, that person is compared to the reference group (educL); likewise, if educM contains a 1, that person is compared to the reference group (educL).
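
To see this numerically, here is a minimal sketch in plain Python (no packages needed). It assumes an ordinary least-squares model with an intercept; the outcome values are made up purely for illustration and are not workshop data, and the `solve` helper is a hypothetical name introduced here. It fits the model with both dummies and prints each coefficient next to the corresponding difference in group means.

```python
# Made-up outcome values per education group (illustration only, not real data).
data = {
    "low":  [10.0, 12.0, 11.0],
    "med":  [14.0, 15.0, 16.0],
    "high": [20.0, 19.0, 21.0],
}

# Dummy coding from this thread: (educM, educH); low is the all-zero reference.
codes = {"low": (0, 0), "med": (1, 0), "high": (0, 1)}

# Build design matrix rows [intercept, educM, educH] and the outcome vector.
X, y = [], []
for group, values in data.items():
    educM, educH = codes[group]
    for v in values:
        X.append([1.0, float(educM), float(educH)])
        y.append(v)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


# Least squares via the normal equations: (X'X) beta = X'y
k = 3
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
b0, bM, bH = solve(XtX, Xty)

means = {g: sum(v) / len(v) for g, v in data.items()}

# Each pair should agree up to floating-point rounding.
print("intercept vs mean(low):           ", b0, means["low"])
print("educM coef vs mean(med) - mean(low): ", bM, means["med"] - means["low"])
print("educH coef vs mean(high) - mean(low):", bH, means["high"] - means["low"])
```

Note that the educM coefficient matches medium minus low only, not medium minus the pooled low and high groups, which is the point of having both dummies in the model at once.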