InseadDataAnalytics / INSEADAnalytics


Not clear how to pass a formula as an argument in a lm() or glm() function #119

Closed: pgmorgan closed this issue 6 years ago

pgmorgan commented 6 years ago

In Lecture 5-6, in the file 0506 STC (A) Logistic.R, at line 105 we write:

model_logistic <- glm(Retained.in.2012.~ Special.Pay + To.Grade + ...

However, by line 105 we had already created three objects that contain the columns "Retained.in.2012", "Special.Pay", "To.Grade", etc.: STCdata_A, testing, and training.

When we execute line 105, how does glm() know whether to take a predictor (e.g., "To.Grade") from STCdata_A, testing, or training? Obviously we want it taken from training.

Wouldn't it have been more appropriate to type the following?

model_logistic <- glm(training$Retained.in.2012. ~ training$Special.Pay + training$To.Grade + ...
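For comparison, the fully prefixed form proposed above is valid R once each variable gets its own training$ prefix. The sketch below is a minimal illustration under assumptions: the data are simulated (only the column names come from the thread; the values, sample size and 150/50 split are hypothetical), and it only checks that both forms produce the same coefficient estimates.

```r
# Hypothetical simulated stand-in for the course's STC data:
# the column names match the thread, the values are random.
set.seed(1)
n <- 200
STCdata_A <- data.frame(
  Special.Pay = rbinom(n, 1, 0.3),
  To.Grade    = rbinom(n, 1, 0.5)
)
STCdata_A$Retained.in.2012. <- rbinom(
  n, 1, plogis(-0.5 + 1.2 * STCdata_A$Special.Pay + 0.8 * STCdata_A$To.Grade)
)
training <- STCdata_A[1:150, ]    # hypothetical split
testing  <- STCdata_A[151:200, ]

# Form proposed in the question: every variable prefixed with training$
fit_dollar <- glm(training$Retained.in.2012. ~ training$Special.Pay + training$To.Grade,
                  family = binomial(link = "logit"))

# Form used in the course script: bare column names plus data = training
fit_data <- glm(Retained.in.2012. ~ Special.Pay + To.Grade,
                data = training, family = binomial(link = "logit"))

# Same estimates; only the coefficient labels differ
# ("training$Special.Pay" versus "Special.Pay")
all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))
```

The data = form is the usual idiom: the formula stays readable, and because its terms do not hard-code training, the fitted model can later be scored on other rows with predict(fit_data, newdata = testing). With the training$ form, the terms refer to training explicitly, so newdata cannot cleanly swap in the testing columns.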

VivianZhang0721 commented 6 years ago

We specified the data set with the data= argument, so we don't need to repeat it for every variable:

model_logistic <- glm(Retained.in.2012. ~ Special.Pay + To.Grade + Group. , data = training,
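To see that data= is what decides which object the columns come from, here is a minimal sketch on hypothetical simulated data (column names from the thread, random values, an assumed 150/50 split). Without data=, glm() looks the variables up in the formula's environment, where no object with those names exists; with data=training or data=testing it uses exactly the rows of that data frame.

```r
# Hypothetical simulated data; only the column names come from the thread.
set.seed(42)
n <- 200
STCdata_A <- data.frame(Special.Pay = rbinom(n, 1, 0.3),
                        To.Grade    = rbinom(n, 1, 0.5))
STCdata_A$Retained.in.2012. <- rbinom(n, 1, 0.6)
training <- STCdata_A[1:150, ]
testing  <- STCdata_A[151:200, ]

f <- Retained.in.2012. ~ Special.Pay + To.Grade

# No data = : the columns live only inside the data frames, so the lookup
# in the formula's environment fails and glm() raises an error.
no_data <- tryCatch(glm(f, family = binomial),
                    error = function(e) conditionMessage(e))
print(no_data)  # an "object ... not found" style message

# With data = , the variables are taken from that data frame only.
fit_train <- glm(f, data = training, family = binomial)
fit_test  <- glm(f, data = testing,  family = binomial)
nobs(fit_train)  # number of rows in training
nobs(fit_test)   # number of rows in testing
```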

pgmorgan commented 6 years ago

@VivianZhang0721 I don't follow you. What line of code are you referring to?

VivianZhang0721 commented 6 years ago

As I understand it, your question is how lm() and glm() know which data set to use. The line of code you mentioned passes data=training right after the formula Retained.in.2012.~ Special.Pay + To.Grade + Group. ... Read on to the end of line 114: lines 105 to 114 are a single glm() call written across multiple lines.
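Two points from this reply can be sketched in R, again on hypothetical simulated data (column names from the thread, random values). First, R keeps reading a call until the parenthesis opened by glm( is closed, so a data = argument several lines below the formula still belongs to the same call. Second, since the formula is just one argument, it can also be built beforehand and passed in; reformulate() from base R is one way to do that from a vector of column names (this is an illustration, not something the course script does).

```r
# Hypothetical simulated training data; only the column names come from the thread.
set.seed(7)
n <- 150
training <- data.frame(Special.Pay = rbinom(n, 1, 0.3),
                       To.Grade    = rbinom(n, 1, 0.5))
training$Retained.in.2012. <- rbinom(n, 1, 0.6)

# 1) One call spread over several lines: the formula is the first argument,
#    data = training and family = ... are further arguments of the same call.
model_logistic <- glm(Retained.in.2012. ~
                        Special.Pay +
                        To.Grade,
                      data = training,
                      family = binomial(link = "logit"))

# 2) The formula as a separate object, assembled from column names.
predictors <- c("Special.Pay", "To.Grade")
f <- reformulate(predictors, response = "Retained.in.2012.")
print(f)  # shows the assembled formula
model_from_object <- glm(f, data = training, family = binomial(link = "logit"))

# Both calls describe the same model, so the estimates agree.
all.equal(coef(model_logistic), coef(model_from_object))
```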

pgmorgan commented 6 years ago

Yes, I see: data=training is the second argument in the glm() call. Thank you.