Understanding when to use factor

kapoormanisha commented 1 year ago

Hi everyone,

I have pooled cross-sectional data: individual-level observations for 20 regions and 13 time periods. I am trying to run a non-parametric regression of y ~ x1 + controls. y is a dummy variable (it varies by individual, year and region); x1 is an integer-valued variable, treated as continuous, with values ranging from 0 to 12 (it varies only by year and region); and most of the controls are dummy or categorical variables.

My first question is whether, when running the regression, I should define y as a factor or not. Which of these two calls should be used?

1: bw.1 <- npregbw(factor(y) ~ x1 + ordered(year) + ordered(region) + factor(male) + lu.age, regtype = "ll", data = ludf)

2: bw.1 <- npregbw(y ~ x1 + ordered(year) + ordered(region) + factor(male) + lu.age, regtype = "ll", data = ludf)

Second, since the data are pooled, is this the correct way to include year and region as controls in the regression?

JeffreyRacine commented 1 year ago

Greetings,

Q1 - Use 2. The factor() in 1 is ignored by npreg/npregbw, which require a numeric vector for Y in order to compute the cross-validation function (essentially, the Y in 1 gets converted to numeric, but the resulting values can be arbitrary depending on the coding of your Y).

Q2 - looks ideal!
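
To illustrate the Q1 point about coding, here is a minimal base R sketch of how a factor turns into numbers. It does not call np and is not a description of what npregbw does internally. The vector `y` and its values are made up for illustration. It only shows that the numeric codes behind a factor depend on the level ordering rather than on the original 0/1 values.

```r
# Hypothetical 0/1 outcome; the values are invented for this illustration.
y <- c(0, 1, 1, 0, 1)

# A factor stores integer level codes (1, 2, ...), not the original values.
y_factor <- factor(y)
codes_default <- as.numeric(y_factor)

# Reordering the levels changes the codes for exactly the same data.
y_reordered <- factor(y, levels = c(1, 0))
codes_reordered <- as.numeric(y_reordered)

# Recovering the original 0/1 values requires going through the labels.
y_numeric <- as.numeric(as.character(y_factor))

print(codes_default)
print(codes_reordered)
print(y_numeric)
```

Keeping y as a plain numeric 0/1 vector, as in call 2, avoids this ambiguity altogether.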

kapoormanisha commented 1 year ago

Thank you so much Professor Racine for getting back to me.

Regards, Manisha

JeffreyRacine / R-Package-np

Understanding when to use factor #41