amrei-stammann / alpaca

An R-package for fitting glm's with high-dimensional k-way fixed effects

"expanding" functions within an feglm formula #9

Closed tcovert closed 4 years ago

tcovert commented 5 years ago

The data.table internals of alpaca don't seem to handle formula terms that expand into more than one column. For example, bs() from the splines package and poly() from the stats package both provide clean notation for a flexible function of one variable (or more, in the case of poly()). However, feglm seems to rely on data.table to pick up these column expansions correctly, and that does not yet work. For example, in a dataset where I'd like a 2nd-order orthogonal polynomial expansion of an Acres variable, this fails:

> m <- feglm(DBOEPerAcre ~ Auction + poly(Acres, degree = 2) + Term + RoyaltyRate | Grid20Yr + YearQtr | Grid20, reg_data, family = poisson())
Error in `[.data.table`(data, , `:=`((tmp.var), mean(get(lhs))), by = eval(i)) : 
  Column 3 ['poly(Acres, degree = 2)'] is length 2218 but column 1 is length 1109; malformed data.table.

As you can see, the dataset has 1109 rows, and poly(Acres, degree = 2) generates two columns of 1109 rows each, but somehow feglm or data.table interprets the result as a single column with 2218 rows.
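
To illustrate where the 2218 comes from, here is a minimal base-R sketch. The `acres` vector is a made-up stand-in for the real Acres column (only the row count matches the report), and nothing here touches alpaca or data.table, so it only shows what poly() returns, not how feglm handles it internally.

```r
# Hypothetical stand-in for the Acres column: 1109 random values.
set.seed(1)
acres <- runif(1109, min = 10, max = 640)

# poly() returns one matrix object with a column per polynomial degree,
# not two separate vectors.
p <- poly(acres, degree = 2)

is.matrix(p)  # TRUE
dim(p)        # 1109 2
length(p)     # 2218, i.e. 1109 * 2, the length quoted in the error message
```

So anything that treats the result of the formula term as a plain vector would see 2218 elements rather than two columns of 1109.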

Is there an easy fix for this, aside from pre-computing these columns in my main dataset?
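
For reference, the pre-computing workaround would look roughly like the sketch below. The toy `reg_data` and the column names `Acres_p1` and `Acres_p2` are hypothetical, chosen only for illustration; the feglm call is left commented out because it needs the real dataset and is simply the call from above with the poly() term replaced.

```r
# Toy stand-in for the real dataset (hypothetical values, 1109 rows).
set.seed(1)
reg_data <- data.frame(Acres = runif(1109, min = 10, max = 640))

# Compute the orthogonal polynomial once, outside the formula.
acres_poly <- poly(reg_data$Acres, degree = 2)

# Store each degree as its own ordinary numeric column
# (Acres_p1 and Acres_p2 are made-up names).
reg_data$Acres_p1 <- acres_poly[, 1]
reg_data$Acres_p2 <- acres_poly[, 2]

# The model can then reference the plain columns, assuming the other
# variables exist in reg_data as in the original call:
# m <- feglm(DBOEPerAcre ~ Auction + Acres_p1 + Acres_p2 + Term + RoyaltyRate |
#              Grid20Yr + YearQtr | Grid20, reg_data, family = poisson())
```

This gives the same regressors as the in-formula poly() term on this dataset, at the cost of extra columns and of having to reuse the stored polynomial coefficients if the expansion is ever needed for new data.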

Thanks.

amrei-stammann commented 5 years ago

Hi, currently there is no easy fix for this problem. I will keep this issue in mind for the next update.