Closed nilshg closed 3 years ago
Sorry, there was an issue merging. I'd be happy with adding something like this, but you'd need to make sure you handle missings (missings in the FEs vs. missings in other variables) and add tests.
Also, I think it'd be better to do something like
df isa AbstractDataFrame || throw("...")
sum(Matrix(leftjoin(select(df, x.fekeys), unique(x.fe), on = x.fekeys, makeunique = true)), dims = 2)
(as well as avoiding creating a vector if there are no fixed effects)
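A minimal sketch of how that suggestion might look on toy data. Everything here is an assumption for illustration: `fe_contribution` is a hypothetical helper, `fe` and `fekeys` stand in for the model fields `x.fe` and `x.fekeys`, and the error message is made up. Two details differ from the one-liner above: the key columns are dropped before summing so that only the estimated effects are added up, and `order = :left` (available in DataFrames 1.5 and later) is passed so the result lines up row by row with `df`.

```julia
using DataFrames

# Hypothetical stand-ins for the model fields `x.fekeys` and `x.fe`.
fekeys = [:id, :year]
fe = DataFrame(id = [1, 1, 2, 2], year = [2000, 2001, 2000, 2001],
               fe_id = [0.5, 0.5, -0.5, -0.5],
               fe_year = [0.1, -0.1, 0.1, -0.1])

# New data: the last row has an `id` level that was not in the estimation sample.
df = DataFrame(id = [2, 1, 3], year = [2001, 2000, 2000])

function fe_contribution(df, fe, fekeys)
    df isa AbstractDataFrame || throw(ArgumentError("expected a data frame"))
    # No fixed effects: return early instead of allocating a vector.
    isempty(fekeys) && return nothing
    joined = leftjoin(select(df, fekeys), unique(fe); on = fekeys,
                      makeunique = true, order = :left)
    # Drop the key columns so that only the estimated effects are summed.
    return vec(sum(Matrix(select(joined, Not(fekeys))), dims = 2))
end

fe_sum = fe_contribution(df, fe, fekeys)
```

In this toy case the third entry of `fe_sum` is `missing`, because `id = 3` has no match in `fe`. Note that `leftjoin` throws by default when a key column itself contains `missing`; its `matchmissing` keyword controls that behaviour.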
Happy to try and add some tests in the next few days. Not sure I understand your point about missings: would you expect different behaviour for missing FEs vs. other covariates? Naively, I would have thought that the prediction is ŷ = f̂e + β̂₁x₁ + β̂₂x₂ + ...,
which gives missing if either the fixed effect is missing (i.e. the predict df has a level that wasn't present in the original df) or any of the xs is missing. Would you be looking for some other behaviour, e.g. setting the FE to 0 or the grand mean or something?
What you're saying is correct. Just check that it gives missing if any of the covariates or fixed effects is missing.
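The behaviour agreed on here follows directly from how `missing` propagates through arithmetic in Julia. A small sketch in plain Julia, with made-up coefficients and a hypothetical `predict_row` helper (neither is part of the package):

```julia
# Hypothetical estimates, for illustration only.
β = [2.0, -1.0]                          # coefficients on x₁ and x₂
fe_hat = Dict("a" => 0.5, "b" => -0.5)   # one estimated effect per level

# Prediction = fixed effect + β₁x₁ + β₂x₂.
# A level that is not in `fe_hat` is looked up as `missing`, and `missing`
# in any term makes the whole sum `missing`.
predict_row(level, x) = get(fe_hat, level, missing) + sum(β .* x)

yhat_known     = predict_row("a", [1.0, 2.0])      # known level, complete covariates
yhat_new_level = predict_row("c", [1.0, 2.0])      # level not seen in estimation
yhat_missing_x = predict_row("a", [1.0, missing])  # one covariate is missing
```

Only the first call returns a number; the other two return `missing`, which is the check requested above.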
Here's a first stab at an implementation for `predict` for `FixedEffectModels`. Essentially this `leftjoin`s the fixed effects onto the relevant columns of the data passed to `predict`, and then sums them to create a vector that is added to the "regular" prediction obtained by multiplying the non-FE columns with their respective coefficients.

I have checked that this works on the original data, as well as a new data set with the same levels. It also provides comparable predictions to a `predict` call on the model estimated without marking the categorical variables out as `fe()`s. When a new data set with missing observations is passed the code errors, which appears consistent with what currently happens for `predict` with a non-FE model.

One difference in behaviour is for the case of new levels in the fixed effects - in the case of a non-FE model, `predict` currently errors, while with this PR, for a model that `has_fe`, predictions are returned, with `missing` in rows where a new level is encountered in a fixed effect which was not included in the original data (this is an artefact of `leftjoin` producing `missing` in that case).

Happy to discuss whether this gives a reasonable user experience. Two things I haven't thought about here:

- `predict` works for `GLM`s, but I'm not sure it's straightforward as standard errors on the fixed effects aren't available I think?
- ... `predict` gives for a non-FE model on the same data. Suggestions on what should go into the test suite for CI?
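One possible shape for such tests, sketched in plain Julia so that it runs without the package. The data, the helper `predict_fe`, and the hand-rolled estimator are all assumptions made for illustration; a real test would call `reg` and `predict` instead. The sketch covers the three cases discussed in this thread: agreement with the equivalent dummy-variable model on the original data, a new level, and a missing covariate.

```julia
using Test, Statistics, LinearAlgebra

# Toy panel, hypothetical data: two groups, one covariate.
g = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0]
y = [2.1, 3.9, 8.2, 5.0, 10.8, 15.1]

# Fixed-effect route: demean within groups, estimate the slope,
# then recover each group's effect as the mean of y - b*x in that group.
groupmean(v) = [mean(v[g .== k]) for k in g]
xd, yd = x .- groupmean(x), y .- groupmean(y)
b = sum(xd .* yd) / sum(xd .^ 2)
fe_hat = Dict(k => mean(y[g .== k] .- b .* x[g .== k]) for k in unique(g))
predict_fe(level, xnew) = get(fe_hat, level, missing) + b * xnew

# Dummy-variable route: the same model without absorbing the group effects.
X = hcat(x, g .== 1, g .== 2)
theta = X \ y

@testset "predict with fixed effects (sketch)" begin
    # Original data: both routes should agree up to floating-point error.
    @test predict_fe.(g, x) ≈ X * theta
    # A level absent from the estimation sample yields `missing`.
    @test ismissing(predict_fe(3, 1.0))
    # A missing covariate also yields `missing`.
    @test ismissing(predict_fe(1, missing))
end
```

Comparing against the dummy-variable fit gives a reference that does not depend on stored numbers, which may make the test more robust to changes in the estimation code.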