iwelch opened 6 years ago
For comparison, in Stata, `reg y x [pw = w]` displays the sum of weights but does not store it in `e()`, whereas `svy: reg y x` (after `svyset [pw = w]`) does store it in `e()`: it uses `e(N)` for the number of rows and `e(N_pop)` for the sum of weights. As noted on Discourse:
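To make the distinction concrete, here is a minimal sketch of the two quantities Stata keeps separate after `svy: reg`. It is written in Python purely for illustration. `weight_summary` is a hypothetical helper and is not part of Stata or StatsModels.jl. The row count is an integer, while the sum of weights is a float that generally differs from it.

```python
def weight_summary(weights):
    """Return (number of rows, sum of weights) for a vector of probability weights.

    Hypothetical helper, for illustration only.
    """
    n_rows = len(weights)          # analogous to Stata's e(N): an integer row count
    n_pop = float(sum(weights))    # analogous to Stata's e(N_pop): the sum of weights
    return n_rows, n_pop


w = [10.0, 25.5, 4.5]
print(weight_summary(w))  # (3, 40.0)
```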
I’m afraid it’s more complex than that. For example, with frequency/replicate weights, the apparent “number of observations” doesn’t have any meaning; it’s just the way the data has been compressed to save space. So it would be misleading to have `nobs` return that.
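The compression point can be seen in a small sketch. It is in Python for illustration, and `compress` is a hypothetical helper. The same six observations can be stored either as six rows with weight 1, or as three unique rows carrying frequency weights. The row count changes with the storage choice, but the sum of the frequency weights stays at 6.

```python
from collections import Counter


def compress(observations):
    """Collapse duplicate rows into unique rows plus frequency weights (hypothetical helper)."""
    counts = Counter(observations)
    rows = list(counts)                             # unique rows, in first-seen order
    fweights = [float(c) for c in counts.values()]  # how many times each row occurred
    return rows, fweights


raw = [(1, 2), (1, 2), (1, 2), (2, 3), (2, 3), (3, 5)]
rows, fweights = compress(raw)
print(len(rows), sum(fweights))  # 3 6.0
```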
A solution would be to have a keyword argument to request the (unweighted) number of rows.
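One way the keyword-argument idea could look is sketched below in Python. `FittedModel` and the `weighted` keyword are hypothetical names chosen for illustration. They are not an existing StatsBase/StatsModels option. By default the function returns the sum of weights, and `weighted=False` returns the plain integer row count.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class FittedModel:
    """Hypothetical fitted-model object holding only what this sketch needs."""
    n_rows: int                            # rows of data actually used in the fit
    weights: Optional[List[float]] = None  # None means the fit was unweighted


def nobs(model: FittedModel, weighted: bool = True) -> Union[int, float]:
    """Hypothetical nobs with a keyword to request the unweighted number of rows."""
    if weighted and model.weights is not None:
        return float(sum(model.weights))   # sum of weights
    return model.n_rows                    # plain (unweighted) row count


m = FittedModel(n_rows=3, weights=[1.0, 2.0, 3.0])
print(nobs(m), nobs(m, weighted=False))  # 6.0 3
```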
Would you be open to exporting a function that inspects the model frame of the fitted model to get the number of rows in the underlying data set?
However, I understand that we want to be agnostic about the input data type.
We would need to require a specific layout from all models to do that (https://github.com/JuliaStats/StatsModels.jl/issues/32). Barring that solution, it doesn't seem too hard to require models to implement that simple method.
Thanks for the link. If the officially sanctioned API for all models is still in flux, I would like some sort of `unweightedobs()` function to be implemented.
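The suggestion of having every model implement one small method might look roughly like the interface below. It is written in Python for illustration only, and `RegressionModel`, `FrameBackedModel` and `unweightedobs` are hypothetical names here. A model that keeps its model frame can answer by counting the frame's rows, whatever the weights are. A model that does not store its data would need to record the count at fit time.

```python
from abc import ABC, abstractmethod


class RegressionModel(ABC):
    """Hypothetical interface: every model must report its unweighted row count."""

    @abstractmethod
    def unweightedobs(self) -> int:
        """Number of rows of data used in the fit, ignoring any weights."""


class FrameBackedModel(RegressionModel):
    """Hypothetical model that keeps its model frame (here, a list of rows)."""

    def __init__(self, model_frame, weights=None):
        self.model_frame = model_frame
        self.weights = weights

    def unweightedobs(self) -> int:
        # Count rows in the stored model frame; the weights play no role.
        return len(self.model_frame)


m = FrameBackedModel(model_frame=[(1.0, 2.0), (2.0, 3.5), (3.0, 5.1)],
                     weights=[2.0, 2.0, 6.0])
print(m.unweightedobs())  # 3
```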
That said, I generally write closures around any regression function, each returning a custom output struct, so it's not a huge deal if I have to write my own function to get the unweighted N.
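The workaround of wrapping each fitting function and returning a custom result could be sketched as follows. It is in Python for illustration. `with_counts`, `FitResult` and the stand-in `weighted_mean_fit` are all hypothetical. The wrapper records the unweighted row count and the sum of weights at fit time, so the counts do not depend on what the underlying model exposes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class FitResult:
    """Hypothetical custom output struct returned by the wrapper."""
    model: Any
    n_rows: int                    # unweighted number of rows passed in
    sum_weights: Optional[float]   # None when no weights were supplied


def with_counts(fit: Callable[..., Any]) -> Callable[..., FitResult]:
    """Wrap a fitting function so its result also carries the unweighted N."""
    def wrapped(X: Sequence, y: Sequence,
                weights: Optional[Sequence[float]] = None) -> FitResult:
        model = fit(X, y, weights)
        total = None if weights is None else float(sum(weights))
        return FitResult(model=model, n_rows=len(y), sum_weights=total)
    return wrapped


def weighted_mean_fit(X, y, weights):
    # Stand-in "regression": an intercept-only model, i.e. the weighted mean of y.
    w = [1.0] * len(y) if weights is None else list(weights)
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


fit = with_counts(weighted_mean_fit)
res = fit([[1], [1], [1]], [2.0, 4.0, 6.0], weights=[1.0, 1.0, 2.0])
print(res.n_rows, res.sum_weights, res.model)  # 3 4.0 4.5
```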
For the sake of completeness: `felm` in the R package `lfe` returns a model `m` where you can do
- `m$M`: number of rows in the matrix
- `m$weights`: the vector of weights used in the regression.
`nobs` should probably return `nrow(m.mf.df)`, an integer. Otherwise, it seems like a misnomer. It is also unexpected to get a `Float` for standard uses.
The current `nobs` should (or could) probably be renamed `wobs`. With all weights equal to 1, it gives the same value as `nobs()`, albeit as a `Float`.
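The proposed split can be illustrated with a short Python sketch. The `nobs` and `wobs` functions below are hypothetical stand-ins for the proposal and do not reproduce the actual Julia methods. `nobs` counts rows and always returns an integer. `wobs` sums weights and always returns a float. With unit weights the two agree in value but not in type.

```python
def nobs(rows) -> int:
    """Proposed (hypothetical) nobs: integer count of rows in the model frame."""
    return len(rows)


def wobs(weights) -> float:
    """Proposed (hypothetical) wobs: sum of weights, always a float."""
    return float(sum(weights))


rows = [(0.5, 1.2), (1.0, 2.1), (1.5, 2.9)]
ones = [1.0] * len(rows)
print(nobs(rows), wobs(ones))    # 3 3.0
print(nobs(rows) == wobs(ones))  # True: same value, different type
```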
/iaw