JuliaStats / GLM.jl

Generalized linear models in Julia

nobs() should be number of obs; wobs() should be current nobs #259

Open iwelch opened 6 years ago

iwelch commented 6 years ago

nobs should probably return nrow(m.mf.df), an integer; otherwise it seems like a misnomer. It is also unexpected to get a Float in standard use.

The current nobs could instead be named wobs. With all weights equal to 1, it returns the same value as the proposed nobs(), albeit as a Float.

/iaw

pdeffebach commented 6 years ago

To compare with Stata

nalimilan commented 6 years ago

As noted on Discourse:

I’m afraid it’s more complex than that. For example, with frequency/replicate weights, the apparent “number of observations” doesn’t have any meaning; it’s just the way the data has been compressed to save space. So it would be misleading to have nobs return that.

A solution would be to have a keyword argument to request the (unweighted) number of rows.
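
To make the distinction concrete, here is a minimal sketch in plain Julia, assuming frequency weights. The helper names weighted_nobs and unweighted_nobs are hypothetical and are not part of GLM.jl or StatsBase; the sketch only shows why the row count of compressed data differs from the sum of weights.

```julia
# Hypothetical helpers for illustration only; not part of GLM.jl or StatsBase.

# Sum of the weights: the quantity a weighted nobs reports.
# With Float64 weights the result is a Float64.
weighted_nobs(wts::AbstractVector{<:Real}) = sum(wts)

# Plain row count of the stored data; always an Int.
unweighted_nobs(y::AbstractVector) = length(y)

# Five observations stored one row per observation.
y_full = [1.0, 1.0, 1.0, 2.0, 2.0]

# The same data compressed with frequency weights: two stored rows.
y_compressed = [1.0, 2.0]
freq = [3.0, 2.0]

unweighted_nobs(y_full)        # 5
unweighted_nobs(y_compressed)  # 2, an artifact of the compression
weighted_nobs(freq)            # 5.0, matches the uncompressed row count
```

Under these assumptions the stored row count (2) says nothing about the sample size, while the sum of weights recovers it, which is why a keyword argument or a separate function for the unweighted count is being discussed rather than a change to the default.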

pdeffebach commented 6 years ago

Would you be open to exporting a function that inspects the model frame in the output for the number of rows in the underlying data set?

However, I understand that we want to be agnostic about the input data type.

nalimilan commented 6 years ago

We would need to require a specific layout from all models to do that (https://github.com/JuliaStats/StatsModels.jl/issues/32). Barring that solution, it doesn't seem too hard to require models to implement that simple method.

pdeffebach commented 6 years ago

Thanks for the link. If the officially sanctioned API for all models is still moving, I would like for some sort of unweightedobs() function to be implemented.

However, I generally write closures for any regression function, including a custom output struct. So it's not a huge deal if I have to write a function to get the unweighted N.

pdeffebach commented 6 years ago

For the sake of completeness:

felm in the R package lfe returns a model where you can do