[BUGZILLA #16665] dummy.coef fails when transformations are included in formula

MichaelChirico commented 4 years ago

Created attachment 1999 [details] dummy.coef.fix.R

The function dummy.coef.lm fails in more complex cases, notably when terms include variables that are transformed in the formula of the model.

r.lm <- lm(Fertility ~ cut(Agriculture, breaks=4) + Infant.Mortality, data=swiss)
dummy.coef(r.lm)

Error in model.frame.default(Terms, dummy, na.action = function(x) x, : factor cut(Agriculture, breaks = 4) has new level (0.9995,1]

The problem is that it works with all.vars, which returns the untransformed variables. This is fixed by using model.frame instead -- which is needed later in the function anyway.
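The mismatch can be seen directly on the fitted model. The following is only a small illustration using the example above, not code taken from dummy.coef.lm itself; the names raw_vars and mf_names are introduced here for the demonstration.

```r
# Refit the model from the report (swiss ships with R's datasets package).
r.lm <- lm(Fertility ~ cut(Agriculture, breaks = 4) + Infant.Mortality,
           data = swiss)

# all.vars() sees only the raw variable behind the transformation ...
raw_vars <- all.vars(delete.response(terms(r.lm)))
print(raw_vars)

# ... whereas the model frame and the stored factor levels are keyed by
# the transformed term.
mf_names <- names(model.frame(r.lm))
print(mf_names)
print(names(r.lm$xlevels))
```

So any lookup of factor levels by the names that all.vars returns cannot find the levels of the cut(...) term.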

The function dummy.coef.fix does this.

dummy.coef.fix(r.lm)

Thus, dummy.coef.lm should be replaced by dummy.coef.fix.

In the function, there is a warning, warning("some terms will have NAs due to the limits of the method"). I wonder why this is a "limit" (-> limitation) of the method. If some interaction coefficients are undetermined because the respective combination of levels is not available, NA is the appropriate result. Are there other cases?
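For reference, here is a minimal sketch of the situation described, an interaction whose level combination never occurs in the data. The data frame d and its columns are made up for the illustration. It only shows that lm itself reports NA for the undetermined coefficient; it makes no claim about whether dummy.coef issues the warning in this case.

```r
# Hypothetical 2 x 2 layout with three replicates per cell.
d <- expand.grid(a = factor(c("a1", "a2")),
                 b = factor(c("b1", "b2")),
                 rep = 1:3)

# Drop the (a2, b2) cell; subset() keeps the factor levels.
d <- subset(d, !(a == "a2" & b == "b2"))

set.seed(1)
d$y <- rnorm(nrow(d))

fit <- lm(y ~ a * b, data = d)

# The interaction column is all zero, so its coefficient is not estimable.
print(coef(fit))
```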

I have extended the function to include confidence intervals and t-tests and call the extended function allcoef. The latter are what is shown by summary.lm, except for the (dummy) variable that is eliminated by the contrasts. For treatment contrasts, the added information is trivial (0 with 0 standard error), but for sum (or weighted sum) contrasts, it is not, and for other contrasts, it may still recover more useful information. The function would need some polishing to work in general contexts. Let me know if you are interested.
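To illustrate the point about sum contrasts, here is a small sketch (not the allcoef function, which is not shown in this report). It uses the PlantGrowth data from R's datasets package and recovers the estimate and standard error of the level eliminated by contr.sum as a linear combination of the fitted coefficients. The names k, est_last and se_last are introduced here for the illustration.

```r
# One-way model with sum-to-zero contrasts on a plain (untransformed) factor.
fit <- lm(weight ~ group, data = PlantGrowth,
          contrasts = list(group = "contr.sum"))

# summary.lm reports only group1 and group2; the third level is eliminated.
print(coef(summary(fit)))

# Under contr.sum the eliminated level's effect is minus the sum of the others.
k <- c(-1, -1)
est_last <- sum(k * coef(fit)[2:3])

# Its standard error follows from the coefficient covariance matrix.
se_last <- sqrt(drop(t(k) %*% vcov(fit)[2:3, 2:3] %*% k))
print(c(estimate = est_last, std.error = se_last))

# dummy.coef reports the effects for all three levels, without standard errors.
dc <- dummy.coef(fit)
print(dc)
```

With treatment contrasts the same calculation would give 0 with standard error 0 for the baseline level, which is the trivial case mentioned above.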

Werner Stahel, Jan 4, 2016

METADATA

Bug author - Werner A. Stahel
Creation time - 2016-01-11 12:32:10 UTC
Bugzilla link
Status - ASSIGNED
Alias - None
Component - Analyses
Version - R 3.2.3
Hardware - Other Linux
Importance - P5 enhancement
Assignee - R-core
URL -
Modification time - 2020-02-08 19:44 UTC

MichaelChirico commented 4 years ago

Thank you, Werner.

I can confirm that your version works for the example where the current stats package one fails. Your version also fixes the similar problem reported to R-help "bug in dummy.coef?" https://stat.ethz.ch/pipermail/r-help/2013-October/362106.html

I've spent a bit of time on this because your version had quite a few changes that were not necessary (you renamed three of the internal variables), and it must have come from simply "print()"ing the function definition in an older version of R, so your code misses the comments from the source code and, e.g., the newer anyNA() use. Note that the most current source (of "R-devel") for this function is always https://svn.r-project.org/R/trunk/src/library/base/R/dummy.coef.R (to find this file, you most easily get a source "tarball" from one of the places linked from https://www.r-project.org/sources.html -- note the daily versions provided by "SfS"!), or if you prefer the web, you can use the 'site:svn.r-project.org/R' trick: https://www.google.ch/search?q=site:svn.r-project.org/R++%27dummy.coef%27&ie=utf-8&oe=utf-8&gws_rd=cr&ei=1t2hVqqGDoXxUt_spugL

Your question about the warning: I also find it a bit "strange". One could replace "due to the limits of the method" by "due to the design" (meaning the linear model design matrix), but I think you are suggesting that no warning should be given there, right?

I did not easily find a case that triggers the warning. Do you have one?

Best regards, Martin

METADATA

Comment author - Martin Maechler
Timestamp - 2016-01-22 07:45:43 UTC

MichaelChirico commented 4 years ago

Should this be closed? With revision 70020, doc/NEWS.Rd has:

 \item \code{dummy.coef.lm()} now works in more cases, thanks to a
  proposal by Werner Stahel (\PR{16665}).

...or maybe it stays open until resolution of the question about warning("some terms will have NAs due to the limits of the method")?
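A quick way to check whether an installed R already contains the change is to rerun the original example. This is just a sketch; the outcome depends on the R version in use, so the error is caught rather than assumed.

```r
# Original example from the report.
r.lm <- lm(Fertility ~ cut(Agriculture, breaks = 4) + Infant.Mortality,
           data = swiss)

# Capture the error instead of stopping, so both outcomes can be reported.
res <- tryCatch(dummy.coef(r.lm), error = function(e) e)

if (inherits(res, "error")) {
  message("dummy.coef() still fails here: ", conditionMessage(res))
} else {
  print(res)
}
```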

METADATA

Comment author - Benjamin Tyner
Timestamp - 2020-02-08 19:44:43 UTC
