Closed MichaelChirico closed 4 years ago
From: Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> Berwin,
It's fortuitous that this works at all.
The problem is not in update but in glm itself: take a look at fm$formula after the first fit. fm$terms has the right formula, but formula() extracts the $formula component first.
I am not currently sure what the right fix is, so will not try to fix this in R-patched/1.9.1.
Brian
On Fri, 21 May 2004 berwin@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> wrote:
Dear all,
I noticed the following while playing around with fitting log-linear
models to contingency tables using R 1.8.1, but the problem also
exists under R 1.9.0.
A reproducible example uses the following contingency table:
> library(MASS)
> data(quine)
> tmp <- with(quine, expand.grid(Eth=levels(Eth), Sex=levels(Sex),
+ Lrn=levels(Lrn), Age=levels(Age)))
> n <- nrow(quine)
> quine2 <- with(quine,
+ data.frame(tmp,
+ Count=as.vector(tapply(rep(1,n), list(Eth, Sex, Lrn, Age), sum))))
First fit a saturated model and see which term we can drop:
> fm <- glm(Count ~ .^4, quine2, family=poisson)
> drop1(fm, test="Chisq")
Single term deletions
Model:
Count ~ .^4
Df Deviance AIC LRT Pr(Chi)
<none> 6.661e-16 148.916
Eth:Sex:Lrn:Age 2 0.422 145.338 0.422 0.81
Drop the four-way interaction and check which term we can drop next:
> fm <- update(fm, . ~ . - Eth:Sex:Lrn:Age)
> drop1(fm, test="Chisq")
Single term deletions
Model:
Count ~ .
Df Deviance AIC LRT Pr(Chi)
<none> 35.893 142.810
Eth 1 36.332 141.248 0.439 0.507811
Sex 1 37.238 142.154 1.345 0.246237
Lrn 1 37.392 142.308 1.499 0.220842
Age 3 49.818 150.735 13.925 0.003009 **
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
All of a sudden there are only the main effects left and all
interaction terms have been removed. I expected to see at this point
a test for all three-way interactions. Since I started of with the
saturated model and used update to remove the four-way interaction,
all two-way and three-way interactions should still be in the model.
Or am I missing some subtlety of the "^" and "." operators in model
formulae?
On the other hand, if we specify the saturated model as follows:
> fm <- glm(Count ~ Lrn*Eth*Sex*Age, quine2, family=poisson)
then the above steps have the desired effect:
> drop1(fm, test="Chisq")
Single term deletions
Model:
Count ~ Lrn * Eth * Sex * Age
Df Deviance AIC LRT Pr(Chi)
<none> 4.219e-15 148.916
Lrn:Eth:Sex:Age 2 0.422 145.338 0.422 0.81
> fm <- update(fm, . ~ . - Eth:Sex:Lrn:Age)
> drop1(fm, test="Chisq")
Single term deletions
Model:
Count ~ Lrn + Eth + Sex + Age + Lrn:Eth + Lrn:Sex + Eth:Sex +
Lrn:Age + Eth:Age + Sex:Age + Lrn:Eth:Sex + Lrn:Eth:Age +
Lrn:Sex:Age + Eth:Sex:Age
Df Deviance AIC LRT Pr(Chi)
<none> 0.422 145.338
Lrn:Eth:Sex 1 0.493 143.409 0.071 0.789909
Lrn:Eth:Age 2 0.629 141.545 0.207 0.901471
Lrn:Sex:Age 2 12.728 153.644 12.306 0.002127 **
Eth:Sex:Age 3 1.019 139.935 0.597 0.897103
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
The update command only removed the four-way interaction but left all
the two-way and three-way interaction in the model.
I looked at update.formula to see what the problem might be, but after
reading the help file of .Internal I stopped looking around.... :)
Cheers,
Berwin
--please do not edit the information below--
Version:
platform = i686-pc-linux-gnu
arch = i686
os = linux-gnu
system = i686, linux-gnu
status =
major = 1
minor = 9.0
year = 2004
month = 04
day = 12
language = R
Search Path:
.GlobalEnv, package:MASS, package:methods, package:stats, package:graphics,
package:utils, Autoloads, package:base
______________________________________________
R-devel@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
-- Brian D. Ripley, ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ <CENSORING FROM DETECTED PHONE NUMBER ONWARDS; SEE BUGZILLA>
NOTES: Changed (for lm and glm) in 2.0.0-to-be
Audit (from Jitterbug): Mon May 24 21:54:16 2004 ripley changed notes Mon May 24 21:54:16 2004 ripley moved from incoming to Models-fixed
From: Berwin A Turlach <berwin@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> Dear Brian,
>>>> "BDR" == Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> writes:
BDR> The problem is not in update but in glm itself: take a look
BDR> at fm$formula after the first fit. fm$terms has the right
BDR> formula, but formula() extracts the $formula component first.
O.k., I see what you mean.
BDR> I am not currently sure what the right fix is, so will not
BDR> try to fix this in R-patched/1.9.1.
This may be naive, but I noticed the following at the end of the glm function (starting line 76, in src/library/stats/R/glm.R):
fit <- c(fit, list(call = call, formula = formula,
terms = mt, data = data,
offset = offset, control = control, method = method,
contrasts = attr(X, "contrasts"),
xlevels = .getXlevels(mt, mf)))
That is, the component "formula" in the object that is returned is explicitly set to the "formula" argument with which glm was called. The function "lm" does not do this. And if I change this `line' to
fit <- c(fit, list(call = call,
terms = mt, data = data,
offset = offset, control = control, method = method,
contrasts = attr(X, "contrasts"),
xlevels = .getXlevels(mt, mf)))
then fm$formula for the first fit in my example looks o.k.
But I don't know if this will break anything else.
Cheers,
Berwin
PS: I noticed that both `glm' and 'lm' now do essentially the following manipulations early on:
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action",
"offset"), names(mf), 0)
mf <- mf[c(1, m)]
mf$drop.unused.levels <- TRUE
mf[[1]] <- as.name("model.frame")
mf <- eval(mf, parent.frame())
Is this the preferred idiom to be used in R 1.9.x and later for modelling functions?
From: Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> On Wed, 26 May 2004, Berwin A Turlach wrote:
Dear Brian,
>>>>> "BDR" == Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> writes:
BDR> The problem is not in update but in glm itself: take a look
BDR> at fm$formula after the first fit. fm$terms has the right
BDR> formula, but formula() extracts the $formula component first.
O.k., I see what you mean.
BDR> I am not currently sure what the right fix is, so will not
BDR> try to fix this in R-patched/1.9.1.
This may be naive, but I noticed the following at the end of the glm
function (starting line 76, in src/library/stats/R/glm.R):
fit <- c(fit, list(call = call, formula = formula,
terms = mt, data = data,
offset = offset, control = control, method = method,
contrasts = attr(X, "contrasts"),
xlevels = .getXlevels(mt, mf)))
That is, the component "formula" in the object that is returned is
explicitly set to the "formula" argument with which glm was called.
The function "lm" does not do this. And if I change this `line' to
fit <- c(fit, list(call = call,
terms = mt, data = data,
offset = offset, control = control, method = method,
contrasts = attr(X, "contrasts"),
xlevels = .getXlevels(mt, mf)))
then fm$formula for the first fit in my example looks o.k.
You won't have a formula component at all (lm objects do not). What I have done is alter the formula methods to pick up the formula from the terms component but the environment from the formula componet (if present).
But I don't know if this will break anything else.
Cheers,
Berwin
PS: I noticed that both `glm' and 'lm' now do essentially the
following manipulations early on:
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action",
"offset"), names(mf), 0)
mf <- mf[c(1, m)]
mf$drop.unused.levels <- TRUE
mf[[1]] <- as.name("model.frame")
mf <- eval(mf, parent.frame())
Is this the preferred idiom to be used in R 1.9.x and later for
modelling functions?
Rather than removing components, by setting to NULL, yes.
-- Brian D. Ripley, ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ <CENSORING FROM DETECTED PHONE NUMBER ONWARDS; SEE BUGZILLA>
From: Berwin A Turlach <berwin@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> Dear all,
I noticed the following while playing around with fitting log-linear models to contingency tables using R 1.8.1, but the problem also exists under R 1.9.0.
A reproducible example uses the following contingency table:
+ Lrn=levels(Lrn), Age=levels(Age)))
+ data.frame(tmp, + Count=as.vector(tapply(rep(1,n), list(Eth, Sex, Lrn, Age), sum))))
First fit a saturated model and see which term we can drop:
Drop the four-way interaction and check which term we can drop next:
All of a sudden there are only the main effects left and all interaction terms have been removed. I expected to see at this point a test for all three-way interactions. Since I started of with the saturated model and used update to remove the four-way interaction, all two-way and three-way interactions should still be in the model. Or am I missing some subtlety of the "^" and "." operators in model formulae?
On the other hand, if we specify the saturated model as follows:
then the above steps have the desired effect:
The update command only removed the four-way interaction but left all the two-way and three-way interaction in the model.
I looked at update.formula to see what the problem might be, but after reading the help file of .Internal I stopped looking around.... :)
Cheers,
--please do not edit the information below--
Version: platform = i686-pc-linux-gnu arch = i686 os = linux-gnu system = i686, linux-gnu status = major = 1 minor = 9.0 year = 2004 month = 04 day = 12 language = R
Search Path: .GlobalEnv, package:MASS, package:methods, package:stats, package:graphics, package:utils, Autoloads, package:base
METADATA