MichaelChirico / r-bugs

A ⚠️read-only⚠️mirror of https://bugs.r-project.org/
20 stars 0 forks source link

[BUGZILLA #397] bug in glm #558

Closed MichaelChirico closed 4 years ago

MichaelChirico commented 4 years ago

From: Peter Holzer <holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> Dear R-team

As I didn't get any answer to my bug-report last week I have taken the effort and extracted a minimal data set from my data (see below) where the following bug occurs:

glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action = na.omit)

Error in names<-.default(tmp, value = ynames) : names attribute must be the same length as the vector In addition: Warning messages: 1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
5: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
6: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

"bugdata" <- structure(list(TAGNR = c(1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24), FAC.A1 = c(1.890074713, 0.285627892, -0.728152664, 0.117508066, -0.316988379, -0.614649045, -0.969428677, -0.036, -0.139119729, -0.25405231, -0.278163883, 0.296010189, 0.290285736, -0.036, -0.304357608, -0.129913755, -0.136209139, -0.290925392, 0.219392415, 0.418347431), FAC.A2 = c(1.626524757, 0.447314597, 0.401823062, 0.161047605, 0.470022244, -1.227729293, -0.702664589, 0.065, -0.477090522, -0.464008109, -0.264137857, 0.576270516, 1.157658697, 0.065, -0.018453905, -3.059806667, 0.694435007, 0.074223629, 1.115136699, 0.652975139), MLDR = c(9529, 9637, 9605, 9631, 9674, 9636, 9640, 9733, 9734, 9726, 9747, 9643, 9593, 9582, 9699, 9810, 9820, 9813, 9824, 9840), MWSQUAL = c(5, 1, 7.5, 7, 0.5, 6, 6, 8, 7, 6, 7, 7, 7, 8, 8, 7, 7, 7, 7, 8), FAC.M1 = c(0.346099763, -1.404917729, -0.122803627, -1.105302108, -0.820151926, -0.148255768, -0.059281084, -0.177, -0.597900864, -0.385193779, 0.576167774, -0.213567265, -0.087008627, -0.150494043, 0.061886447, -0.000514461, 0.52592456, 0.25696687, 0.119899896, -0.158271395), FAC.M2 = c(0.509341516, -1.28717534, 0.484499473, 0.609920083, -1.049431096, 0.523520113, 0.00821345, -0.096, -1.526343789, -0.698071247, 0.404253968, 0.384329318, -0.354346109, 0.43895688, -2.077185788, -1.044851055, 0.623330523, 0.791948597, 1.120535816, 0.314008432), SKR.ein.aus = c(0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)), .Names = c("TAGNR", "FAC.A1", "FAC.A2", "MLDR", "MWSQUAL", "FAC.M1", "FAC.M2", "SKR.ein.aus" ), row.names = c("139", "140", "141", "142", "143", "144", "145", "146", "147", "148", "149", "150", "151", "152", "153", "154", "155", "156", "157", "158"), class = "data.frame")

I hope, this makes it possible to find the bug.

Peter

--please do not edit the information below--

Version: platform = i686-unknown-linux arch = i686 os = linux system = i686, linux status = major = 0 minor = 90.1 year = 1999 month = December day = 15 language = R

Search Path: .GlobalEnv, package:ctest, package:lqs, package:MASS, package:SfS, Autoloads, package:base


METADATA

MichaelChirico commented 4 years ago

From: Martin Maechler <maechler@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>>

>>>> "PeHo" == Peter Holzer <holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> writes:
PeHo> Dear R-team As I didn't get any answer to my bug-report last week
PeHo> I have taken the effort and extracted a minimal data set from my
PeHo> data (see below) where the following bug occurs:
> glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action =
> na.omit)

(the ", na.action = na.omit" doesn't matter since the examples hasn't got any NA)

> Error in names<-.default(*tmp*, value = ynames) : names attribute must be the same length as the vector
> In addition: Warning messages: 
> 1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 5: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 6: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  

So, "7:" really says that the Algorithm didn't converge.

Of course, there is a bug, namely, ending in such an (unhelpful) error instead of doing something better.

If I run this example through S-plus 5.1 (Linux), it doesn't converge either, but does return a (possibly completely wrong ??) result, and only warns you with one line that you shouldn't overlook :

> S> glm(SKR.ein.aus ~ ., family = binomial, data = bugdata)
> Call:
> glm(formula = SKR.ein.aus ~ ., family = binomial, data = bugdata)
> 
> Coefficients:
>  (Intercept)    TAGNR    FAC.A1   FAC.A2      MLDR   MWSQUAL    FAC.M1 
>    -5022.282 1.561878 -169.0165 135.3793 0.6264029 -143.0841 -115.7853
> 
>     FAC.M2 
>  -233.0438
> 
> Degrees of Freedom: 20 Total; 12 Residual
> Residual Deviance: 0.674096 
> There were 12 warnings (use warnings() to see them)

If you are careful enough to read the last line, and then type warnings(), you see

> Warning messages --
> 1: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 2: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 3: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 4: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 5: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 6: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 7: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 8: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 9: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 10: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 11: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 12: linear convergence not obtained in  10  iterations. in: glm.fitter(x =
> X,
>      y = Y, w = w, start = start, offset = offset, family = family,  ...

((which is slightly better than R's warnings...))

PeHo> I hope, this makes it possible to find the bug.

Yes, thanks a lot, Peter!

Tracing the iterations is really the good idea here: glm(SKR.ein.aus∼ ., family = binomial, data = bugdata, maxit= 30, trace=TRUE)

gives

Deviance = 15.39188 Iterations - 1 
Deviance = 13.67612 Iterations - 2 
Deviance = 12.02264 Iterations - 3 
Deviance = 10.41495 Iterations - 4 
Deviance = 9.021518 Iterations - 5 
Deviance = 8.260037 Iterations - 6 
Deviance = 7.468748 Iterations - 7 
Deviance = 4.597021 Iterations - 8 
Deviance = 1.523395 Iterations - 9 
Deviance = 0.5131087 Iterations - 10 
Deviance = 0.1843927 Iterations - 11 
Deviance = 0.06750712 Iterations - 12 
Deviance = 0.0248701 Iterations - 13 
Deviance = 0.00919058 Iterations - 14 
Deviance = 0.003411904 Iterations - 15 
Deviance = 0.001281922 Iterations - 16 
Deviance = 0.0004972102 Iterations - 17 
Deviance = 0.0002082669 Iterations - 18 
Deviance = 0.0001069153 Iterations - 19 
Deviance = 6.777993e-05 Iterations - 20 
Deviance = 5.338077e-05 Iterations - 21 
Deviance = 4.808323e-05 Iterations - 22 
Error in names<-.default(*tmp*, value = ynames) : names attribute must be the same length as the vector
In addition: There were 18 warnings (use warnings() to see them)

(doing the same in S-plus also shows yo that the deviance goes to 0 and no convergence is reached).

------ Now, the "real" glm()ers are called for...

Martin


METADATA

MichaelChirico commented 4 years ago

From: Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>>

Date: Mon, 10 Jan 2000 11:53:50 +0100 (MET)
From: holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>

Dear R-team

As I didn't get any answer to my bug-report last week I have taken the
effort and extracted a minimal data set from my data (see below) where the
following bug occurs:

> glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action =
na.omit)
Error in names<-.default(*tmp*, value = ynames) : names attribute must be the 

same length as the vector

In addition: Warning messages: 
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) 

glm.fit.null else glm.fit)(x = X, y = Y,

2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) 

glm.fit.null else glm.fit)(x = X, y = Y,

3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) 

glm.fit.null else glm.fit)(x = X, y = Y,

4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) 

glm.fit.null else glm.fit)(x = X, y = Y,

5: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) 

glm.fit.null else glm.fit)(x = X, y = Y,

6: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) 

glm.fit.null else glm.fit)(x = X, y = Y,

7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else 

glm.fit)(x = X, y = Y,


"bugdata" <-
structure(list(TAGNR = c(1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 
16, 17, 18, 19, 20, 21, 22, 23, 24), FAC.A1 = c(1.890074713, 
0.285627892, -0.728152664, 0.117508066, -0.316988379, -0.614649045, 
-0.969428677, -0.036, -0.139119729, -0.25405231, -0.278163883, 
0.296010189, 0.290285736, -0.036, -0.304357608, -0.129913755, 
-0.136209139, -0.290925392, 0.219392415, 0.418347431), FAC.A2 =
c(1.626524757, 
0.447314597, 0.401823062, 0.161047605, 0.470022244, -1.227729293, 
-0.702664589, 0.065, -0.477090522, -0.464008109, -0.264137857, 
0.576270516, 1.157658697, 0.065, -0.018453905, -3.059806667, 
0.694435007, 0.074223629, 1.115136699, 0.652975139), MLDR = c(9529, 
9637, 9605, 9631, 9674, 9636, 9640, 9733, 9734, 9726, 9747, 9643, 
9593, 9582, 9699, 9810, 9820, 9813, 9824, 9840), MWSQUAL = c(5, 
1, 7.5, 7, 0.5, 6, 6, 8, 7, 6, 7, 7, 7, 8, 8, 7, 7, 7, 7, 8), 
FAC.M1 = c(0.346099763, -1.404917729, -0.122803627, -1.105302108, 
-0.820151926, -0.148255768, -0.059281084, -0.177, -0.597900864, 
-0.385193779, 0.576167774, -0.213567265, -0.087008627, -0.150494043, 
0.061886447, -0.000514461, 0.52592456, 0.25696687, 0.119899896, 
-0.158271395), FAC.M2 = c(0.509341516, -1.28717534, 0.484499473, 
0.609920083, -1.049431096, 0.523520113, 0.00821345, -0.096, 
-1.526343789, -0.698071247, 0.404253968, 0.384329318, -0.354346109, 
0.43895688, -2.077185788, -1.044851055, 0.623330523, 0.791948597, 
1.120535816, 0.314008432), SKR.ein.aus = c(0, 1, 1, 1, 1, 
0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)), .Names = c("TAGNR", 
"FAC.A1", "FAC.A2", "MLDR", "MWSQUAL", "FAC.M1", "FAC.M2", "SKR.ein.aus"
), row.names = c("139", "140", "141", "142", "143", "144", "145", 
"146", "147", "148", "149", "150", "151", "152", "153", "154", 
"155", "156", "157", "158"), class = "data.frame")

I hope, this makes it possible to find the bug.

Indeed, thank you (we needed an example). Alter

names(w) <- ynames

to names(w) <- ynames[good]

and

names(fit$effects) <-
c(xxnames[seq(fit$rank)], rep("", nobs - fit$rank))

to names(fit$effects) <- c(xxnames[seq(fit$rank)], rep("", sum(good) - fit$rank))

as the vector w is dropping observations with fitted probs 0 or 1.

Brian Ripley

-- Brian D. Ripley, ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ <CENSORING FROM DETECTED PHONE NUMBER ONWARDS; SEE BUGZILLA>


METADATA

MichaelChirico commented 4 years ago

From: Peter Holzer <holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> Prof Brian Ripley writes:

> Date: Mon, 10 Jan 2000 11:53:50 +0100 (MET)
> From: holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>
> 
> Dear R-team
> 
> As I didn't get any answer to my bug-report last week I have taken the
> effort and extracted a minimal data set from my data (see below) where the
> following bug occurs:
> 
> > glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action = na.omit)
> Error in names<-.default(*tmp*, value = ynames) : names attribute must be the 
same length as the vector
Indeed, thank you (we needed an example). Alter

names(w) <- ynames

to
names(w) <- ynames[good]

and

names(fit$effects) <-
c(xxnames[seq(fit$rank)], rep("", nobs - fit$rank))

to 
names(fit$effects) <-
c(xxnames[seq(fit$rank)], rep("", sum(good) - fit$rank))

as the vector w is dropping observations with fitted probs 0 or 1.

This is actually a solution I thought of as well. However it has the disadvantage that plot (and other functions?) gets problems:

fit.tst <- glm(SKR.ein.aus ~ ., family = binomial, data = bugdata)

Warning messages: 1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y, . . . 7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

plot(fit.tst)

Warning messages: 1: longer object length is not a multiple of shorter object length in: sqrt(w) r 2: longer object length is not a multiple of shorter object length in: r/sqrt(1 - hii) 3: longer object length is not a multiple of shorter object length in: sqrt(w) r 4: longer object length is not a multiple of shorter object length in: sqrt(w) r 5: longer object length is not a multiple of shorter object length in: e/(s (1 - h)) 6: longer object length is not a multiple of shorter object length in: (e/(s (1 - h)))^2 h

Maybe there is another solution?

Nevertheless, thanks for the help!


<CENSORING FROM DETECTED PHONE NUMBER ONWARDS; SEE BUGZILLA>


METADATA

MichaelChirico commented 4 years ago

From: Prof Brian D Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> On Mon, 10 Jan 2000, Peter Holzer wrote:

Prof Brian Ripley writes:
> > Date: Mon, 10 Jan 2000 11:53:50 +0100 (MET)
> > From: holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>
> > 
> > Dear R-team
> > 
> > As I didn't get any answer to my bug-report last week I have taken the
> > effort and extracted a minimal data set from my data (see below) where
the
> > following bug occurs:
> > 
> > > glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action =
na.omit)
> > Error in names<-.default(*tmp*, value = ynames) : names attribute must
be the 
> same length as the vector
> Indeed, thank you (we needed an example). Alter
> 
>     names(w) <- ynames
>     
> to
>     names(w) <- ynames[good]
> 
> and
> 
>     names(fit$effects) <-
>    c(xxnames[seq(fit$rank)], rep("", nobs - fit$rank))
> 
> to 
>     names(fit$effects) <-
>    c(xxnames[seq(fit$rank)], rep("", sum(good) - fit$rank))
> 
> as the vector w is dropping observations with fitted probs 0 or 1.

This is actually a solution I thought of as well. However it has the
disadvantage that plot (and other functions?) gets problems:

> fit.tst <- glm(SKR.ein.aus ~ ., family = binomial, data = bugdata)
Warning messages: 
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt))
glm.fit.null else glm.fit)(x = X, y = Y, 
.
.
.
7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else
glm.fit)(x = X, y = Y,  
> plot(fit.tst)

We already know that plot.lm does not work correctly with glm objects with zero weights. I am not at all sure it should be used for glm objects (except Gaussian ones). However, I think you missed Martin M's point. This behaviour indicates an inappropriate model, one with separation of the groups, and all the linear approximations needed for a local lm model are inappropriate.

We also know that the whole glm area needs a long hard look.

-- Brian D. Ripley, ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ <CENSORING FROM DETECTED PHONE NUMBER ONWARDS; SEE BUGZILLA>


METADATA

MichaelChirico commented 4 years ago

From: Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>>

From: Peter Holzer <holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>>
Date: Mon, 10 Jan 2000 18:06:59 +0100 (MET)
To: Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>>
Cc: R-bugs@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>
Subject: Re: bug in glm (PR#397)
X-Keywords: 

Prof Brian Ripley writes:
> > Date: Mon, 10 Jan 2000 11:53:50 +0100 (MET)
> > From: holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>
> > 
> > Dear R-team
> > 
> > As I didn't get any answer to my bug-report last week I have taken the
> > effort and extracted a minimal data set from my data (see below) where 

the

> > following bug occurs:
> > 
> > > glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action = 

na.omit)

> > Error in names<-.default(*tmp*, value = ynames) : names attribute must
be 

the

> same length as the vector
> Indeed, thank you (we needed an example). Alter
> 
>     names(w) <- ynames
>     
> to
>     names(w) <- ynames[good]
> 
> and
> 
>     names(fit$effects) <-
>    c(xxnames[seq(fit$rank)], rep("", nobs - fit$rank))
> 
> to 
>     names(fit$effects) <-
>    c(xxnames[seq(fit$rank)], rep("", sum(good) - fit$rank))
> 
> as the vector w is dropping observations with fitted probs 0 or 1.

This is actually a solution I thought of as well. However it has the
disadvantage that plot (and other functions?) gets problems:

> fit.tst <- glm(SKR.ein.aus ~ ., family = binomial, data = bugdata)
Warning messages: 
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt))
glm.fit.null else glm.fit)(x = X, y = Y, 
.
.
.
7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else 

glm.fit)(x = X, y = Y,

> plot(fit.tst)
Warning messages: 
1: longer object length
is not a multiple of shorter object length in: sqrt(w) * r 
2: longer object length
is not a multiple of shorter object length in: r/sqrt(1 - hii) 
3: longer object length
is not a multiple of shorter object length in: sqrt(w) * r 
4: longer object length
is not a multiple of shorter object length in: sqrt(w) * r 
5: longer object length
is not a multiple of shorter object length in: e/(s * (1 - h)) 
6: longer object length
is not a multiple of shorter object length in: (e/(s * (1 - h)))^2 * h 

Maybe there is another solution?

Yes, there is. I have now altered glm in the development version to handle zero weights in the same way as lm, and plot.lm now works for such glm objects (although the assumptions to use lm methods on an glm fit are invalid in this example, even if run to convergence). In essence:

wt <- rep(0, nobs)
wt[good] <- w^2
names(wt) <- ynames

... null.deviance = nulldev, iter = iter, weights = wt, prior.weights = weights

-- Brian D. Ripley, ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ <CENSORING FROM DETECTED PHONE NUMBER ONWARDS; SEE BUGZILLA>


METADATA

MichaelChirico commented 4 years ago

NOTES: same as PR#395


METADATA

MichaelChirico commented 4 years ago

Audit (from Jitterbug): Sat Jan 15 18:23:04 2000 ripley changed notes Sat Jan 15 18:23:04 2000 ripley moved from incoming to Models-fixed


METADATA