[BUGZILLA #397] bug in glm

From: Peter Holzer <holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> Dear R-team

As I didn't get any answer to my bug-report last week I have taken the effort and extracted a minimal data set from my data (see below) where the following bug occurs:

glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action = na.omit)

Error in names<-.default(tmp, value = ynames) : names attribute must be the same length as the vector In addition: Warning messages: 1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
5: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
6: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

"bugdata" <- structure(list(TAGNR = c(1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24), FAC.A1 = c(1.890074713, 0.285627892, -0.728152664, 0.117508066, -0.316988379, -0.614649045, -0.969428677, -0.036, -0.139119729, -0.25405231, -0.278163883, 0.296010189, 0.290285736, -0.036, -0.304357608, -0.129913755, -0.136209139, -0.290925392, 0.219392415, 0.418347431), FAC.A2 = c(1.626524757, 0.447314597, 0.401823062, 0.161047605, 0.470022244, -1.227729293, -0.702664589, 0.065, -0.477090522, -0.464008109, -0.264137857, 0.576270516, 1.157658697, 0.065, -0.018453905, -3.059806667, 0.694435007, 0.074223629, 1.115136699, 0.652975139), MLDR = c(9529, 9637, 9605, 9631, 9674, 9636, 9640, 9733, 9734, 9726, 9747, 9643, 9593, 9582, 9699, 9810, 9820, 9813, 9824, 9840), MWSQUAL = c(5, 1, 7.5, 7, 0.5, 6, 6, 8, 7, 6, 7, 7, 7, 8, 8, 7, 7, 7, 7, 8), FAC.M1 = c(0.346099763, -1.404917729, -0.122803627, -1.105302108, -0.820151926, -0.148255768, -0.059281084, -0.177, -0.597900864, -0.385193779, 0.576167774, -0.213567265, -0.087008627, -0.150494043, 0.061886447, -0.000514461, 0.52592456, 0.25696687, 0.119899896, -0.158271395), FAC.M2 = c(0.509341516, -1.28717534, 0.484499473, 0.609920083, -1.049431096, 0.523520113, 0.00821345, -0.096, -1.526343789, -0.698071247, 0.404253968, 0.384329318, -0.354346109, 0.43895688, -2.077185788, -1.044851055, 0.623330523, 0.791948597, 1.120535816, 0.314008432), SKR.ein.aus = c(0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)), .Names = c("TAGNR", "FAC.A1", "FAC.A2", "MLDR", "MWSQUAL", "FAC.M1", "FAC.M2", "SKR.ein.aus" ), row.names = c("139", "140", "141", "142", "143", "144", "145", "146", "147", "148", "149", "150", "151", "152", "153", "154", "155", "156", "157", "158"), class = "data.frame")

I hope, this makes it possible to find the bug.

Peter

--please do not edit the information below--

Version: platform = i686-unknown-linux arch = i686 os = linux system = i686, linux status = major = 0 minor = 90.1 year = 1999 month = December day = 15 language = R

Search Path: .GlobalEnv, package:ctest, package:lqs, package:MASS, package:SfS, Autoloads, package:base

METADATA

Bug author - Jitterbug compatibility account
Creation time - 2000-01-10 16:53:45 UTC
Bugzilla link
Status - CLOSED FIXED
Alias - None
Component - Models
Version - old
Hardware - All Linux
Importance - P5 normal
Assignee - Jitterbug compatibility account
URL -
Modification time - 2000-01-10 17:38 UTC

From: Martin Maechler <maechler@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>>

>>>> "PeHo" == Peter Holzer <holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>> writes:

PeHo> Dear R-team As I didn't get any answer to my bug-report last week
PeHo> I have taken the effort and extracted a minimal data set from my
PeHo> data (see below) where the following bug occurs:

> glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action =
> na.omit)

(the ", na.action = na.omit" doesn't matter since the examples hasn't got any NA)

> Error in names<-.default(*tmp*, value = ynames) : names attribute must be the same length as the vector

> In addition: Warning messages: 
> 1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 5: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 6: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  
> 7: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

So, "7:" really says that the Algorithm didn't converge.

Of course, there is a bug, namely, ending in such an (unhelpful) error instead of doing something better.

If I run this example through S-plus 5.1 (Linux), it doesn't converge either, but does return a (possibly completely wrong ??) result, and only warns you with one line that you shouldn't overlook :

> S> glm(SKR.ein.aus ~ ., family = binomial, data = bugdata)

> Call:
> glm(formula = SKR.ein.aus ~ ., family = binomial, data = bugdata)
> 
> Coefficients:
>  (Intercept)    TAGNR    FAC.A1   FAC.A2      MLDR   MWSQUAL    FAC.M1 
>    -5022.282 1.561878 -169.0165 135.3793 0.6264029 -143.0841 -115.7853
> 
>     FAC.M2 
>  -233.0438
> 
> Degrees of Freedom: 20 Total; 12 Residual
> Residual Deviance: 0.674096 
> There were 12 warnings (use warnings() to see them)

If you are careful enough to read the last line, and then type warnings(), you see

> Warning messages --
> 1: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 2: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 3: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 4: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 5: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 6: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 7: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 8: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 9: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 10: Model unstable; fitted probabilities of 0 or 1 in: family$deriv(mu)
> 11: fitted values close to 0 or 1 in: family$deviance(mu, y, w)
> 12: linear convergence not obtained in  10  iterations. in: glm.fitter(x =
> X,
>      y = Y, w = w, start = start, offset = offset, family = family,  ...

((which is slightly better than R's warnings...))

PeHo> I hope, this makes it possible to find the bug.

Yes, thanks a lot, Peter!

Tracing the iterations is really the good idea here: glm(SKR.ein.aus∼ ., family = binomial, data = bugdata, maxit= 30, trace=TRUE)

gives

Deviance = 15.39188 Iterations - 1 
Deviance = 13.67612 Iterations - 2 
Deviance = 12.02264 Iterations - 3 
Deviance = 10.41495 Iterations - 4 
Deviance = 9.021518 Iterations - 5 
Deviance = 8.260037 Iterations - 6 
Deviance = 7.468748 Iterations - 7 
Deviance = 4.597021 Iterations - 8 
Deviance = 1.523395 Iterations - 9 
Deviance = 0.5131087 Iterations - 10 
Deviance = 0.1843927 Iterations - 11 
Deviance = 0.06750712 Iterations - 12 
Deviance = 0.0248701 Iterations - 13 
Deviance = 0.00919058 Iterations - 14 
Deviance = 0.003411904 Iterations - 15 
Deviance = 0.001281922 Iterations - 16 
Deviance = 0.0004972102 Iterations - 17 
Deviance = 0.0002082669 Iterations - 18 
Deviance = 0.0001069153 Iterations - 19 
Deviance = 6.777993e-05 Iterations - 20 
Deviance = 5.338077e-05 Iterations - 21 
Deviance = 4.808323e-05 Iterations - 22 
Error in names<-.default(*tmp*, value = ynames) : names attribute must be the same length as the vector
In addition: There were 18 warnings (use warnings() to see them)

(doing the same in S-plus also shows yo that the deviance goes to 0 and no convergence is reached).

------ Now, the "real" glm()ers are called for...

Martin

METADATA

Comment author - Jitterbug compatibility account
Timestamp - 2000-01-10 17:38:24 UTC

From: Prof Brian Ripley <ripley@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>>

Date: Mon, 10 Jan 2000 11:53:50 +0100 (MET)
From: holzer@<::CENSORED -- SEE ORIGINAL ON BUGZILLA::>

Dear R-team

As I didn't get any answer to my bug-report last week I have taken the
effort and extracted a minimal data set from my data (see below) where the
following bug occurs:

> glm(SKR.ein.aus ~ ., family = binomial, data = bugdata, na.action =
na.omit)
Error in names<-.default(*tmp*, value = ynames) : names attribute must be the

same length as the vector

In addition: Warning messages: 
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt))