IQSS / Zelig

A statistical framework that serves as a common interface to a large range of models
http://zeligproject.org
111 stars 43 forks source link

Bug in `show` and `plot` in models with factors #226

Closed christophergandrud closed 7 years ago

christophergandrud commented 7 years ago
library(Zelig)

z.out <- zelig(Petal.Width ~ Petal.Length + Species, data = iris, 
                   model = "ls")
x.out1 <- setx(z.out, Petal.Length = 1:10)
sims1 <- sim(z.out, x.out1)
plot(sims1)

Produces this plot:

image

The simulations are done correctly for the range of Petal.Length as we can see from:

zelig_qi_to_df(sims1) %>% qi_slimmer()

##    setx_value Petal.Length Speciesversicolor Speciesvirginica  qi_ci_min qi_ci_median qi_ci_max
## 1           x            1                 0                0 0.07927395    0.1403430 0.2013641
## 2           x            2                 1                0 0.64081014    0.8089075 0.9724328
## 3           x            3                 1                0 0.93372904    1.0362871 1.1392268
## 4           x            4                 0                0 0.65836258    0.8286814 1.0097060
## 5           x            5                 0                1 1.83924381    1.9002084 1.9617810
## 6           x            6                 1                0 1.59865187    1.7239358 1.8563640
## 7           x            7                 0                0 1.14220600    1.5122383 1.9106976
## 8           x            8                 1                0 1.93809134    2.1804360 2.4438411
## 9           x            9                 0                0 1.46916870    1.9698641 2.5084195
## 10          x           10                 0                0 1.63340341    2.1990871 2.8072804
christophergandrud commented 7 years ago

The main issue seems to come from here: https://github.com/IQSS/Zelig/blob/master/R/plots.R#L615-L619

if (is.null(var)) {
        each.var <- apply(xmatrix,2,sd)
        flag <- each.var > 0
        min.var <- min(each.var[flag])
        var.seq <- 1:ncol(xmatrix)
        position <- var.seq[each.var==min.var]
}

Basically, the x-axis variable is chosen as being the one with fitted values with the lowest standard deviation.

Wouldn't it make more sense to choose the x-axis variable as the one with the number of unique fitted values equaling the number of scenarios? E.g. in the example above, Petal.Length has 10 set values. This is also the number of scenarios.

christophergandrud commented 7 years ago

Another odd thing I just noticed. Why does the fitted values for the factor levels change across scenarios? E.g.:

##    setx_value Petal.Length Speciesversicolor Speciesvirginica  qi_ci_min qi_ci_median qi_ci_max
## 1           x            1                 0                0 0.07927395    0.1403430 0.2013641
## 2           x            2                 1                0 0.64081014    0.8089075 0.9724328
## 3           x            3                 1                0 0.93372904    1.0362871 1.1392268
## 4           x            4                 0                0 0.65836258    0.8286814 1.0097060
## 5           x            5                 0                1 1.83924381    1.9002084 1.9617810
## 6           x            6                 1                0 1.59865187    1.7239358 1.8563640
## 7           x            7                 0                0 1.14220600    1.5122383 1.9106976
## 8           x            8                 1                0 1.93809134    2.1804360 2.4438411
## 9           x            9                 0                0 1.46916870    1.9698641 2.5084195
## 10          x           10                 0                0 1.63340341    2.1990871 2.8072804
christophergandrud commented 7 years ago

The latter issue seems to be because Mode randomly chooses a value if there is no mode: https://github.com/IQSS/Zelig/blob/master/R/utils.R#L12-L13

Should we use a different heuristic? Maybe just the first value?

Related to this: maybe setx should provide some messages about how it is choosing fitted values when they aren't provided by users. I know you can find this out with summary, but an automatic message might be more intuitive.

christophergandrud commented 7 years ago

If everyone is ok with these changes, I'll implement them today.

christophergandrud commented 7 years ago

Issues resolved in 861482dfd66c55b2dc06efc1a90f19fc91dc5e97

E.g.:

library(Zelig)

z.out <- zelig(Petal.Width ~ Petal.Length + Species, data = iris, 
                       model = "ls", cite = FALSE)
x.out1 <- setx(z.out, Petal.Length = 1:10)
summary(x.out1)

## range:
##    (Intercept) Petal.Length Speciesversicolor Speciesvirginica
## 1            1            1                 0                1
## 2            1            2                 0                1
## 3            1            3                 0                1
## 4            1            4                 0                1
## 5            1            5                 0                1
## 6            1            6                 0                1
## 7            1            7                 0                1
## 8            1            8                 0                1
## 9            1            9                 0                1
## 10           1           10                 0                1

sims1 <- sim(z.out, x.out1)
plot(sims1)

image