avehtari / ROS-Examples

Regression and other stories R examples
https://avehtari.github.io/ROS-Examples/
324 stars 256 forks source link

Weird output in section 6.3 #127

Closed rosgori closed 11 months ago

rosgori commented 1 year ago

According to the book, this code (from section 6.3) outputs a result that you can verify

library(rstanarm)
library(rosdata)
data("earnings")
earnings$earnk <- earnings$earn/1000
fit_2 <- stan_glm(earnk ~ height + male, data=earnings)

Unfortunately, in my case, the output is not consistent.

Output 01 ```R stan_glm family: gaussian [identity] formula: earnk ~ height + male observations: 1816 predictors: 3 ------ Median MAD_SD (Intercept) 37.7 980.5 height -0.3 14.6 male -3.7 117.8 Auxiliary parameter(s): Median MAD_SD sigma 4874866.8 6113.7 ```
Output 02 ```R stan_glm family: gaussian [identity] formula: earnk ~ height + male observations: 1816 predictors: 3 ------ Median MAD_SD (Intercept) -80.4 11.9 height 1.5 0.2 male 0.8 1.5 Auxiliary parameter(s): Median MAD_SD sigma 21.7 0.4 ```
Output 03 ```R Chain 1: Rejecting initial value: Chain 1: Log probability evaluates to log(0), i.e. negative infinity. Chain 1: Stan can't start sampling from this initial value. Chain 1: Chain 1: Initialization between (-2, 2) failed after 100 attempts. Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model. [1] "Error : Initialization failed." error occurred during calling the sampler; sampling not done Error in check_stanfit(stanfit) : Invalid stanfit object produced please report bug ```
Output 04 ```R stan_glm family: gaussian [identity] formula: earnk ~ height + male observations: 1816 predictors: 3 ------ Median MAD_SD (Intercept) 41.3 946.4 height -0.5 14.0 male 1.9 116.7 Auxiliary parameter(s): Median MAD_SD sigma 4874823.8 6312.3 ```

You can find the same model using brms in this link. I even tried these lines (with SEED <- 7783)

https://github.com/avehtari/ROS-Examples/blob/a049a1044aef17bb90664d3b4735fc7ad6ff5575/Earnings/earnings_regression.R#L90-L94

but my session crashed.

So if you are in the same position, then the question is: How can you get the same result (or very very similar) every time you execute fit_2 <- stan_glm(earnk ~ height + male, data=earnings)? Well, you use the option offset from stan_glm, like this:

offset_vec <- replicate(nrow(earnings), 0.5)
fit_2 <- stan_glm(earnk ~ height + male, data=earnings, offset=offset_vec)
print(fit_2)

And the result is


stan_glm
 family:       gaussian [identity]
 formula:      earnk ~ height + male
 observations: 1816
 predictors:   3
------
            Median MAD_SD
(Intercept) -26.7   12.0 
height        0.7    0.2 
male         10.6    1.4 

Auxiliary parameter(s):
      Median MAD_SD
sigma 21.4    0.3

Bingo! I suggest to play with the number inside replicate to see the different values for (Intercept).

andrewgelman commented 1 year ago

I don't know what is library(rosdata) but if I do it directly I have no problem:

library("rstanarm") earnings <- read.csv("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/Earnings/data/earnings.csv") earnings$earnk <- earnings$earn/1000 fit_2 <- stan_glm(earnk ~ height + male, data=earnings) print(fit_2)

You can run this over and over and you get the same result except for some Monte Carlo variability in the last decimal place. Maybe you need to reload rstanerm?

avehtari commented 11 months ago

I also get the same result every time with

library(rstanarm)
library(rosdata)
data("earnings")
earnings$earnk <- earnings$earn/1000
fit_2 <- stan_glm(earnk ~ height + male, data=earnings)