brian-j-smith / Mamba.jl

Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia

linear regression example not giving reasonable results #120

Open mboedigh opened 7 years ago

mboedigh commented 7 years ago

Sorry to post a usage question here, but I have seen others post requests for help here too; let me know if that is bad form. I'm trying to learn Mamba, coming from experience with Stan. To get started I took the linear regression model at the start of the tutorial and made minimal modifications toward a more general model, shown in the snippets below.


sigma_w = 1;   # noise standard deviation
N = 100;       # number of observations
x = 1:N;
line = Dict{Symbol, Any}(
  :x => x,
  :y => randn(N)*sigma_w + x   # true intercept 0, true slope 1
)
line[:xmat] = [ones(length(line[:y]),1) line[:x]];   # design matrix [intercept x]

The model was modified slightly from the example (again a step toward generalization):

model = Model(
  y = Stochastic(1,
    (mu, sigma_w) -> MvNormal(mu, sigma_w),   # likelihood: isotropic normal noise
    false
  ),
  mu = Logical(1,
    (xmat, beta) -> xmat * beta,   # linear predictor
    false
  ),
  beta = Stochastic(1,
    (xmat) -> MvNormal(fill(0.0, size(xmat, 2)), 5)   # vague prior on coefficients
  ),
  sigma_w = Stochastic(
    () -> InverseGamma(0.001, 0.001)   # vague prior on the noise scale
  )
)
scheme = [NUTS([:beta, :sigma_w])]
setsamplers!(model, scheme);

sim_nuts = mcmc(model, line, inits, 10000, burnin=2500, thin=1, chains=nchains)
describe(sim_nuts)

The results show estimates for beta of about [-0.5, 0.5], which is not good given the least-squares answer of:

line[:xmat]\line[:y]
2-element Array{Float64,1}:
 0.263816
 0.994873

Can anyone suggest what is going wrong? I used a very similar model in Stan and it gave a reasonable answer.

bdeonovic commented 7 years ago

Seems like NUTS is getting stuck. Here is a reproducible script:

using Mamba
srand(123) ## set seed for reproducibility

sigma_w = 1;
N = 100;
x = (1:N);
line = Dict{Symbol, Any}(
  :x => 1:N,
  :y => randn(N)*sigma_w + x
)
line[:xmat] = [ones(length(line[:y]),1) line[:x] ];

model = Model(
              y = Stochastic(1, (mu, sigma_w) ->  MvNormal(mu, sigma_w), false),
              mu = Logical(1, (xmat, beta) -> xmat * beta, false),
              beta = Stochastic(1, (xmat) -> MvNormal(fill(0.0,size(xmat,2)), 5)),
              sigma_w = Stochastic(() -> InverseGamma(0.001, 0.001))
             )
scheme = [NUTS([:beta, :sigma_w])]
setsamplers!(model, scheme);

## Initial Values
inits = [
         Dict{Symbol, Any}(
                           :y => line[:y],
                           :beta => rand(Normal(0, 1), 2),
                           :sigma_w => rand(Gamma(1, 1))
                          )
         for i in 1:3
        ]

nchains = length(inits)

sim_nuts = mcmc(model, line, inits, 10000, burnin=2500, thin=1, chains=nchains)
describe(sim_nuts)

If we look at the first few MCMC draws, we see

julia> sim_nuts.value[1:10,:,:]
10×3×3 Array{Float64,3}:       
[:, :, 1] =                    
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  
 1.18304  0.376264  -0.405272  

[:, :, 2] =                    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    
 0.305749  1.33585  1.60076    

[:, :, 3] =                    
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
 0.610806  -1.45789  0.800589  
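
Every row within each chain is identical, so the samplers never move off their starting points. That can be checked programmatically; here is a minimal sketch using the iterations × parameters × chains layout of sim_nuts.value shown above:

## count distinct draws of the first parameter in each chain;
## a moving chain should have many, a stuck chain exactly one
for k in 1:size(sim_nuts.value, 3)
    println("chain $k distinct draws: ", length(unique(sim_nuts.value[:, 1, k])))
end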

I am guessing you were getting [-0.5, 0.5] because you were initializing beta to start at [0, 0].

I will look into the code. However, I am pretty sure this is mostly due to the nature of the example.

julia> X = line[:xmat]; y=line[:y];

julia> det(X'X)
8.3325e6

We see that the determinant of X'X is very large: the intercept column and the x column (1:N) are on very different scales, which makes the problem ill-conditioned for the sampler.
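
One standard remedy, sketched below (not part of the original example), is to center and scale the predictor; the condition number of the design matrix drops dramatically, which typically helps HMC/NUTS-type samplers:

## standardize the predictor so the intercept and slope columns
## of the design matrix are on comparable scales
xs = (collect(line[:x]) - mean(line[:x])) / std(line[:x])
Xs = [ones(length(xs)) xs]
cond(Xs'Xs)   # near 1, versus roughly 1.4e4 for the raw X'X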

mboedigh commented 7 years ago

Brian, thanks for the reply. I don't know Julia well, but the problem I'm trying to create is not ill-conditioned: line[:xmat] should have full column rank of 2. It solves well using:

julia> line[:xmat]\line[:y]
2-element Array{Float64,1}:
 0.0733763
 0.997099

I think the problem is not the data. I see that NUTS is getting stuck, but I don't know why. Stan solves this using its version of NUTS. If it is helpful, I can show you the Stan version I used.
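
For reference, full rank and good conditioning are separate properties, and both can be checked directly; a quick sketch (rank and cond are Base Julia functions):

X = line[:xmat]
rank(X)   # 2: full column rank, so the least-squares solve is well defined
cond(X)   # roughly 1e2 here, since the two columns differ greatly in scale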

I like the way Mamba is designed, and believe I will enjoy working with it regularly. Thanks.

Best regards, Mike


bdeonovic commented 7 years ago

Changing initial values to something closer to the truth results in correct output:

## Initial Values
inits = [
         Dict{Symbol, Any}(:y => line[:y], :beta => [0.0, 1.0], :sigma_w => 1.0),
         Dict{Symbol, Any}(:y => line[:y], :beta => line[:xmat]\line[:y], :sigma_w => 1.0)
        ]

nchains = length(inits)

sim_nuts = mcmc(model, line, inits, 10000, burnin=2500, thin=1, chains=nchains)
describe(sim_nuts)
julia> describe(sim_nuts)
Iterations = 2501:10000
Thinning interval = 1
Chains = 1,2
Samples per chain = 7500

Empirical Posterior Estimates:
            Mean         SD         Naive SE         MCSE         ESS
sigma_w  1.14096298 0.0820408647 0.000669860855 0.001462307980 3147.6234
beta[1] -0.09545879 0.2268879348 0.001852532230 0.005461201581 1726.0222
beta[2]  1.00263843 0.0039072747 0.000031902764 0.000084420935 2142.1386

Quantiles:
            2.5%       25.0%       50.0%       75.0%       97.5%
sigma_w  0.99623051  1.08391122  1.13599791 1.191574016 1.31777059
beta[1] -0.54788610 -0.24284341 -0.09442113 0.057914062 0.34095642
beta[2]  0.99508585  0.99999733  1.00263428 1.005231688 1.01037139
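
The between-chain agreement can also be confirmed with Mamba's built-in Gelman-Rubin diagnostic; a minimal usage sketch:

## potential scale reduction factors near 1 indicate the chains
## have converged to the same posterior
gelmandiag(sim_nuts)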

After some digging it looks like the NUTS epsilon value is not getting set correctly in some cases. I will investigate further.
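
For context on what can go wrong there: NUTS typically initializes its step size with the doubling/halving heuristic of Hoffman and Gelman (2014, Algorithm 4). Below is an illustrative sketch of that heuristic, not Mamba's actual code; logf is an assumed user-supplied function returning the log density and its gradient. If the chain starts where the gradient is extreme, the loop can settle on a step size far too small or too large for the bulk of the posterior.

## illustrative re-implementation of the "find reasonable epsilon" heuristic
## (Hoffman & Gelman 2014, Alg. 4); NOT Mamba's internal code
function find_epsilon(theta::Vector{Float64}, logf::Function)
    epsilon = 1.0
    r0 = randn(length(theta))
    logf0, grad0 = logf(theta)
    logp0 = logf0 - 0.5 * dot(r0, r0)   # log joint density at the start
    function logp1(e)                   # log joint after one leapfrog step of size e
        r = r0 + (e / 2) * grad0
        theta1 = theta + e * r
        logf1, grad1 = logf(theta1)
        r = r + (e / 2) * grad1
        logf1 - 0.5 * dot(r, r)
    end
    # double (a = 1) or halve (a = -1) until the acceptance probability crosses 1/2
    a = logp1(epsilon) - logp0 > log(0.5) ? 1.0 : -1.0
    while a * (logp1(epsilon) - logp0) > -a * log(2.0)
        epsilon *= 2.0^a
    end
    epsilon
end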

mboedigh commented 7 years ago

Thanks, that makes sense. Let me know if you figure out the potential issue with epsilon initialization during warmup. Best wishes, Mike
