brian-j-smith / Mamba.jl

Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia

A quick test on Rasch model #4

Closed cwliu007 closed 10 years ago

cwliu007 commented 10 years ago

Hi Brian,

1) Following the lsat example, I wrote a Rasch model (a model related to the one used in the lsat example, but not the same) in the Mamba script included at the end of this post. After a series of trials and errors (explained later), I got it to run. However, the results are not right! I know they are not right because I simulated the data set and estimated the parameters with the R package `ltm`, whose results are very close to the "true" parameters I set. The results were also confirmed with RStan. I am not sure whether I wrote the Mamba script incorrectly or whether this is an issue in Mamba itself, so I thought I should report it. I also tried other samplers and longer runs, but that did not seem to help.

See the Wikipedia introduction to the Rasch model in case it is needed: http://en.wikipedia.org/wiki/Rasch_model.

The true values of delta are 10 evenly spaced points from -2 to 2, i.e. linspace(-2, 2, 10) in Julia. The alphas are all 1.
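
In case it helps, here is the model I am trying to fit, written in my own notation (theta_j is the person parameter, delta_i the item difficulty, and alpha_i the discrimination):

\[
P(X_{ji} = 1) = \frac{\exp\{\alpha_i(\theta_j - \delta_i)\}}{1 + \exp\{\alpha_i(\theta_j - \delta_i)\}}
\]

The plain Rasch model from the link above is the special case where every alpha_i = 1.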

2) Inside the script, the d = Logical(1, ...) part is not necessary and is kept only for testing. However, I found that the delta estimates do not average to zero, and I have no idea why. A separate question (not about delta): can the same kind of constraint be applied to a matrix, column by column?

3) At first I named the stochastic delta node simply d and used the same name d for the Logical node as well. Mamba did not seem to accept this and returned an obscure error. Maybe the naming rules should be noted clearly in the manual.

Here are my impressions of Mamba: 4) The Model part is not very straightforward to write quickly on first use. For example, I was confused about why theta can be used inside the theta = Stochastic part. Is it because of the quote ... end block? Metaprogramming (which I am not familiar with)? What does writing it this way mean, and what is the logic? Maybe I am just too used to the JAGS style (simply write down the likelihood in the model block!). Overall, I personally find it not very friendly to use, but unfortunately I am not an expert here and cannot offer concrete suggestions. This is just my first quick experience with Mamba.

The script is below. Please download the data set from http://www.fileswap.com/dl/xlYHvEv6Fr/. Sorry to bother you here, but I could not find your email address.

Oh, one more thing: the model below is an extension of the Rasch model. Ignoring the alpha parameters, it is equivalent to the Rasch model; I simply forgot to delete alpha here. In any case, all of the parameters should be estimated.

RES = readdlm("res.txt") # please download from http://www.fileswap.com/dl/xlYHvEv6Fr/
people, itemnum = size(RES)

## A data dictionary
data = (Symbol => Any)[
  :RES => RES, # RES comes from the simulated responses read in above
  :itemnum => itemnum,
  :point => 2,
  :C => 1,
  :z => [0;1],
  :people => people]

chains = 1

# # initial values: a vector of n dictionaries (one per chain)
inits =
  [
  [:RES => data[:RES], # the observed data must also be included here
          :theta => zeros(data[:people]),
          :alpha => abs(randn(data[:itemnum])),
          :delta => rand(data[:itemnum]) ]
    ]

# # construct model
model = Model(
  RES = Stochastic(2,
            @modelexpr(theta, alpha, delta, people, itemnum, point, z,
                    Distribution[
                         begin
                          nu = ones(point)
                          for k = 2:point
                            nu[k] = exp(alpha[i].*(z[k].*(theta[j]-delta[i])  )) 
                          end
                          de = sum(nu)
                          p = nu ./ de 
                         Categorical(p)
                       end
                        for j = 1:people, i = 1:itemnum ] # end of Distribution
                    )  # end of @modelexpr
    ,true
    ), # end of RES
  alpha = Stochastic(1,
      :(Uniform(0, 10))
    ,true
    ), # end of alpha
  delta = Stochastic(1,
      :(Normal(0, 10))
    ,true
    ), # end of delta
  d = Logical(1,
     @modelexpr(delta,
                delta - mean(delta)
                ) # end of @modelexpr
   ), # end of Logical d
  theta = Stochastic(1,
      :(Normal(0, 1))
    ,true
    ) # end of theta
  ) # end of Model

# # # Choose Samplers
scheme1 = [Slice([:alpha], fill(1.0, data[:itemnum])), Slice([:delta], fill(1.0, data[:itemnum])),
           Slice([:theta], fill(1.0, data[:people]))] 

# # # Set samplers
setsamplers!(model, scheme1)

# # # Start sampling
sim1 = mcmc(model, data, inits, 600, burnin=200, thin=2, chains=chains)

retrieve = ["delta[$i]" for i in 1:itemnum]
out = sim1[:,retrieve,:]
describe(out)
brian-j-smith commented 10 years ago

Mamba's results for the Rasch model (as defined in the Wikipedia page and the ltm R package documentation) are consistent with those from JAGS. See comparisons below.

Classical Parametrization

Mamba

Code

RES = readdlm("res.txt")
people, itemnum = size(RES)

## A data dictionary
data = (Symbol => Any)[
  :RES => RES,
  :itemnum => itemnum,
  :point => 2,
  :C => 1,
  :z => [0;1],
  :people => people]

chains = 3

# # initial values: A vector (n elements) consisting of n dictionary
inits = Dict{Symbol,Any}[
  [:RES => data[:RES],
   :theta => zeros(data[:people]),
   :alpha => abs(randn()),
   :delta => rand(data[:itemnum])]
  for i in 1:chains]

# # construct model
model = Model(

  RES = Stochastic(2,
          @modelexpr(theta, delta, people, itemnum,
            Distribution[
              begin
                p = invlogit(theta[j] - delta[i])
                Categorical([1 - p, p])
              end
              for j = 1:people, i = 1:itemnum
            ]
          ),
          false
        ),

  delta = Stochastic(1,
            :(Normal(0, 10))
          ),

  theta = Stochastic(1,
            :(Normal(0, 1)),
            false
          )

)

# # # Choose Samplers
scheme1 = [Slice([:delta], fill(1.0, data[:itemnum])),
           Slice([:theta], fill(1.0, data[:people]))]

# # # Set samplers
setsamplers!(model, scheme1)

# # # Start sampling
sim1 = mcmc(model, data, inits, 11000, burnin=1000, thin=2, chains=chains)
describe(sim1)

Results

Iterations = 1002:11000
Thinning interval = 2
Chains = 1,2,3
Samples per chain = 5000

Empirical Posterior Estimates:
11x6 Array{Any,2}:
 ""             "Mean"    "SD"      "Naive SE"   "MCSE"         "ESS"
 "delta[1]"   -2.36876   0.501095  0.00409142   0.0136901   4482.91
 "delta[2]"   -1.5289    0.396163  0.00323465   0.00953147  5090.49
 "delta[3]"   -1.14127   0.377432  0.00308172   0.00880372  5250.71
 "delta[4]"   -0.700101  0.352429  0.00287757   0.00845189  5106.98
 "delta[5]"    0.104506  0.341061  0.00278476   0.00797456  5238.07
 "delta[6]"    0.300763  0.341886  0.00279149   0.00779137  5374.19
 "delta[7]"    1.02646   0.362721  0.00296161   0.00913744  4861.76
 "delta[8]"    1.39177   0.390375  0.0031874    0.00881855  5421.64
 "delta[9]"    1.39241   0.38263   0.00312416   0.00787714  5949.16
 "delta[10]"   2.39328   0.481939  0.00393502   0.0120364   4903.89

Quantiles:
11x6 Array{Any,2}:
 ""             "2.5%"     "25.0%"     "50.0%"    "75.0%"    "97.5%"
 "delta[1]"   -3.43836   -2.68509    -2.34409   -2.02045   -1.46896
 "delta[2]"   -2.32807   -1.78598    -1.5213    -1.25728   -0.764239
 "delta[3]"   -1.87813   -1.38866    -1.13533   -0.886678  -0.415053
 "delta[4]"   -1.39712   -0.936303   -0.696955  -0.459057  -0.0275823
 "delta[5]"   -0.571883  -0.123607    0.103981   0.335361   0.771964
 "delta[6]"   -0.376678   0.0732802   0.300406   0.531212   0.97529
 "delta[7]"    0.328957   0.778934    1.02131    1.26588    1.75122
 "delta[8]"    0.636813   1.1259      1.39047    1.6503     2.16883
 "delta[9]"    0.631384   1.13732     1.39059    1.64732    2.14126
 "delta[10]"   1.51803    2.06071     2.36765    2.70424    3.3984

JAGS

Code

model {

  for(i in 1:50) {
    for(j in 1:10) {
      p[i,j] <- ilogit(theta[i] - delta[j])
      prob[i,j,1] <- 1 - p[i,j]
      prob[i,j,2] <- p[i,j]
      RES[i,j] ~ dcat(prob[i,j,])
    }
  }

  for(j in 1:10) {
    delta[j] ~ dnorm(0, 0.01)
  }

  for(i in 1:50) {
    theta[i] ~ dnorm(0, 1)
  }

}

Results

Iterations = 1002:11000
Thinning interval = 2 
Number of chains = 3 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

             Mean     SD Naive SE Time-series SE
delta[1]  -2.3767 0.4894 0.003996       0.004602
delta[2]  -1.5285 0.4006 0.003271       0.004170
delta[3]  -1.1399 0.3778 0.003085       0.004041
delta[4]  -0.6913 0.3512 0.002868       0.003916
delta[5]   0.1003 0.3386 0.002765       0.003660
delta[6]   0.2960 0.3452 0.002818       0.003755
delta[7]   1.0354 0.3651 0.002981       0.003878
delta[8]   1.4017 0.3936 0.003214       0.004124
delta[9]   1.3975 0.3896 0.003181       0.004135
delta[10]  2.3853 0.4939 0.004033       0.004640

2. Quantiles for each variable:

             2.5%      25%      50%     75%    97.5%
delta[1]  -3.3922 -2.69367 -2.35982 -2.0384 -1.47578
delta[2]  -2.3403 -1.78990 -1.52023 -1.2567 -0.77073
delta[3]  -1.9022 -1.39020 -1.13663 -0.8798 -0.42476
delta[4]  -1.3860 -0.92679 -0.68959 -0.4532 -0.01466
delta[5]  -0.5769 -0.12618  0.09974  0.3289  0.75977
delta[6]  -0.3799  0.06659  0.29358  0.5271  0.97410
delta[7]   0.3252  0.78756  1.02985  1.2800  1.76700
delta[8]   0.6604  1.13250  1.39128  1.6619  2.18829
delta[9]   0.6526  1.13394  1.38438  1.6529  2.18805
delta[10]  1.4653  2.04785  2.36419  2.7055  3.39592

IRT Parametrization

Note that alpha, theta, and delta are not uniquely identifiable in the likelihood for this parametrization; i.e., given any set of values for those parameters, c * alpha, theta / c, and delta / c, where c is any nonzero scalar, gives the same likelihood value. Identifiability comes from the priors. Thus, depending on the choice of priors, the posterior estimates may or may not be consistent with the values used to simulate the data. The composite probability p is identifiable in the likelihood.
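
A quick numerical check of that invariance (a standalone sketch; invlogit is defined inline here so the snippet does not depend on Mamba):

invlogit(x) = 1 / (1 + exp(-x))

alpha, theta, delta = 1.0, 0.8, -0.3
c = 2.5                                                # any nonzero rescaling constant
p1 = invlogit(alpha * (theta - delta))                 # original parameter values
p2 = invlogit((c * alpha) * (theta / c - delta / c))   # rescaled parameter values
isapprox(p1, p2)                                       # true: p, and hence the likelihood, is unchanged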

Mamba

# # construct model
model = Model(

  RES = Stochastic(2,
          @modelexpr(alpha, theta, delta, people, itemnum,
            Distribution[
              begin
                p = invlogit(alpha * (theta[j] - delta[i]))
                Categorical([1 - p, p])
              end
              for j = 1:people, i = 1:itemnum
            ]
          ),
          false
        ),

  alpha = Stochastic(
            :(Uniform(0, 10))
          ),

  delta = Stochastic(1,
            :(Normal(0, 10))
          ),

  theta = Stochastic(1,
            :(Normal(0, 1)),
            false
          )

)

# # # Choose Samplers
scheme1 = [Slice([:alpha], [1.0]),
           Slice([:delta], fill(1.0, data[:itemnum])),
           Slice([:theta], fill(1.0, data[:people]))]

Results

Iterations = 1002:11000
Thinning interval = 2
Chains = 1,2,3
Samples per chain = 5000

Empirical Posterior Estimates:
12x6 Array{Any,2}:
 ""              "Mean"    "SD"      "Naive SE"   "MCSE"         "ESS"
 "delta[1]"   -10.071     4.72646   0.0385914    0.202627    2856.84
 "delta[2]"    -6.8526    3.75743   0.0306793    0.177871    2587.21
 "delta[3]"    -5.03347   2.87318   0.0234594    0.113126    3110.6
 "delta[4]"    -3.05824   2.18216   0.0178172    0.0963987   2772.43
 "delta[5]"     0.376126  1.64503   0.0134316    0.057892    3480.17
 "delta[6]"     1.42644   1.83966   0.0150207    0.0706917   3187.24
 "delta[7]"     4.50767   2.61753   0.021372     0.116421    2753.63
 "delta[8]"     6.10583   3.18698   0.0260216    0.140568    2776.75
 "delta[9]"     5.98729   3.24141   0.026466     0.146045    2718.27
 "delta[10]"   10.265     4.66644   0.0381014    0.220896    2587.28
 "alpha"        0.245565  0.153358  0.00125216   0.00814715  2305.4

Quantiles:
12x6 Array{Any,2}:
 ""              "2.5%"       "25.0%"    "50.0%"    "75.0%"    "97.5%"
 "delta[1]"   -20.9111     -12.9792    -9.51878   -6.42177   -3.06046
 "delta[2]"   -16.2543      -8.91476   -6.27715   -3.98606   -1.7269
 "delta[3]"   -11.8642      -6.75847   -4.47639   -2.82186   -1.08357
 "delta[4]"    -8.5313      -4.26672   -2.62158   -1.45426    0.00998723
 "delta[5]"    -3.0639      -0.521997   0.328202   1.26401    3.91983
 "delta[6]"    -1.74056      0.273591   1.1458     2.45209    5.58317
 "delta[7]"     0.784502     2.44709    4.10545    6.11024   10.6612
 "delta[8]"     1.49777      3.50621    5.69972    8.22152   13.1567
 "delta[9]"     1.57993      3.41186    5.50294    7.82984   14.0761
 "delta[10]"    3.11491      6.49361   10.0802    13.5985    19.4249
 "alpha"        0.0928291    0.139705   0.187951   0.300781   0.665054

JAGS

Model

model {

  for(i in 1:50) {
    for(j in 1:10) {
      p[i,j] <- ilogit(alpha * (theta[i] - delta[j]))
      prob[i,j,1] <- 1 - p[i,j]
      prob[i,j,2] <- p[i,j]
      RES[i,j] ~ dcat(prob[i,j,])
    }
  }

  alpha ~ dunif(0, 10)

  for(j in 1:10) {
    delta[j] ~ dnorm(0, 0.01)
  }

  for(i in 1:50) {
    theta[i] ~ dnorm(0, 1)
  }

}

Results

Iterations = 1002:11000
Thinning interval = 2 
Number of chains = 3 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

              Mean     SD Naive SE Time-series SE
alpha       0.2293 0.1663 0.001358        0.01333
delta[1]  -11.2953 5.1934 0.042404        0.26320
delta[2]   -7.3468 3.7278 0.030438        0.17535
delta[3]   -5.5002 3.0548 0.024942        0.12367
delta[4]   -3.3735 2.3533 0.019215        0.07378
delta[5]    0.4524 1.8484 0.015092        0.01710
delta[6]    1.4039 1.9274 0.015737        0.03252
delta[7]    4.9500 2.8462 0.023239        0.11640
delta[8]    6.6778 3.4938 0.028527        0.15737
delta[9]    6.6618 3.4673 0.028310        0.14971
delta[10]  11.2441 5.1340 0.041919        0.25536

2. Quantiles for each variable:

               2.5%      25%      50%     75%     97.5%
alpha       0.08628   0.1272   0.1678  0.2547  0.739847
delta[1]  -22.26484 -14.7251 -11.0654 -7.3997 -2.825435
delta[2]  -15.68969  -9.6366  -7.0346 -4.4935 -1.686619
delta[3]  -12.48723  -7.3535  -5.0872 -3.1171 -1.100239
delta[4]   -8.94049  -4.7154  -2.9868 -1.6091 -0.002983
delta[5]   -3.27275  -0.5441   0.3552  1.4224  4.498693
delta[6]   -1.99642   0.2072   1.1306  2.4310  5.919822
delta[7]    0.87206   2.7748   4.5434  6.6327 11.409588
delta[8]    1.51239   3.9504   6.2858  8.8746 14.374147
delta[9]    1.46459   4.0173   6.3293  8.8118 14.295844
delta[10]   2.79915   7.3708  11.0674 14.5969 22.028411

As for the interfaces: differences between Mamba and JAGS are touched on in the Introduction section of the documentation. JAGS provides a DSL (domain-specific language) for model specification; Mamba uses native Julia code. Advantages of a DSL include more concise syntax and easier automatic sampler selection. Disadvantages include considerably more work to implement and extend, and less flexible model specification.
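
Regarding the quote ... end and :( ... ) syntax from your point 4: those blocks just build Julia expression objects, which are stored and only evaluated later against the current values of the model's nodes; that deferred evaluation is why node names can be referenced inside other node definitions. A minimal illustration of the underlying mechanism (generic Julia metaprogramming, not Mamba's internal code):

ex = :(theta - delta)   # a quoted expression; nothing is evaluated yet
theta = 1.5             # the names it refers to can be defined afterwards
delta = 0.5
eval(ex)                # 1.0, computed only when the expression is evaluated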

cwliu007 commented 10 years ago

Thanks for the clear code examples, Brian. They help me understand your package better :+1:

brian-j-smith commented 10 years ago

No problem. Best of luck with your project.