Add africa-first-candidate

StatisticalRethinkingJulia / TuringModels.jl

Implementations of the models from the Statistical Rethinking book with Turing.jl

https://statisticalrethinkingjulia.github.io/TuringModels.jl/

MIT License

163 stars 18 forks source link

Add africa-first-candidate #68

Closed rikhuijzer closed 3 years ago

rikhuijzer commented 3 years ago

Changes:

Fix m8.1 for edition 2. The previous africa model didn't actually correspond to the book.
Uses Gadfly for plotting of the new model, which works around https://github.com/TuringLang/MCMCChains.jl/issues/281.
Fix the styling of tables, the previous CSS was quite poor; now it looks more like Pluto.jl.

goedman commented 3 years ago

Hi Rik,

Looks great!

I don't have the actual book with me ( we spend twice a year a couple of month in Washington, DC and the rest of the year in Colorado), but in the iBooks version:

“m8.1 <- quap(
 alist(
  log_gdp_std ~ dnorm( mu , sigma ) ,
  mu <- a + b*( rugged_std - 0.215 ) ,
  a ~ dnorm( 1 , 0.1 ) ,
  b ~ dnorm( 0 , 0.3 ) ,
  sigma ~ dexp(1)
 ) , data=dd )”

Excerpt From: Richard McElreath. “Statistical Rethinking.” Apple Books. https://books.apple.com/us/book/statistical-rethinking/id1544087161

The sigma result also surprises me a bit (0.14 in Stan/SR vs ~6 in Turing?).

Gr., Rob

rikhuijzer commented 3 years ago

Thanks, Rob!

we spend twice a year a couple of month in Washington, DC and the rest of the year in Colorado

Unrelated, but that sounds very nice.

The sigma result also surprises me a bit (0.14 in Stan/SR vs ~6 in Turing?).

You're right. I also saw that but did not have time to investigate it further. It might be that I calculate the mean wrong or that the prior for σ is wrong. Do you have an intuition on what it could be here?

goedman commented 3 years ago

Maybe we're not using the same data, my first 8 rows look like:

8×5 DataFrame
        Row │ log_gdp  rugged   cont_africa  log_gdp_std  rugged_std
            │ Float64  Float64  Int64        Float64      Float64
          1 │ 7.49261    0.858            1     1.00276    0.138342
          2 │ 6.43238    1.78             1     0.860869   0.287004
          3 │ 6.86612    0.141            1     0.918918   0.0227346
          4 │ 6.90617    0.236            1     0.924278   0.0380522
          5 │ 8.9493     0.181            1     1.19772    0.0291841
          6 │ 7.04582    0.197            1     0.942968   0.0317639
          7 │ 7.36241    0.224            1     0.985338   0.0361174
          8 │ 7.54046    0.515            1     1.00917    0.0830377

generated by:

begin
    df = CSV.read(sr_datadir("rugged.csv"), DataFrame)
    dropmissing!(df, :rgdppc_2000)
    dropmissing!(df, :rugged)
    df.log_gdp = log.(df[:, :rgdppc_2000])
    df.log_gdp_s = df.log_gdp / mean(df.log_gdp)
    df.rugged_s = df.rugged / maximum(df.rugged)
    df.cid = [df.cont_africa[i] == 1 ? 1 : 2 for i in 1:size(df, 1)]
    data = (N = size(df, 1), K = length(unique(df.cid)), 
        G = df.log_gdp_s, R = df.rugged_s, cid=df.cid)
    df[1:8, [:log_gdp, :rugged, :cont_africa, :log_gdp_s, :rugged_s]]
end

rikhuijzer commented 3 years ago

Interestingly enough, the book has two definitions for m8.1!!! Also, I can't exactly get 0.215. I've now used mean(df.rugged_std) which evaluates to 0.15.

After using the priors from both model definitions from the book and hardcoding 0.215, I still get σ ≈ 6. For now, I give up on this one. I've only pushed a small change to mention this and improve the mean.

EDIT: The webpage shows exactly the same table, so I think that our calculations are the same.