TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Linear regression tutorial was not reproducible #1313

Closed: ValentinKaisermayer closed this issue 4 years ago

ValentinKaisermayer commented 4 years ago

I copied the code but did not get the correct results:

julia> describe(chain)
2-element Array{ChainDataFrame,1}

Summary Statistics
        parameters     mean     std  naive_se    mcse        ess   r_hat
  ────────────────  ───────  ──────  ────────  ──────  ─────────  ──────
   coefficients[1]   1.0000  0.0000    0.0000  0.0000  2016.6363  0.9998
   coefficients[2]   0.0000  0.0000    0.0000  0.0000  2089.0093  0.9997
   coefficients[3]   0.0000  0.0000    0.0000  0.0000  1703.9638  1.0013
   coefficients[4]  -0.0000  0.0000    0.0000  0.0000  2006.2867  1.0006
   coefficients[5]   0.0000  0.0000    0.0000  0.0000  1575.2603  0.9998
   coefficients[6]   0.0000  0.0000    0.0000  0.0000  1448.6048  1.0002
   coefficients[7]  -0.0000  0.0000    0.0000  0.0000  1925.6609  1.0006
   coefficients[8]  -0.0000  0.0000    0.0000  0.0000  2077.7749  1.0000
   coefficients[9]  -0.0000  0.0000    0.0000  0.0000  2040.8867  1.0010
  coefficients[10]   0.0000  0.0000    0.0000  0.0000  1595.8217  0.9999
  coefficients[11]  -0.0000  0.0000    0.0000  0.0000  1270.4007  1.0005
         intercept  -0.0000  0.0000    0.0000  0.0000  1761.9700  0.9999
                σ₂   0.0000  0.0000    0.0000  0.0000     8.0321  1.6490

Quantiles
        parameters     2.5%    25.0%    50.0%   75.0%   97.5%
  ────────────────  ───────  ───────  ───────  ──────  ──────
   coefficients[1]   1.0000   1.0000   1.0000  1.0000  1.0000
   coefficients[2]  -0.0000  -0.0000  -0.0000  0.0000  0.0000
   coefficients[3]  -0.0000  -0.0000   0.0000  0.0000  0.0000
   coefficients[4]  -0.0000  -0.0000  -0.0000  0.0000  0.0000
   coefficients[5]  -0.0000  -0.0000  -0.0000  0.0000  0.0000
   coefficients[6]  -0.0000  -0.0000   0.0000  0.0000  0.0000
   coefficients[7]  -0.0000  -0.0000   0.0000  0.0000  0.0000
   coefficients[8]  -0.0000  -0.0000  -0.0000  0.0000  0.0000
   coefficients[9]  -0.0000  -0.0000  -0.0000  0.0000  0.0000
  coefficients[10]  -0.0000  -0.0000   0.0000  0.0000  0.0000
  coefficients[11]  -0.0000  -0.0000  -0.0000  0.0000  0.0000
         intercept  -0.0000  -0.0000  -0.0000  0.0000  0.0000
                σ₂   0.0000   0.0000   0.0000  0.0000  0.0000

Julia: 1.4.2, Turing: 0.13.0

torkar commented 4 years ago

I can confirm this is the case (Julia 1.4.1 and Turing 0.13.0 on OS X).

devmotion commented 4 years ago

Thanks! There are currently some other problems with the linear regression tutorial as well. Among other things, the standardization is incorrect (see https://github.com/TuringLang/TuringTutorials/issues/73), it contains data leakage, and the model (or its documentation) is incorrect. I'm currently updating the tutorial, and hopefully that will resolve the problems you observed as well.
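
For reference, the fix for the leakage part is just to compute the standardization parameters on the training data only and reuse them for the test data. A minimal sketch with MLDataUtils' rescale! (the same pattern as in the updated tutorial):

μ, σ = rescale!(train; obsdim = 1)   # compute mean and std from the training features only
rescale!(test, μ, σ; obsdim = 1)     # reuse the same μ and σ for the test features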

ValentinKaisermayer commented 4 years ago

Tested it with the updated tutorial code:

using Turing, Distributions
using RDatasets
using MCMCChains, Plots, StatsPlots
using MLDataUtils: shuffleobs, splitobs, rescale!
using Distances
using Random
Random.seed!(0)

Turing.turnprogress(false)

data = RDatasets.dataset("datasets", "mtcars")
select!(data, Not(:Model))
trainset, testset = splitobs(shuffleobs(data), 0.7)

target = :MPG
train = Matrix(select(trainset, Not(target)))
test = Matrix(select(testset, Not(target)))
train_target = trainset[:, target]
test_target = testset[:, target]

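# Standardize features and target: compute mean and std on the training set only and reuse them for the test set.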
μ, σ = rescale!(train; obsdim = 1)
rescale!(test, μ, σ; obsdim = 1)
μtarget, σtarget = rescale!(train_target; obsdim = 1)
rescale!(test_target, μtarget, σtarget; obsdim = 1)

# Bayesian linear regression.
@model function linear_regression(x, y)
    σ₂ ~ truncated(Normal(0, 100), 0, Inf) # Set variance prior.
    intercept ~ Normal(0, sqrt(3)) # Set intercept prior.
    nfeatures = size(x, 2)  
    coefficients ~ MvNormal(nfeatures, sqrt(10)) # Set the priors on our coefficients.
    mu = intercept .+ x * coefficients # Calculate all the mu terms.
    y ~ MvNormal(mu, sqrt(σ₂))
end

model = linear_regression(train, train_target)
chain = sample(model, NUTS(0.65), 3_000)

plot(chain)
describe(chain)

Looks good but I get two warnings.

[ Info: [Turing]: progress logging is disabled globally
┌ Info: Found initial step size
└   ϵ = 1.6
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC 
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC
2-element Array{ChainDataFrame,1}

Summary Statistics
        parameters     mean     std  naive_se    mcse       ess   r_hat
  ────────────────  ───────  ──────  ────────  ──────  ────────  ──────
   coefficients[1]  -0.0413  0.5648    0.0126  0.0389  265.1907  1.0010
   coefficients[2]   0.2770  0.6994    0.0156  0.0401  375.2777  1.0067
   coefficients[3]  -0.4116  0.3850    0.0086  0.0160  695.3990  1.0032
   coefficients[4]   0.1805  0.2948    0.0066  0.0126  479.9290  1.0010
   coefficients[5]  -0.2669  0.7168    0.0160  0.0316  373.0291  1.0009
   coefficients[6]   0.0256  0.3461    0.0077  0.0119  571.0954  1.0028
   coefficients[7]   0.0277  0.3899    0.0087  0.0174  637.1596  1.0007
   coefficients[8]   0.1535  0.3050    0.0068  0.0117  579.1998  1.0032
   coefficients[9]   0.1223  0.2839    0.0063  0.0105  587.6752  0.9995
  coefficients[10]  -0.2839  0.3975    0.0089  0.0195  360.9612  1.0019
         intercept   0.0058  0.1179    0.0026  0.0044  580.0222  0.9995
                σ₂   0.3017  0.1955    0.0044  0.0132  227.2322  1.0005

Quantiles
        parameters     2.5%    25.0%    50.0%    75.0%   97.5%
  ────────────────  ───────  ───────  ───────  ───────  ──────
   coefficients[1]  -1.0991  -0.4265  -0.0199   0.3244  1.1093
   coefficients[2]  -1.1369  -0.1523   0.2854   0.7154  1.6488
   coefficients[3]  -1.1957  -0.6272  -0.3986  -0.1800  0.3587
   coefficients[4]  -0.3896  -0.0155   0.1663   0.3593  0.7818
   coefficients[5]  -1.6858  -0.6835  -0.2683   0.1378  1.1995
   coefficients[6]  -0.6865  -0.1672   0.0325   0.2214  0.7251
   coefficients[7]  -0.7644  -0.1976   0.0090   0.2835  0.8185
   coefficients[8]  -0.4980  -0.0194   0.1451   0.3428  0.7685
   coefficients[9]  -0.4643  -0.0294   0.1237   0.2807  0.7218
  coefficients[10]  -1.0898  -0.5091  -0.2846  -0.0413  0.5163
         intercept  -0.2240  -0.0671   0.0083   0.0746  0.2364
                σ₂   0.1043   0.1860   0.2525   0.3530  0.8490

devmotion commented 4 years ago

I'm pretty sure you can ignore the errors; the warning just tells you that two proposals were rejected because the gradient and some other quantities (which I don't remember off the top of my head) were non-finite, e.g., NaN or Inf. (Maybe a more descriptive warning would be better? I saw https://github.com/TuringLang/AdvancedHMC.jl/issues/110, but IMO the variable names θ, r, ℓπ, and ℓκ are still a bit cryptic.) I would assume this might be related to the truncated normal distribution in this example, but I guess @xukai92 knows a lot better what could possibly go wrong here.

xukai92 commented 4 years ago

They can be safely ignored. During the initial phase of sampling, large step sizes are tried, which can lead to numerical errors; these can be ignored as long as there are no consistent errors during sampling.
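
If you want to double-check that such rejections are rare, the chain stores the sampler statistics (including a numerical_error flag) in its internals section. A rough sketch, assuming that indexing internal statistics by name works in your MCMCChains version:

sum(chain[:numerical_error])   # number of transitions that hit a numerical error
size(chain, 1)                 # total number of iterations, for comparison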

Pindar777 commented 4 years ago

Hi @devmotion,

I just tried to run the tutorial and it stops at the line

μ, σ = rescale!(train; obsdim = 1)

with

ERROR: MethodError: no method matching /(::String, ::Int64)

The rescale! function cannot cope with strings, hence I changed the code as follows:

strExclude = :Model
train = Matrix(select(trainset, Not([target, strExclude])))
test = Matrix(select(testset, Not([target, strExclude])))

The model then runs and prints the coefficients. However, for the tutorial it would be great to attach the names of the covariates to the coefficients. I'm pretty sure that is possible with Turing.jl, but at the moment I don't know how to do that. Could you please elaborate on the code?

Thanks in advance

devmotion commented 4 years ago

I just tried to run the tutorial and it stops at line

I assume you forgot to run the line select!(data, Not(:Model)), which removes the Model column from the data set. The fix should not be needed (and the code ran fine when the tutorial was generated). Can you check whether you forgot to run this line before creating the training and test data sets?

However, for the tutorial it would be great to pass the names of the covariates to the coefficients.

Hmm, I'm not sure if I understand you correctly. Would you like to name the coefficients in a way that relates them to the covariates in the model? Or would you mainly like to see the covariate names in the trace plots instead of coefficients[1] etc.? Turing does not work with data of type DataFrame but only with arrays (and, hopefully at some point, with whatever samples are generated by the distributions you specify). Therefore we convert the data frame to a matrix in the beginning. However, this conversion removes all headers, and hence one has to relate the columns of the matrix and the samples of coefficients to the covariates manually. It is possible, e.g., to change the parameter names of the sampled chain by

newchain = let names=names(data, Not(target))
    set_names(chain, Dict("coefficients[$i]" => name for (i, name) in enumerate(names)))
end

with MCMCChains 3.0.* or

newchain = let names=names(data, Not(target))
    replacenames(chain, ("coefficients[$i]" => name for (i, name) in enumerate(names))...)
end

for MCMCChains 4.0.*.
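
Afterwards the renamed chain behaves just like the original one, so the covariate names show up in the summaries and trace plots, e.g.:

plot(newchain)
describe(newchain)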

Pindar777 commented 4 years ago

Hi @devmotion, thanks for your fast response. In fact, I had somehow missed the select! line. Now it runs as expected.

In terms of getting meaningful names for the coefficients, I tried the proposed solution for MCMCChains 3.0.* and got this error:

julia> newchain = let names=names(data, Not(target))
           set_names(chain, Dict("coefficients[$i]" => name for (i, name) in enumerate(names)))
       end
ERROR: MethodError: no method matching names(::DataFrame, ::InvertedIndex{Symbol})

names(data, Not(target)) seems not to work. Then I tried this

julia> newchain = let names=names(select(data, Not(target)))
           set_names(chain, Dict("coefficients[$i]" => name for (i, name) in enumerate(names)))
       end
ERROR: MethodError: no method matching Chains(::Array{Float64,3}, ::Array{Any,1}, ::Dict{Any,Any}; info=NamedTuple(), evidence=missing, sorted=true)

While names(select(data, Not(target))) produces a 10-element array, it is not processed by the loop.

How do I get it to work? Would it be better to install MCMCChains 4.0.*?

Pindar777 commented 4 years ago

@devmotion Update: I found the solution

newchain = let names=String.(names(select(data, Not(target))))
    set_names(chain, Dict("coefficients[$i]" => name for (i, name) in enumerate(names)))
end

Cheers

devmotion commented 4 years ago

Great that it works! I'm sorry, I didn't test the suggested renaming and just assumed that names(data, Not(target)) would work based on the DataFrames documentation.

I'll close this issue since it seems your issue is resolved.