Closed ParadaCarleton closed 3 years ago
Hahaha. Of course.
I tried to replicate the deprecation warning without success. I also could not find record of a deprecation in the Turing and AbstractMCMC repos. Would you be able to provide more info about the deprecation and your package versions please?
Here is the version I am using on Julia 1.6.1:
(test) pkg> st
Status `~/.julia/dev/ParetoSmooth/test/Project.toml`
[94b1ba4f] AxisKeys v0.1.18
[31c24e10] Distributions v0.25.11
[c7f686f2] MCMCChains v4.13.1
[df47a6cb] RData v0.8.3
[fce5fe82] Turing v0.16.6
[9a3f8284] Random
[10745b16] Statistics
[8dfed614] Test
I should be able to make the changes later this week.
Currently on Julia 1.6.2 with 8 threads I get the following warning message:
┌ Warning: Conversion of RData.RDummy{0xfe} to Julia is not implemented
└ @ RData ~/.julia/packages/RData/OT7M6/src/convert.jl:198
and once Turing kicks in, many of the info messages below:
┌ Info: Found initial step size
└ ϵ = 0.4
┌ Info: Found initial step size
└ ϵ = 0.4
┌ Info: Found initial step size
└ ϵ = 0.2
┌ Info: Found initial step size
└ ϵ = 0.2
Sampling (4 threads) 100%|██████████████████████████████████████████████████████████| Time: 0:00:25
┌ Info: **Important Note:** The posterior log-likelihood must be computed with a `for` loop inside a
│ Turing model; broadcasting will result in all observations being treated as if they are a
└ single point.
Package versions (both ParetoSmooth and ParetoSmooth/test) appear similar to what Chris reported.
On Julia 1.7.0-beta3 and up there is an issue with AxisKeys.jl when printing out the PsisLoo object.
Turing.jl needs an update to AxisArrays.jl before MCMCChains.jl can be loaded, I think. But in the past Turing has supported up to the current Julia release (1.6.2).
Not sure which version you were using, Carlos.
Oh, weird, the deprecation warning isn't showing up for me either anymore. I assume I made some kind of mistake.
@itsdfish I took a look at the pareto_k values, and they seem a little weird -- I don't think a Gaussian with a sample size of 100 should have tails that thick. Can you build a test to double-check the results?
@ParadaCarleton, sorry for the delay. I have been unusually busy. I should be able to get to this by Saturday at the latest, but I will aim for sooner.
@ParadaCarleton, I don't know how to test the k-values without a ground truth. What you have here appears to achieve that goal.
I believe the problem with k-values computed from the Turing output is that you changed the dimensions for the MCMC samples. Turing produces a 3-dimensional array where the dimensions are [samples, parameters, chains]. I believe Stan.jl does the same; @goedman, can you confirm? Is there a reason you made this change? If not, I will submit a pull request with changes.
Yes, that is correct.
All currently released versions of CmdStan and StanJulia packages by default read samples in as an Array[draws, params, chains]. Back in 2010 that seemed logical given the structure of cmdstan's .csv files. It was also used by Mamba.Chains (taken as the starting point for the initial version of MCMCChains). And in simple examples you often work with a column of draws.
Recently I've added 2 output_formats (:namedtuple and :keyedarray) that return draws (and samples from generated_quantities such as log_lik) in the format [params/obs, draws/samples, chains].
In StatsModelComparisons.jl I mostly used the appended chains format [draws, params].
I tried to make sense of a suggestion/discussion in Turing but ended up unsure what came out of it.
@goedman, thanks for your detailed reply. Considering that the most popular MCMC samplers return an array in the form [draw, param, chain], it seems like we should keep that convention.
@ParadaCarleton, I don't know how to test the k-values without a ground truth. What you have here appears to achieve that goal.
The easiest way to do that would be to build the same model with the same data in Julia and R. (I would suggest making the "data" equal to the quantiles of a standard normal distribution, for quantiles (1:32) ./ 33.) We can then sample with Stan, save the results of loo in Stan, and then compare these to the results in Julia to make sure they're close enough. If we draw enough samples, the estimates should be close.
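A minimal sketch of building that shared dataset, assuming Distributions.jl (already in the test environment); the variable name `data` is illustrative:

```julia
using Distributions

# Quantiles of a standard normal at probabilities (1:32) ./ 33,
# so the Julia and R models fit identical "data".
data = quantile.(Normal(), (1:32) ./ 33)
```

Because the probabilities are symmetric around 1/2, the resulting vector is symmetric around zero, which also makes a handy sanity check.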
I believe the problem with k-values computed from the Turing output is that you changed the dimensions for the MCMC samples. Turing produces a 3-dimensional array where the dimensions are [samples, parameters, chains]. I believe Stan.jl does the same; @goedman, can you confirm? Is there a reason you made this change? If not, I will submit a pull request with changes.
Yes! Thank you, I thought I had successfully modified the function to convert from [draw, param, chain] to [param, draw, chain], but it looks like I made a mistake along the way. Since the tests passed, I thought this meant the results were good -- actually, this is kind of why I'd like a test comparing the Stan and Turing results -- the current tests passed since the error only affected the numerical results, not the shape of the array that got returned. It's not possible to compare the results to the true values, but I want to make sure that any differences between Stan and Turing are within the range you'd expect from sampling error.
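For reference, the intended conversion is a one-line permutedims; a minimal sketch, assuming `samples` has Turing's [draw, param, chain] layout:

```julia
# 1000 draws, 3 params, 4 chains in Turing's [draw, param, chain] order
samples = randn(1000, 3, 4)

# Swap the first two dimensions to get [param, draw, chain]
by_param = permutedims(samples, (2, 1, 3))
size(by_param)  # (3, 1000, 4)
```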
Sorry, I am a bit confused about (1) the reason for converting [draw, param, chain] to [param, draw, chain] and (2) why your tests under "basic arrays" do not already test the accuracy of the loo results. All that is required is reverting back to [draw, param, chain] or properly reshaping the array extracted from the chain object. My concern is that this is a lot of work to create redundant tests that are more ambiguous: the MCMC samples may differ, but your tests seem to indicate that the loo functions are highly similar between R and Julia. So what we end up testing is whether the NUTS implementations are similar, but that is already tested in AdvancedHMC.
I think it makes the most sense to order arrays as [draw, param, chain] because most packages work that way in Julia, as Rob noted. In either case, a simple warning might prevent a person from using an array of the wrong shape:
const ARRAY_DIMS_WARNING = "The supplied array of mcmc samples indicates you have more parameters than mcmc samples.
This is possible, but highly unusual. Please check that your array of mcmc samples has the following dimensions: [n_samples,n_parms,n_chains]."
n_posterior, n_parms, n_chains = size(samples)
if n_parms > n_posterior
@info ARRAY_DIMS_WARNING
end
Sorry, I am a bit confused about (1) the reason for converting [draw, param, chain] to [param, draw, chain] and (2) why your tests under "basic arrays" do not already test the accuracy of the loo results. All that is required is reverting back to [draw, param, chain] or properly reshaping the array extracted from the chain object. My concern is that this is a lot of work to create redundant tests that are more ambiguous: the MCMC samples may differ, but your tests seem to indicate that the loo functions are highly similar between R and Julia. So what we end up testing is whether the NUTS implementations are similar, but that is already tested in AdvancedHMC.
I think it makes the most sense to order arrays as [draw, param, chain] because most packages work that way in Julia, as Rob noted. In either case, a simple warning might prevent a person from using an array of the wrong shape:
const ARRAY_DIMS_WARNING = "The supplied array of mcmc samples indicates you have more parameters than mcmc samples.
This is possible, but highly unusual. Please check that your array of mcmc samples has the following dimensions: [n_samples,n_parms,n_chains]."
n_posterior, n_parms, n_chains = size(samples)
if n_parms > n_posterior
@info ARRAY_DIMS_WARNING
end
Yep, the reason I flipped them was an accident -- I mixed up the [draw, param, chain] argument with the [data_point, draw, chain] argument required by the psis_loo function.
The reason I want to test that the Julia/R results are similar is to make sure that the functions are extracting the log-likelihood from Turing objects correctly, not that the Stan/Turing results behave similarly. If the log-likelihood is being extracted correctly, then any differences between the R and Julia implementations should be less than the expected sampling error.
I like that warning, it looks good!
@ParadaCarleton, thanks for clarifying. I will submit a PR with a fix for the dimensions, a warning for the dimensions, and an additional test that shows the Turing method produces the same pointwise log likelihoods as the method that accepts a user-defined function. I can look into the test that you describe as time permits. Unfortunately, my schedule is limited over the next week or so. In the meantime, the fixes should give the correct result with a fair degree of certainty.
@itsdfish I do agree that most packages seem to use [draws, params, chains]. The reason I started to experiment is the separation between draws and chains; i.e., to append all chains one has to resort to something like:
ndraws, nparams, nchains = size(ma);
rma = reshape(permutedims(ma, [1, 3, 2]), ndraws*nchains, nparams);
mean(rma, dims=1) |> display
or overload MCMCChains' chainscat().
Edit: There is also the ArviZ discussion (which is quite interesting and might use or include AlgebraOfGraphics). ArviZ uses in some cases [chains, draws, params]. That does make sense, I think.
@goedman, that is cumbersome indeed. I wonder whether there are other operations that introduce tradeoffs between the approaches. Knowing the layout of the tradeoff space might help guide a decision.
Good point @itsdfish! Usually it takes me re-doing a substantial part of StatisticalRethinking for the trade-off landscape to emerge.
On my list of options for StanJulia and StatisticalRethinking v4 is also a KeyedArray chain structure using [draws, chains, params] (which works maybe the best of the 3 KeyedArray options). I prefer that over the basic Tables.jl based chains object output_format option in StanSample.jl. Not completely fair of course as Tables.jl is a package developer interface while AxisKeys.jl is more user focused.
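A minimal sketch of what such a [draws, chains, params] KeyedArray might look like with AxisKeys.jl; the key names and parameter labels below are assumptions for illustration, not StanSample's actual output:

```julia
using AxisKeys, Statistics

# 1000 draws, 4 chains, 3 parameters, keyed by name
chn = KeyedArray(randn(1000, 4, 3);
                 draws=1:1000, chains=1:4, params=[:a, :bA, :sigma])

chn(params=:bA)          # all draws/chains for one parameter, by key
mean(chn; dims=:draws)   # summarize over the draws dimension by name
```

Named-dimension selection and reduction is the main ergonomic win over remembering positional indices.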
Anyway, these are typical results I'm getting for Stan and Turing:
Stan:
[ Info: Some Pareto k values are slightly high (>0.5); some pointwise estimates may be slow to converge or have high variance.
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -63.05 │ 6.46 │ -1.26 │ 0.13 │
│ naive_est │ -59.25 │ 4.83 │ -1.18 │ 0.10 │
│ overfit │ 3.81 │ 1.89 │ 0.08 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -69.72 │ 4.98 │ -1.39 │ 0.10 │
│ naive_est │ -66.70 │ 4.17 │ -1.33 │ 0.08 │
│ overfit │ 3.02 │ 0.93 │ 0.06 │ 0.02 │
└───────────┴────────┴──────────┴───────┴─────────┘
[ Info: Some Pareto k values are slightly high (>0.5); some pointwise estimates may be slow to converge or have high variance.
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -63.74 │ 6.44 │ -1.27 │ 0.13 │
│ naive_est │ -59.05 │ 4.67 │ -1.18 │ 0.09 │
│ overfit │ 4.70 │ 1.91 │ 0.09 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5.1s │ 0.00 │ 0.00 │ 0.67 │
│ m5.3s │ -0.69 │ 0.33 │ 0.33 │
│ m5.2s │ -6.67 │ 4.66 │ 0.00 │
└───────┴────────┴───────┴────────┘
Turing:
┌ Warning: Some Pareto k values are very high (>0.7), indicating that PSIS has failed to approximate the true distribution.
└ @ ParetoSmooth ~/.julia/dev/ParetoSmooth/src/LooStructs.jl:96
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -63.00 │ 6.55 │ -1.26 │ 0.13 │
│ naive_est │ -59.22 │ 4.89 │ -1.18 │ 0.10 │
│ overfit │ 3.78 │ 1.93 │ 0.08 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -69.68 │ 4.96 │ -1.39 │ 0.10 │
│ naive_est │ -66.70 │ 4.16 │ -1.33 │ 0.08 │
│ overfit │ 2.98 │ 0.92 │ 0.06 │ 0.02 │
└───────────┴────────┴──────────┴───────┴─────────┘
[ Info: Some Pareto k values are slightly high (>0.5); some pointwise estimates may be slow to converge or have high variance.
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -63.69 │ 6.44 │ -1.27 │ 0.13 │
│ naive_est │ -59.03 │ 4.68 │ -1.18 │ 0.09 │
│ overfit │ 4.66 │ 1.90 │ 0.09 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5_1t │ 0.00 │ 0.00 │ 0.67 │
│ m5_3t │ -0.69 │ 0.42 │ 0.33 │
│ m5_2t │ -6.68 │ 4.74 │ 0.00 │
└───────┴────────┴───────┴────────┘
With SR/ulam():
        PSIS    SE dPSIS  dSE pPSIS weight
m5.1u  126.0 12.83   0.0   NA   3.7   0.67
m5.3u  127.4 12.75   1.4 0.75   4.7   0.33
m5.2u  139.5  9.95  13.6 9.33   3.0   0.00
SR includes the -2 factor.
@goedman, thanks for sharing those results. Are those based on estimates of the same data? If so, this means that testing loo from independent chains would be imprecise. The SE of the difference is
julia> sqrt(4.96^2 + 4.98^2)
7.02865563248051
in the best case above.
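The same quick calculation as a tiny helper, assuming the two estimates are independent (the name `se_diff` is just illustrative):

```julia
# SE of the difference of two independent estimates:
# variances add, so the SEs combine in quadrature.
se_diff(se1, se2) = sqrt(se1^2 + se2^2)

se_diff(4.96, 4.98)  # ≈ 7.03
```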
@itsdfish, I would be more concerned if the LooCompare results ended up very different.
There are several Pareto k info/warning messages. In Statistical Rethinking the suggestion is made to replace the Normal distribution for divorce_rate by a Student-t distribution (with thicker tails). This reduces the relative influence of outliers such as Idaho and Maine on the predictions.
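A hedged sketch of what that Student-t swap could look like in Turing; the model name, the ν prior, and the use of LocationScale are my illustrative choices, not the book's exact code (note the per-observation loop, per the Important Note above):

```julia
using Turing

@model function m5_1_robust(A, D)
    a  ~ Normal(0, 0.2)
    bA ~ Normal(0, 0.5)
    σ  ~ Exponential(1)
    ν  ~ Gamma(2, 10)                  # degrees of freedom; small ν = thick tails
    for i in eachindex(D)              # loop, not broadcast
        D[i] ~ LocationScale(a + bA * A[i], σ, TDist(ν))
    end
end
```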
Furthermore, as I mentioned once before, I am a bit concerned why Turing inferences appear to vary more than Stan inferences. I should really perform several more simulations on this topic.
The above results tell me that if prediction accuracy is my primary concern, the divorce_rate ~ median_age_at_marriage model is expected to have better out-of-sample performance than the other 2 models (m5.2 is divorce_rate ~ marriage_rate and m5.3 is divorce_rate ~ median_age_at_marriage + marriage_rate).
I'm sure @ParadaCarleton can explain this way better than I can.
@goedman These look great to me! Can I see the Pareto k values? My guess is that Stan's Pareto k values are just barely under 0.7 for these outliers, while Turing's are a little bit above. As long as the maximum difference doesn't exceed 0.1 or so I wouldn't be too worried.
Turing:
┌ Warning: Some Pareto k values are very high (>0.7), indicating that PSIS has failed to approximate the true distribution.
└ @ ParetoSmooth ~/.julia/dev/ParetoSmooth/src/LooStructs.jl:96
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -63.00 │ 6.55 │ -1.26 │ 0.13 │
│ naive_est │ -59.22 │ 4.89 │ -1.18 │ 0.10 │
│ overfit │ 3.78 │ 1.93 │ 0.08 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -69.68 │ 4.96 │ -1.39 │ 0.10 │
│ naive_est │ -66.70 │ 4.16 │ -1.33 │ 0.08 │
│ overfit │ 2.98 │ 0.92 │ 0.06 │ 0.02 │
└───────────┴────────┴──────────┴───────┴─────────┘
[ Info: Some Pareto k values are slightly high (>0.5); some pointwise estimates may be slow to converge or have high variance.
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -63.69 │ 6.44 │ -1.27 │ 0.13 │
│ naive_est │ -59.03 │ 4.68 │ -1.18 │ 0.09 │
│ overfit │ 4.66 │ 1.90 │ 0.09 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5_1t │ 0.00 │ 0.00 │ 0.67 │
│ m5_3t │ -0.69 │ 0.42 │ 0.33 │
│ m5_2t │ -6.68 │ 4.74 │ 0.00 │
└───────┴────────┴───────┴────────┘
Stan:
┌ Warning: Some Pareto k values are very high (>0.7), indicating that PSIS has failed to approximate the true distribution.
└ @ ParetoSmooth ~/.julia/dev/ParetoSmooth/src/LooStructs.jl:96
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -62.96 │ 6.46 │ -1.26 │ 0.13 │
│ naive_est │ -59.23 │ 4.86 │ -1.18 │ 0.10 │
│ overfit │ 3.73 │ 1.85 │ 0.07 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -69.73 │ 4.97 │ -1.39 │ 0.10 │
│ naive_est │ -66.71 │ 4.14 │ -1.33 │ 0.08 │
│ overfit │ 3.02 │ 0.96 │ 0.06 │ 0.02 │
└───────────┴────────┴──────────┴───────┴─────────┘
[ Info: Some Pareto k values are slightly high (>0.5); some pointwise estimates may be slow to converge or have high variance.
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 50 data points.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ loo_est │ -63.90 │ 6.45 │ -1.28 │ 0.13 │
│ naive_est │ -59.07 │ 4.62 │ -1.18 │ 0.09 │
│ overfit │ 4.83 │ 1.98 │ 0.10 │ 0.04 │
└───────────┴────────┴──────────┴───────┴─────────┘
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5.1s │ 0.00 │ 0.00 │ 0.72 │
│ m5.3s │ -0.94 │ 0.36 │ 0.28 │
│ m5.2s │ -6.77 │ 4.65 │ 0.00 │
└───────┴────────┴───────┴────────┘
Also did some more testing with Turing (not seeded). Particles summaries of chains do look ok and similar to Stan:
Turing runs:
(a = -0.00155 ± 0.1, σ = 0.821 ± 0.083, bA = -0.567 ± 0.11)
(a = 0.00156 ± 0.11, bM = 0.346 ± 0.13, σ = 0.952 ± 0.098)
(a = -0.00086 ± 0.097, bM = -0.0633 ± 0.16, σ = 0.826 ± 0.086, bA = -0.607 ± 0.16)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5_1t │ 0.00 │ 0.00 │ 0.69 │
│ m5_3t │ -0.82 │ 0.39 │ 0.30 │
│ m5_2t │ -6.75 │ 4.67 │ 0.00 │
└───────┴────────┴───────┴────────┘
(a = 0.00164 ± 0.1, σ = 0.823 ± 0.084, bA = -0.566 ± 0.12)
(a = -0.000387 ± 0.11, bM = 0.349 ± 0.13, σ = 0.95 ± 0.099)
(a = -0.0015 ± 0.1, bM = -0.0596 ± 0.16, σ = 0.829 ± 0.088, bA = -0.607 ± 0.16)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5_1t │ 0.00 │ 0.00 │ 0.68 │
│ m5_3t │ -0.75 │ 0.35 │ 0.32 │
│ m5_2t │ -6.43 │ 4.73 │ 0.00 │
└───────┴────────┴───────┴────────┘
(a = 0.000454 ± 0.1, σ = 0.822 ± 0.083, bA = -0.568 ± 0.11)
(a = -0.00252 ± 0.11, bM = 0.347 ± 0.13, σ = 0.949 ± 0.097)
(a = 0.000968 ± 0.1, bM = -0.0591 ± 0.16, σ = 0.829 ± 0.088, bA = -0.606 ± 0.16)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5_1t │ 0.00 │ 0.00 │ 0.72 │
│ m5_3t │ -0.96 │ 0.37 │ 0.28 │
│ m5_2t │ -6.88 │ 4.58 │ 0.00 │
└───────┴────────┴───────┴────────┘
(a = 0.000295 ± 0.098, σ = 0.826 ± 0.087, bA = -0.568 ± 0.12)
(a = 0.00071 ± 0.11, bM = 0.346 ± 0.13, σ = 0.947 ± 0.094)
(a = 0.00109 ± 0.099, bM = -0.0605 ± 0.15, σ = 0.828 ± 0.087, bA = -0.606 ± 0.16)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5_1t │ 0.00 │ 0.00 │ 0.70 │
│ m5_3t │ -0.85 │ 0.36 │ 0.30 │
│ m5_2t │ -6.62 │ 4.52 │ 0.00 │
└───────┴────────┴───────┴────────┘
Stan runs:
(a = -0.000515 ± 0.1, bA = -0.563 ± 0.11, sigma = 0.824 ± 0.084)
(a = -0.00297 ± 0.11, bM = 0.35 ± 0.13, sigma = 0.945 ± 0.097)
(a = 0.00047 ± 0.1, bA = -0.61 ± 0.16, bM = -0.0637 ± 0.16, sigma = 0.828 ± 0.089)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5.1s │ 0.00 │ 0.00 │ 0.75 │
│ m5.3s │ -1.09 │ 0.39 │ 0.25 │
│ m5.2s │ -6.71 │ 4.55 │ 0.00 │
└───────┴────────┴───────┴────────┘
(a = 0.00189 ± 0.1, bA = -0.566 ± 0.11, sigma = 0.825 ± 0.086)
(a = -0.0014 ± 0.11, bM = 0.349 ± 0.13, sigma = 0.952 ± 0.1)
(a = 0.000481 ± 0.1, bA = -0.607 ± 0.16, bM = -0.0604 ± 0.16, sigma = 0.825 ± 0.086)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5.1s │ 0.00 │ 0.00 │ 0.68 │
│ m5.3s │ -0.75 │ 0.38 │ 0.32 │
│ m5.2s │ -6.66 │ 4.63 │ 0.00 │
└───────┴────────┴───────┴────────┘
(a = 0.000583 ± 0.098, bA = -0.564 ± 0.12, sigma = 0.824 ± 0.087)
(a = -0.000339 ± 0.11, bM = 0.349 ± 0.13, sigma = 0.943 ± 0.095)
(a = 0.000904 ± 0.1, bA = -0.608 ± 0.16, bM = -0.0607 ± 0.16, sigma = 0.828 ± 0.087)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5.1s │ 0.00 │ 0.00 │ 0.73 │
│ m5.3s │ -1.02 │ 0.37 │ 0.26 │
│ m5.2s │ -6.73 │ 4.58 │ 0.00 │
└───────┴────────┴───────┴────────┘
(a = 0.00394 ± 0.099, bA = -0.566 ± 0.11, sigma = 0.821 ± 0.085)
(a = 0.000164 ± 0.11, bM = 0.346 ± 0.13, sigma = 0.949 ± 0.1)
(a = 7.57e-5 ± 0.1, bA = -0.606 ± 0.16, bM = -0.0608 ± 0.16, sigma = 0.826 ± 0.088)
┌───────┬────────┬───────┬────────┐
│ │ d_PSIS │ d_SE │ weight │
├───────┼────────┼───────┼────────┤
│ m5.1s │ 0.00 │ 0.00 │ 0.70 │
│ m5.3s │ -0.83 │ 0.41 │ 0.30 │
│ m5.2s │ -6.62 │ 4.73 │ 0.00 │
└───────┴────────┴───────┴────────┘
@itsdfish Sorry, didn't respond to your earlier question. Yes, all simulations use the same dataset.
@goedman, given that you have already compared Turing and Stan, I was wondering if you would be willing to add those additional tests?
Hi Chris (@itsdfish, @ParadaCarleton). You mean adding StanSample to the test environment and running the exact same steps, as shown above?
Hi Chris (@itsdfish)
Not sure why you asked the question, but below is what I used.
Unfortunately this uses upcoming releases of both StatisticalRethinking and StanSample (both v4), which are substantial breaking releases. It is based on a KeyedArray chains format [draws, chains, params] and drops the kwarg output_format in read_samples(model; output_format=...) in favor of read_samples(model, [:keyedarray, :dataframe, :namedtuple, ...]; ...). As :keyedarray is the default, read_samples(model) will mostly suffice.
The changes in code are limited, but in examples and tests very substantial. As I'm traveling the next 3 weeks this will probably take most of this month for StanJulia and September for StatisticalRethinkingJulia. I'll keep the v4 branches on GitHub "reasonably" up to date.
using StanSample, ParetoSmooth, NamedTupleTools
using StatisticalRethinking
df = CSV.read(sr_datadir("WaffleDivorce.csv"), DataFrame);
scale!(df, [:Marriage, :MedianAgeMarriage, :Divorce])
data = (N=size(df, 1), D=df.Divorce_s, A=df.MedianAgeMarriage_s,
M=df.Marriage_s)
stan5_1 = "
data {
int < lower = 1 > N; // Sample size
vector[N] D; // Outcome
vector[N] A; // Predictor
}
parameters {
real a; // Intercept
real bA; // Slope (regression coefficients)
real < lower = 0 > sigma; // Error SD
}
transformed parameters {
vector[N] mu; // mu is a vector
for (i in 1:N)
mu[i] = a + bA * A[i];
}
model {
a ~ normal(0, 0.2); //Priors
bA ~ normal(0, 0.5);
sigma ~ exponential(1);
D ~ normal(mu , sigma); // Likelihood
}
generated quantities {
vector[N] log_lik;
for (i in 1:N)
log_lik[i] = normal_lpdf(D[i] | mu[i], sigma);
}
";
stan5_2 = "
data {
int N;
vector[N] D;
vector[N] M;
}
parameters {
real a;
real bM;
real<lower=0> sigma;
}
transformed parameters {
vector[N] mu;
for (i in 1:N)
mu[i]= a + bM * M[i];
}
model {
a ~ normal( 0 , 0.2 );
bM ~ normal( 0 , 0.5 );
sigma ~ exponential( 1 );
D ~ normal( mu , sigma );
}
generated quantities {
vector[N] log_lik;
for (i in 1:N)
log_lik[i] = normal_lpdf(D[i] | mu[i], sigma);
}
";
stan5_3 = "
data {
int N;
vector[N] D;
vector[N] M;
vector[N] A;
}
parameters {
real a;
real bA;
real bM;
real<lower=0> sigma;
}
transformed parameters {
vector[N] mu;
for (i in 1:N)
mu[i] = a + bA * A[i] + bM * M[i];
}
model {
a ~ normal( 0 , 0.2 );
bA ~ normal( 0 , 0.5 );
bM ~ normal( 0 , 0.5 );
sigma ~ exponential( 1 );
D ~ normal( mu , sigma );
}
generated quantities{
vector[N] log_lik;
for (i in 1:N)
log_lik[i] = normal_lpdf(D[i] | mu[i], sigma);
}
";
m5_1s = SampleModel("m5.1s", stan5_1)
rc5_1s = stan_sample(m5_1s; data)
m5_2s = SampleModel("m5.2s", stan5_2)
rc5_2s = stan_sample(m5_2s; data)
m5_3s = SampleModel("m5.3s", stan5_3)
rc5_3s = stan_sample(m5_3s; data)
if success(rc5_1s) && success(rc5_2s) && success(rc5_3s)
nt5_1s = read_samples(m5_1s, :particles)
NamedTupleTools.select(nt5_1s, (:a, :bA, :sigma)) |> display
nt5_2s = read_samples(m5_2s, :particles)
NamedTupleTools.select(nt5_2s, (:a, :bM, :sigma)) |> display
nt5_3s = read_samples(m5_3s, :particles)
NamedTupleTools.select(nt5_3s, (:a, :bA, :bM, :sigma)) |> display
models = [m5_1s, m5_2s, m5_3s]
loglikelihood_name = :log_lik
loo_comparison = loo_compare(models)
println()
for (i, psis) in enumerate(loo_comparison.psis)
psis |> display
pk_plot(psis.pointwise(:pareto_k))
savefig(joinpath(@__DIR__, "m5.$(i)s.png"))
end
println()
loo_comparison |> display
end
#=
With SR/ulam():
        PSIS    SE dPSIS  dSE pPSIS weight
m5.1u  126.0 12.83   0.0   NA   3.7   0.67
m5.3u  127.4 12.75   1.4 0.75   4.7   0.33
m5.2u  139.5  9.95  13.6 9.33   3.0   0.00
=#
Statistical Rethinking uses the factor -2.
@goedman, sorry for the radio silence. I have been unusually busy with family and work. ParadaCarleton was interested in comparing the various Loo values between Turing and Stan and making a test. I noticed that the comparisons you were performing could form the basis for the requested tests. Thanks for sharing. I will work this into some tests as soon as I can.
@itsdfish Do you also think you could add a test for result accuracy by using the example log-likelihood array I've included in the RData file? (Or, alternatively, by building a log-likelihood array some other way.) The Chains() constructor should let you build a Chains object. While the results won't be exactly the same, verifying that the Pareto k values aren't off by more than 0.1 and that the difference in the ELPD estimates is less than 4 times the MCSE would be good.
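A hedged sketch of what those tolerance checks might look like; all values below are dummy stand-ins for R's loo output and the Julia psis_loo result, just to show the shape of the assertions:

```julia
using Test

# Dummy stand-ins: in the real test these would be read from the RData
# file (R/Stan side) and computed by ParetoSmooth (Julia side).
r_pareto_k  = [0.31, 0.45, 0.68]   # Pareto k from R / Stan
jl_pareto_k = [0.29, 0.48, 0.72]   # Pareto k from Julia / Turing
r_elpd, jl_elpd = -63.05, -63.00   # ELPD totals
mcse = 0.13                        # Monte Carlo SE of the ELPD estimate

@test maximum(abs.(jl_pareto_k .- r_pareto_k)) <= 0.1  # k within 0.1
@test abs(jl_elpd - r_elpd) <= 4 * mcse                # ELPD within 4 MCSE
```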