TuringLang / ParetoSmooth.jl

An implementation of PSIS algorithms in Julia.
http://turinglang.org/ParetoSmooth.jl/
MIT License

loo method for MCMCChains #12

Closed itsdfish closed 3 years ago

itsdfish commented 3 years ago

@ParadaCarleton and @goedman,

Here is a method for computing loo from an MCMCChains Chains object. I'm not sure if that is what you have in mind.

Also, I'm not sure how to fix the test error with Documenter. I tried creating a new CI.yml file without Documenter, but that did not resolve the problem.

ParadaCarleton commented 3 years ago

@itsdfish This looks great, but the tests are erroring -- the problem seems to be in the tests themselves rather than in the functions.

I believe Turing already stores log-likelihood values, so they can be provided as an array rather than just as a function. This should let you avoid computing them twice. This information should be contained inside the VarInfo object. You can ask Tor Fjelde or David Widmann on the Slack if you need more help; Seth Axen might also know since he got PSIS to work in ArviZ.jl.
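Turing (via DynamicPPL) provides `pointwise_loglikelihoods(model, chain)`, which returns a Dict mapping observation names to matrices of log-likelihood values. The sketch below shows one way such a Dict might be stacked into a three-dimensional (data, draws, chains) array; the helper name `stack_loglikes` and the target layout are assumptions for illustration, not the package's API, and the Dict is mocked with plain matrices so the snippet runs without Turing:

```julia
# Hypothetical helper: stack a Dict of (draws × chains) matrices, one per
# observation, into a (data, draws, chains) array.
function stack_loglikes(d::Dict{String, Matrix{Float64}})
    names = sort!(collect(keys(d)))          # fix a deterministic observation order
    draws, chains = size(d[first(names)])
    ll = Array{Float64}(undef, length(names), draws, chains)
    for (i, k) in enumerate(names)
        ll[i, :, :] = d[k]
    end
    return ll
end

# Mock of what pointwise_loglikelihoods might return: 2 observations,
# 4 draws, 2 chains.
mock = Dict("y[1]" => ones(4, 2), "y[2]" => zeros(4, 2))
ll = stack_loglikes(mock)
```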

itsdfish commented 3 years ago

Thanks @ParadaCarleton .

I added a method that computes loo from a Chains object and a Turing model object. I kept the other methods so that loo can still be computed from a Chains object based on a non-Turing model. Note that this requires adding Turing as a dependency.

Regarding the tests, they pass for me locally. It turns out that the remote tests were failing because of an HTTP request error. I did not realize that there were two CI files. Typically, the CI file is located in .github/workflows, so I deleted the one in the parent folder and kept the one in .github/workflows.

itsdfish commented 3 years ago

In the last commit, I resolved the conflicts in Project.toml and qualified MCMCDiagnosticTools.FFTESSMethod() because it was throwing an error.

ParadaCarleton commented 3 years ago

> I added a method that computes loo from a Chains object and a Turing model object. [...] Note that this requires adding Turing as a dependency.

This is great, thanks! Could you make Turing an optional dependency by using Requires.jl, so people don't have to install it if they're using some other sampling package?

itsdfish commented 3 years ago

No problem. Can do. I should have that updated by tomorrow if not sooner.

goedman commented 3 years ago

Looking over Chris' code, in the signature of pointwise_loglikes, model is untyped:

function pointwise_loglikes(chain::Chains, model)

which would work if Turing is not in the list of dependencies. Is that why you did it this way, Chris?

Should that be mentioned in the docstring?

itsdfish commented 3 years ago

@ParadaCarleton, I added Requires. My understanding is that it functions more like conditional loading rather than conditional installation. Turing will be downloaded and installed, but the methods for Turing will only be exported if Turing is loaded in the Main module. This should make ParetoSmooth load faster when it is not used with Turing.
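For reference, a minimal sketch of the Requires.jl pattern described above (the module name and glue file are illustrative, not the package's actual layout; Turing's UUID should be verified against its entry in the General registry):

```julia
module ParetoSmoothSketch  # illustrative name

using Requires

function __init__()
    # Methods in turing_glue.jl (hypothetical file) are only defined once
    # the user runs `using Turing` in their session.
    @require Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0" include("turing_glue.jl")
end

end
```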

itsdfish commented 3 years ago

@goedman, I typically only annotate the argument types in order to disambiguate methods. I have you covered: in the [docs], I mention that model is a Turing model.

Thanks for looking over my pull request.

itsdfish commented 3 years ago

@ParadaCarleton, thanks for the feedback. I will address your comments as time permits.

@goedman, is it true that Turing should be added under [extras] in Project.toml if it is needed for tests?

goedman commented 3 years ago

Yes, but in the setup of ParetoSmooth it probably belongs in the [deps] section of the Project.toml in the test directory.

I need to study how runtests.jl does this (does it check for a Project.toml in the test directory and fall back to the [extras] section if it's not there?). If that is a general mechanism, you could use it in many other contexts, say to switch to a Turing environment or to a Stan environment.

Edit: I wonder if the [extras] and [targets] sections are still needed in this setup.

itsdfish commented 3 years ago

@goedman, I am also a bit confused. I have always added test dependencies to the main Project.toml, but I wouldn't be surprised if I was doing it incorrectly (or at least creating an unneeded dependency). Please let me know if you find an answer or an example in a registered package.

goedman commented 3 years ago

Just re-read the Pkg docs, and indeed [extras] and [targets] date from Julia 1.0 and 1.1. Since Julia 1.2, test requirements can simply be added (even from the REPL) to a Project.toml in the test directory.
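Concretely, a hypothetical test/Project.toml might look like this (the UUIDs should be verified against the General registry and the Test stdlib):

```toml
[deps]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
```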

Edit: Earlier on I was as surprised as you are, and I asked Carlos about this setup; he said it's just how PkgTemplates structures new projects. Wasn't there a Neil Young song, "Rust Never Sleeps"?

itsdfish commented 3 years ago

Thanks for the info, @goedman. I will add the test dependencies to test/Project.toml.

I also had a misunderstanding about psis_loo. If I understand correctly, in the output below, total_score is the value used in model comparison. Is that correct?

julia> loo1.estimates
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   criterion ∈ 3-element Vector{Symbol}
→   estimate ∈ 2-element Vector{Symbol}
And data, 3×2 Matrix{Float64}:
                    (:Estimate)  (:SE)
  (:total_score)  -144.644        9.22217
  (:overfit)        70.9333       7.1834
  (:avg_score)      -2.89288      0.184443
goedman commented 3 years ago

Yes, that's the one I used to call 'loo', and it should be close to WAIC (but a better estimate for comparing models).

itsdfish commented 3 years ago

@ParadaCarleton, thanks again for your feedback. I learned a few new things.

I made some comments above. For example, I could not get Tullio to work for some reason, but I think I was able to address the other problems.

ParadaCarleton commented 3 years ago

> If I understand correctly, in the output above, total_score is the value used in model comparison. Is that correct?

Yes, :total_score compares the amount of evidence for two models. :avg_score is something I introduced (I haven't seen it in earlier implementations of PSIS-LOO). It's an estimate of the expected log probability density/cross-entropy, and it's equal to :total_score divided by the sample size. This number compares the relative quality of different models. To use an analogy to more familiar quantities, :total_score behaves like a p-value while :avg_score acts more like R^2.
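As a quick arithmetic check of that relationship, using the numbers from the output quoted above (the sample size of 50 is implied by the ratio, consistent with a 50-state dataset):

```julia
# :avg_score should equal :total_score divided by the number of data points.
total_score = -144.644
avg_score   = -2.89288
n = 50                          # implied sample size for this dataset
@assert isapprox(total_score / n, avg_score; atol = 1e-4)
```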

ParadaCarleton commented 3 years ago

@itsdfish A list of things I'd need before I can merge the pull request:

  1. psis, psis_loo, and loo methods that accept MCMCChains as input.
  2. Code should follow the Blue style guide. At the moment the only significant problem is the docstrings -- just using the provided templates should work.
  3. The function argument should come first; as currently written, the functions are incompatible with do blocks.
  4. Arrays should be replaced with AbstractArrays, since we don't know whether users will pass Arrays or something else.

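To illustrate point 3 (the names below are illustrative, not ParetoSmooth's actual API): with the function as the first positional argument, callers can supply the log-likelihood with Julia's do-block syntax.

```julia
# Hypothetical helper: the function comes first, enabling do blocks.
compute_loglike(f, data, draws) = [f(d, θ) for d in data, θ in draws]

data  = [1.0, 2.0]
draws = [0.0, 0.5, 1.0]
lls = compute_loglike(data, draws) do d, θ
    -(d - θ)^2        # toy log-likelihood
end
```

Here lls is a 2×3 matrix of pointwise values: one row per observation, one column per draw.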
itsdfish commented 3 years ago

@ParadaCarleton, I have 2-4 complete. I will finish 1 tomorrow and resubmit.

itsdfish commented 3 years ago

@ParadaCarleton, I believe the latest commit satisfies the requested changes.

Note that no changes were required for loo. It works without modification because it passes its arguments to psis_loo, where the correct method is selected.

I also cleaned up and organized the tests. Currently, I check that the pointwise log-likelihood output has the correct dimensions and that it sums to the sum of the log likelihoods. I only test that the new methods for psis and psis_loo return the correct type; I do not validate the accuracy of the output because I do not have a ground truth. However, your tests cover the accuracy of the output, so if the output of psis and psis_loo is correct, and the pointwise log likelihoods are correct, then testing the accuracy again is unnecessary.

One last thing I want to confirm: based on the documentation and the tests, it looks like the negative of the pointwise log likelihoods is passed to psis but not to psis_loo. I want to confirm; otherwise, the ordering of model comparisons might be reversed.

ParadaCarleton commented 3 years ago

This looks great, thanks so much for all your hard work!

itsdfish commented 3 years ago

No problem! Thank you for your helpful feedback and patience. I am glad to have the opportunity to contribute to a very useful project.

By the way, I recommend posting an announcement on Discourse when you think the package is ready. I know there has been a lot of interest in having this capability in Julia.

ParadaCarleton commented 3 years ago

@itsdfish sounds good; I think the only things missing before I can announce this are support for Stan.jl and a comparison function.

goedman commented 3 years ago

Do you mean an example, or would you consider adding Stan to the test environment?

goedman commented 3 years ago

The compare() method I wrote for StatisticalRethinking produces something like this:

julia> df_psis = compare([m5_1s, m5_2s, m5_3s], :psis)
3×8 DataFrame
 Row │ models  PSIS     lppd     SE       dPSIS    dSE      pPSIS    weight  
     │ String  Float64  Float64  Float64  Float64  Float64  Float64  Float64 
─────┼───────────────────────────────────────────────────────────────────────
   1 │ m5.1s     125.4   118.52    12.39      0.0     0.0      3.67     0.66
   2 │ m5.3s     126.7   118.09    12.44      1.3     0.7      4.62     0.34
   3 │ m5.2s     138.8   133.37     9.68     13.4     8.86     2.92     0.0

My first question: is this (based, of course, on ParetoSmooth's loo() method) what you are looking for?

I used DataFrames.jl and would need to convert to the AxisKeys.jl format (a good exercise for me, as I only recently looked into AxisKeys.jl) and obtain the data from the Psisloo object.

You also mention support for Stan.jl. We could indeed add StanSample.jl to the test Project.toml and install Stan's cmdstan in the CI script. That is technically simple to do and robust.

The above 3 models (m5.1s, m5.3s, and m5.2s) investigate the association of the median age at marriage (A) and the marriage rate (M) with the divorce rate (D) in the southern US states. Here:

m5.1s: D ~ A
m5.3s: D ~ A + M
m5.2s: D ~ M

E.g., for m5.3s the Stan language program is:

stan5_3 = "
data {
  int N;
  vector[N] D;
  vector[N] M;
  vector[N] A;
}
parameters {
  real a;
  real bA;
  real bM;
  real<lower=0> sigma;
}
transformed parameters {
    vector[N] mu;
    mu = a + bA * A + bM * M;
}
model {
  a ~ normal( 0 , 0.2 );
  bA ~ normal( 0 , 0.5 );
  bM ~ normal( 0 , 0.5 );
  sigma ~ exponential( 1 );
  D ~ normal( mu , sigma );
}
generated quantities{
    vector[N] log_lik;
    for (i in 1:N)
        log_lik[i] = normal_lpdf(D[i] | mu[i], sigma);
}
";

and I extract the log_lik matrix as before, e.g., for the Stan language model above (stan5_3):

# Either add WaffleDivorce.csv data to the test directory or simulate the association
df = CSV.read(sr_datadir("WaffleDivorce.csv"), DataFrame);
# Replace scale!() with zscore() or something.
scale!(df, [:Marriage, :MedianAgeMarriage, :Divorce])                                   
data = (N=size(df, 1), D=df.Divorce_s, A=df.MedianAgeMarriage_s, M=df.Marriage_s)

m5_3s = SampleModel("m5.3s", stan5_3)
rc5_3s = stan_sample(m5_3s; data)

if success(rc5_3s)
    st5_3s = read_samples(m5_3s; output_format=:table);
    log_lik = matrix(st5_3s, "log_lik")
    ll = reshape(Matrix(log_lik'), data.N, m5_3s.method.num_samples, m5_3s.n_chains[1]);
    m5_3s_loo = ParetoSmooth.loo(ll)
end

My 2nd question is how strongly you would like to demonstrate Stan.jl, versus using the equivalent Turing models (with Chris's additions) and possibly simulating the association. We can still document, at a high level, how to use this package with Stan.