StatisticalRethinkingJulia / StatisticalRethinking.jl

Julia package with selected functions in the R package `rethinking`. Used in the SR2... projects.
MIT License

Dependency issue with psisloo function #140

Closed Shmuma closed 2 years ago

Shmuma commented 2 years ago

When I try to use `compare` in `:psis` mode, I get

UndefVarError: psisloo not defined

Stacktrace:
 [1] compare(m::Vector{Matrix{Float64}}, ::Val{:psis}; mnames::Vector{String})
   @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/Uy2Qf/src/compare_models.jl:82
 [2] compare(m::Vector{Matrix{Float64}}, type::Symbol; mnames::Vector{String})
   @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/Uy2Qf/src/compare_models.jl:132
 [3] compare(m::Vector{Matrix{Float64}}, type::Symbol)
   @ StatisticalRethinking ~/.julia/packages/StatisticalRethinking/Uy2Qf/src/compare_models.jl:132
 [4] top-level scope
   @ In[92]:1
 [5] eval
   @ ./boot.jl:360 [inlined]
 [6] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1116

I found this function in ParetoSmoothedImportanceSampling.jl, but it should be in the dependency list of StatisticalRethinking.jl.
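For context, a sketch of what declaring the missing dependency would look like on a local copy of the package (paths and the source-level import are left to the maintainer; this only updates the package's Project.toml):

```julia
# Sketch: develop a local copy of StatisticalRethinking and declare the
# missing dependency there, so psisloo can be imported by the package.
using Pkg
Pkg.develop("StatisticalRethinking")
Pkg.activate(joinpath(Pkg.devdir(), "StatisticalRethinking"))
Pkg.add("ParetoSmoothedImportanceSampling")  # records it in Project.toml
```

The package source would additionally need something like `using ParetoSmoothedImportanceSampling: psisloo` for the `UndefVarError` to go away; Pkg alone cannot add that.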

Shmuma commented 2 years ago

I remember you mentioned the migration to ParetoSmooth. So the better fix would be to port `compare` to ParetoSmooth, right?

goedman commented 2 years ago

Hi Max ( @Shmuma ), also noted your 2 other issues on ParetoSmoothedImportanceSampling.jl.

All of this is changing with the recent release of a new package ParetoSmooth.jl. I haven't had time to update the :psis compare function in StatisticalRethinking.jl (which will look similar to waic, as in the book).

I did help on the loo_compare() in ParetoSmooth and there is an example of combining Turing.jl and ParetoSmooth.jl in ParetoSmooth's test suite (based on WaffleDivorce in fact). I did something similar for the Stan side (for now in the SR test suite).

The current state of affairs is:

  1. ParetoSmoothedImportanceSampling.jl is deprecated and in SR v3.1 replaced by StatsModelComparisons.jl.

  2. In v4 StatsModelComparisons.jl will be replaced by ParetoSmooth.jl (but this is not complete yet).

  3. With the exception of the parameter penalty terms, all info for the :psis version is easily accessible from ParetoSmooth.jl (see the examples in the test script mentioned above).

  4. Currently I am updating quadratic approximation stuff in SR, including the sample() function, as this was also updated in Turing. Most of this will move to StanQuap.jl but for Turing we do need the sample(::ModeResult) method. As soon as that is done I'll start working on the :psis part of compare.

Not ideal, but I can see the light at the end of the tunnel.

goedman commented 2 years ago

Hi Max (@Shmuma),

Oops, didn't see your 2nd comment. Yes, we should go the ParetoSmooth route and, as mentioned above, I already did some preliminary work. Any help would definitely be appreciated!

Also, on the other point I raised in my previous comment (item 4): what's your thinking about the sample(::ModeResult) function I noticed in your chapter 7 notebook? Should sample() return a DataFrame or a Chain object? In Turing that would be an MCMCChains.Chain object; in StanQuap.jl, sample(::QuapResult) would return a KeyedArray chain object.

Or do you think a DataFrame suffices? Of course, using the Tables API the chains are easy to convert to a DataFrame.
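For what it's worth, a minimal sketch (toy data, names assumed) of how cheap the Chains-to-DataFrame conversion is via the Tables interface, so returning a Chain need not lose any DataFrame convenience:

```julia
# Sketch: an MCMCChains.Chains object converts to a DataFrame in one
# line via the Tables.jl interface.
using MCMCChains, DataFrames

chn = Chains(randn(200, 2, 1), [:a, :b])   # 200 draws, 2 params, 1 chain
df  = DataFrame(chn)                       # Tables.jl does the work

size(df, 1)   # 200 rows, one per draw
```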

Shmuma commented 2 years ago

Thanks for explaining!

Yes, I'll try to implement fixed version of compare(:psis) and make a pull request.

> Also, on the other point I raised in my previous comment (item 4): what's your thinking about the sample(::ModeResult) function I noticed in your chapter 7 notebook, i.e. should sample() return a DataFrame or a Chain object (in Turing that would be an MCMCChains.Chain object, in StanQuap.jl sample(::QuapResult) would return a KeyedArray chain object)? Or do you think a DataFrame suffices? Of course, using the Tables API the chains are easy to convert to a DataFrame.

I'm not sure. Originally it was an ad-hoc solution to get samples from the result of optimize, so I made it return a DataFrame for simplicity. But to follow Turing's API, a Chain might be the better approach.

goedman commented 2 years ago

Great!

FYI: There is also an ongoing discussion here about cutting back the dependencies of ParetoSmooth.jl to reduce compile time. That has never been a goal of mine but just wanted to mention it in light of this discussion. At some point we might have to switch to a new package LeaveOneOut.jl or so.

In your sample() method, you do something like:

using DataFrames, Distributions, LinearAlgebra

function sample(qr::QuapResult, count::Int)::DataFrame
    names = qr.params        # coefnames(mode_result) in Turing
    means = values(qr.coef)  # coef(mode_result) in Turing
    sigmas = diag(qr.vcov)   # stderror(mode_result) in Turing

    # one column per parameter, drawn from its marginal normal
    DataFrame([
        name => rand(Normal(μ, σ), count)
            for (name, μ, σ) ∈ zip(names, means, sigmas)
    ])
end

In creating the QuapResult object I use:

    ...
    distr = if size(vcov, 1) == 1
        Normal(coefvalues[1], √vcov[1]) # Normal expects std
    else
        MvNormal(coefvalues, vcov) # MvNormal expects variance matrix
    end
    ...

I wonder if this should also happen in sample()? But maybe this is taken care of in StatsBase.stderror?
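To make the std-vs-variance convention concrete, a small self-contained check (toy numbers; only Distributions.jl and LinearAlgebra are assumed):

```julia
# Sketch: Normal takes a standard deviation, MvNormal takes a covariance
# matrix, so the univariate branch needs the √ while the multivariate
# branch does not.
using Distributions, LinearAlgebra

vcov = reshape([4.0], 1, 1)      # toy 1×1 covariance matrix

d1 = Normal(0.0, √vcov[1])       # Normal expects a standard deviation
d2 = MvNormal([0.0], vcov)       # MvNormal expects a covariance matrix

std(d1)      # 2.0
var(d2)[1]   # 4.0 — the same underlying variance
```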

Shmuma commented 2 years ago

Checked the latest version of the package and this function is now working. Thanks a lot!

Now I can finish the chapter 7 notebook (which took me 1.5 months to complete), wheee!

goedman commented 2 years ago

Great. Once you're done I'll make a pass over the Pluto notebooks in StatisticalRethinkingTuring.jl to ensure they are up to date with your scripts (and include chapter 7).