TuringLang / ParetoSmooth.jl

An implementation of PSIS algorithms in Julia.
http://turinglang.org/ParetoSmooth.jl/
MIT License
19 stars 12 forks source link

Lazy-compute-ESS #52

Closed ParadaCarleton closed 2 years ago

ParadaCarleton commented 2 years ago

@sethaxen This PR adds a keyword arg for ESS calculations. If set to false, psis replaces the ESS values with a singleton matrix containing one NaN. It should avoid duplicate ESS computations and it makes the code a bit shorter; do you think it's good?

codecov[bot] commented 2 years ago

Codecov Report

Merging #52 (58ef87f) into main (8650285) will decrease coverage by 0.27%. The diff coverage is 82.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
- Coverage   82.40%   82.13%   -0.28%     
==========================================
  Files          11       11              
  Lines         341      347       +6     
==========================================
+ Hits          281      285       +4     
- Misses         60       62       +2     
Impacted Files Coverage Δ
src/ModelComparison.jl 86.79% <ø> (ø)
src/TuringHelpers.jl 100.00% <ø> (ø)
src/ImportanceSampling.jl 77.88% <80.00%> (-0.47%) :arrow_down:
src/ESS.jl 80.00% <100.00%> (ø)
src/GPD.jl 96.00% <100.00%> (ø)
src/LeaveOneOut.jl 84.48% <100.00%> (-0.27%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8650285...58ef87f. Read the comment docs.

sethaxen commented 2 years ago

What I don't like about this approach is that it's still allocating a vector that the user doesn't necessarily want. I still prefer a lazy computation of the vector only if the user requests it, which can happen one of two ways:

  1. The approach in #51, using getproperty
  2. OR overload psis_ess(::Psis) and psis_sup_ess(::Psis) and export psis_ess and psis_sup_ess

I chose (1) earlier so that the same API is supported, but I slightly prefer something like (2).

ParadaCarleton commented 2 years ago

What I don't like about this approach is that it's still allocating a vector that the user doesn't necessarily want. I still prefer a lazy computation of the vector only if the user requests it, which can happen one of two ways:

  1. The approach in Avoid computing ess and sup_ess in psis #51, using getproperty
  2. OR overload psis_ess(::Psis) and psis_sup_ess(::Psis) and export psis_ess and psis_sup_ess

I chose (1) earlier so that the same API is supported, but I slightly prefer something like (2).

Do you think the extra allocation is a meaningful problem? When ess=false is passed the function allocates a singleton, so it takes ~no time or memory.

That being said, I just realized that memoizing psis_ess should accomplish the same goal, so I suppose I can do that instead if you want.

sethaxen commented 2 years ago

Do you think the extra allocation is a meaningful problem? When ess=false is passed the function allocates a singleton, so it takes ~no time or memory.

It's not about the extra allocation, it's about complexity of the code and interface. Your proposal introduces an additional keyword argument and allocates a vector that is essentially empty, and this seems unnecessary, when you can use either of the two options I suggested. The first would do neither of these things, while the second would export one additional function; in neither case is the vector allocated unless the user requests it.

That being said, I just realized that memoizing psis_ess should accomplish the same goal, so I suppose I can do that instead if you want.

There's no need to memoize. A user is unlikely to call psis_ess on the same Psis object twice, and this just adds code complexity and a new dependency.

What do you dislike about either of the options I suggested?

ParadaCarleton commented 2 years ago

Do you think the extra allocation is a meaningful problem? When ess=false is passed the function allocates a singleton, so it takes ~no time or memory.

It's not about the extra allocation, it's about complexity of the code and interface. Your proposal introduces an additional keyword argument and allocates a vector that is essentially empty, and this seems unnecessary, when you can use either of the two options I suggested. The first would do neither of these things, while the second would export one additional function; in neither case is the vector allocated unless the user requests it.

That being said, I just realized that memoizing psis_ess should accomplish the same goal, so I suppose I can do that instead if you want.

There's no need to memoize. A user is unlikely to call psis_ess on the same Psis object twice, and this just adds code complexity and a new dependency.

What do you dislike about either of the options I suggested?

I'd be very surprised if no users at all want to see the ESS twice, given the ESS displays whenever a Psis object is called in the REPL. I suppose I have three concerns:

  1. Wasted computation if the ESS is recalculated every time the user wants to take a look at the ESS values.
  2. Calculating the ESS lazily makes performance behavior very unintuitive. It's weird for running a function to be fast, but then have it be very slow to access a property of a struct, which is usually close to instant. It's especially weird for it to be very slow to read a display table in the REPL when computation is finished.
  3. I feel as though the implementation here is less complex rather than more, since the code is a bit shorter. An extra keyword argument adds almost no complexity relative to having to overload several methods.

If you have concerns about returning an empty array for psis_ess, we can make it so that trying to access these fields when they don't exist throws an error. I think 1 and 2 are obviously not a big deal for people who don't want to access the ESS, but most people should be reading those -- we want them to be checking diagnostics.