`evaluate!!` shenanigans

penelopeysm commented 5 days ago

`evaluate!!` as it stands

Our current implementation of evaluate!! is as follows.

https://github.com/TuringLang/DynamicPPL.jl/blob/ba490bf362653e1aaefe298364fe3379b60660d3/src/model.jl#L890-L935

Here's a little exercise for anyone reading this code. There are 4 arguments which are seemingly optional, rng, varinfo, sampler, and context, hence there are 2^4 = 16 possible invocations of evaluate!!.

Figure out: (1) which (if any) of them error, and (2) which of them wrap the context in a SamplingContext (line 907).

Answers:

r v s c
✓ ✓ ✓ ✓ Success - wrapped
✓ ✓ ✓ ✗ Success - wrapped
✓ ✓ ✗ ✓ Error
✓ ✓ ✗ ✗ Success - wrapped
✓ ✗ ✓ ✓ Success - wrapped
✓ ✗ ✓ ✗ Success - wrapped
✓ ✗ ✗ ✓ Success - wrapped
✓ ✗ ✗ ✗ Success - wrapped
✗ ✓ ✓ ✓ Success - wrapped
✗ ✓ ✓ ✗ Success - wrapped
✗ ✓ ✗ ✓ Success
✗ ✓ ✗ ✗ Success - wrapped
✗ ✗ ✓ ✓ Success - wrapped
✗ ✗ ✓ ✗ Success - wrapped
✗ ✗ ✗ ✓ Success
✗ ✗ ✗ ✗ Success - wrapped

If the table above seems to suggest that "most of the time it's wrapped", this is not really true in practice: the vast majority of calls to evaluate!! in the DynamicPPL repository are of the form evaluate!!(model, varinfo, context), which is one of the non-wrapped cases.

As you may guess, I'm not a fan of over-using multiple dispatch for this purpose!

It is very difficult to reason about (as the exercise above should have shown), it can easily lead to bugs (https://github.com/TuringLang/DynamicPPL.jl/pull/629) or method ambiguities (https://github.com/TuringLang/DynamicPPL.jl/pull/636), and the structure of the code does not clearly communicate what we're really trying to say (which is that 'the arguments are optional').

Keyword arguments

In many other programming languages, optional arguments are handled using keyword arguments that have default values. The benefits of this are:

only one method needs to be explicitly defined in the code, making it much easier to reason about;
the role of each argument is more visible at the point of call (compare f(1, 2) with f(; add=1, multiply=2));
the function signature makes it obvious to anybody reading the code that the arguments are optional;
the caller does not need to remember the order of specifying the arguments (which helps to address another easy source of bugs; compare f(2, 1) with f(; multiply=2, add=1)).

Unfortunately, I dabbled with turning evaluate!! into a function that used keyword arguments (see https://github.com/TuringLang/DynamicPPL.jl/blob/py/evaluate/src/model.jl#L904-L927), and this led to a noticeable performance hit:

Benchmarks

Setup: ```julia using Chairmarks using Distributions using DynamicPPL using Random @model f() = x ~ Normal() model = f() rng = Xoshiro(128) vi = VarInfo(model) spl = SampleFromPrior() ctx = DefaultContext() ``` master: ``` julia> @be DynamicPPL.evaluate!!(model) evals=1 Benchmark: 21487 samples with 1 evaluation min 791.000 ns (56 allocs: 3.719 KiB) median 917.000 ns (56 allocs: 3.719 KiB) mean 3.967 μs (56 allocs: 3.719 KiB, 0.01% gc time) max 52.700 ms (56 allocs: 3.719 KiB, 99.94% gc time) julia> @be DynamicPPL.evaluate!!(model, rng) evals=1 Benchmark: 54278 samples with 1 evaluation min 917.000 ns (57 allocs: 3.734 KiB) median 1.083 μs (57 allocs: 3.734 KiB) mean 1.183 μs (57 allocs: 3.734 KiB, <0.01% gc time) max 1.028 ms (57 allocs: 3.734 KiB, 97.67% gc time) julia> @be DynamicPPL.evaluate!!(model, vi, ctx) evals=1 Benchmark: 118298 samples with 1 evaluation min 208.000 ns (11 allocs: 480 bytes) median 292.000 ns (11 allocs: 480 bytes) mean 288.398 ns (11 allocs: 480 bytes) max 18.958 μs (11 allocs: 480 bytes) julia> @be DynamicPPL.evaluate!!(model, rng, vi, spl, ctx) evals=1 Benchmark: 115930 samples with 1 evaluation min 208.000 ns (11 allocs: 480 bytes) median 333.000 ns (11 allocs: 480 bytes) mean 318.844 ns (11 allocs: 480 bytes) max 20.917 μs (11 allocs: 480 bytes) ``` This PR: ``` julia> @be DynamicPPL.new_evaluate!!(model; wrap=true) evals=1 Benchmark: 21543 samples with 1 evaluation min 791.000 ns (56 allocs: 3.719 KiB) median 917.000 ns (56 allocs: 3.719 KiB) mean 3.888 μs (56 allocs: 3.719 KiB, 0.01% gc time) max 52.684 ms (56 allocs: 3.719 KiB, 99.94% gc time) julia> @be DynamicPPL.new_evaluate!!(model; rng=rng, wrap=true) evals=1 Benchmark: 60174 samples with 1 evaluation min 875.000 ns (59 allocs: 3.797 KiB) median 1.041 μs (59 allocs: 3.797 KiB) mean 1.269 μs (59 allocs: 3.797 KiB, 1.08% gc time) max 4.065 ms (59 allocs: 3.797 KiB, 64356.64% gc time) julia> @be DynamicPPL.new_evaluate!!(model; varinfo=vi, context=ctx) evals=1 Benchmark: 113010 samples with 1 evaluation min 250.000 ns (13 allocs: 672 bytes) median 333.000 ns (13 allocs: 672 bytes) mean 331.330 ns (13 allocs: 672 bytes, 8.32% gc time) max 16.792 μs (13 allocs: 672 bytes, 939777.60% gc time) julia> @be DynamicPPL.new_evaluate!!(model; rng=rng, varinfo=vi, sampler=spl, context=ctx, wrap=true) evals=1 Benchmark: 113324 samples with 1 evaluation min 334.000 ns (14 allocs: 720 bytes) median 458.000 ns (14 allocs: 720 bytes) mean 495.560 ns (14 allocs: 720 bytes, <0.01% gc time) max 3.929 ms (14 allocs: 720 bytes, 99.05% gc time) ```

I don't fully know the reasons for this, but possible explanations are overheads arising from the handling of keyword arguments (https://docs.julialang.org/en/v1/devdocs/functions/#Keyword-arguments) and also potential type instability, as kwcall generates a local variable that is a Union of all the keyword arguments. You can see the type instability e.g. with

g(x::Int; y::Int=2, z::Float64=3.0, w::String="Hello, world", t::AbstractArray{<:Number}=Float64[]) = x + y + z + length(w) + sum(t)
@code_warntype g(1; y=3, z=4.0)

Where to go from here?

Firstly, at the very least, I reckon the wrapping in SamplingContext should really be done at the call site, not inside evaluate!! itself. This shouldn't be too hard to fix, it's a matter of identifying invocations of evaluate!! according to the above table. This would be a nice quick win.

As for the method itself: one possibility is to make no changes. As someone developing and maintaining this codebase, dealing with dispatch rabbit holes like this one is a pretty frustrating time sink, so I'm really not keen to keep it this way. (Imagine how it's like for someone completely new, too. If we want to encourage open-source contributions, things like these could be quite discouraging.)

The other possibility that I can see is to just restrict the number of possible invocations. For example, I think keeping the three following methods would preserve almost all of the required behaviour. Some invocations that do not follow the patterns below will have to be modified, e.g. by specifying additional arguments which used to be defaults.

evaluate!!(model)
evaluate!!(model, varinfo, context)
evaluate!!(model, rng, varinfo, sampler, context)

But maybe there's some other pathway that I don't know of, which both preserves flexibility and performance while improving readability?

AbstractPPL and interfaces

evaluate!! is really defined in AbstractPPL, but AbstractPPL doesn't specify an actual interface for evaluate!! (e.g. a function signature).

Personally, I don't fully understand the purpose of an empty interface. It tells you that you can call some function evaluate!! on a model, but it doesn't tell you how to call it (i.e. what arguments to pass it). It's also impossible to test for correctness of the interface because there is no specification.

The result of having such an under-specified interface is that packages that depend on the interface are free to implement evaluate!! however they wish, and the interface becomes meaningless. This is partly proven by the fact that I was able to swap out evaluate!! for one with keyword arguments without causing any upstream breakage. The general problem of having such unenforcable interfaces is described in e.g. https://invenia.github.io/blog/2020/11/06/interfacetesting/.

This is probably a broader question but I think we need to evaluate (haha) why we are putting these zero-specification interfaces in AbstractPPL. If they're meant to have more specification, then they should be documented with signatures, and an interface testing function added. If not, then my current opinion is that it makes more sense to just keep the definition inside DynamicPPL.

Even if there's some other modelling package out there that uses AbstractPPL.evaluate!!, the fact is that they can implement it with whatever signature they wish, which means that there is nothing in common between our and their implementations of it (apart from the name, which as explained above carries no meaning in practice).

yebai commented 5 days ago

AbstractPPL was created relatively late by @phipsgabler and me as an attempt to standardise the interface for working with PPL models.

In reality, I didn't manage to direct enough team-wide attention towards AbstractPPL. As a result, DynamicPPL evolved on a per-demand basis without close sync with AbstractPPL. In retrospect, there are pros and cons to this approach towards DynamicPPL, but it is not intentional but mainly due to a lack of enough resources for a long time.

I appreciate that this answer doesn't provide a solution, but I want to explain the historical context of the above situation that @penelopeysm correctly identifies.

torfjelde commented 4 days ago

As @yebai answered the AbstractPPL.jl question, I'll just address potential simplifiications for evaluate!!.

In the past we always passed the rng and the sampler directly into the model (this was before contexts where introduced). Once contexts where introduced we wanted to have a clear separation between when we're sampling and when we're not. As a result, we put rng and sampler into the SamplingContext but we also wanted to maintain the current dispatches, hence why we have so many.

I'd be happy to remove the default arguments for evaluate!!. It's an internal method that we only really use when we have a varinfo and context anyways.

The only user-facing thing we have to worry about here is that we want stuff like

model(rng)

to work as intended, and so we'd need to put this "conversion" into the arguments going into the leaf-call of evaluate!! 😕 A bit uncertain what the best way of doing this would be though. For example, the suggestion of 3 dispatches you make won't cover this use-case (it would require one specifically for rng too), but maybe if we add that one, it'll be okay?

Overall I very much agree with you that this is all very messy 😕

yebai commented 4 days ago

I think we could also push __varInfo__ into __context__, so we have:

evaluate!!(__model__, __context__)

as an evaluation interface. I'm not sure how much work this might incur, but it seems sensible to me.

TuringLang / DynamicPPL.jl