Soss compatibility - Githubissues

phipsgabler commented 3 years ago

I'm opening this for discussion of what would need to be taken into account for reuse in both Turing and Soss.

For example, how does what I have called "subsumption" work in Soss -- the unification of indexed variables?

cscherrer commented 3 years ago

Thanks @phipsgabler , can you give more detail on what you mean by this, or maybe a simple example where it comes up?

phipsgabler commented 3 years ago

Say you have something like

x ~ Something()
for i in eachindex(x, y)
    y[i] ~ Normal(beta * x[i])
end

Then the current VarInfo contains a single entry for x, and a separate one for each y[i]. But when you do symbolic manipulation, you'd often instead deal with "sub-entries", like the individual x[i], or the "super-entry" y.

This is something I had to fight with all the time in my work on automatic derivation of Gibbs conditionals.

Is Soss just avoiding this because it is not necessary when you control flow replaced by combinators, and disallow mutation?

devmotion commented 3 years ago

Then the current VarInfo contains a single entry for x, and a separate one for each y[i].

That's not completely correct, it depends on if the variables are saved in a dictionary (always the case in the first run of a model) or in a named tuple (in subsequent runs, to improve type inference and stability). When the named tuple version is created (https://github.com/TuringLang/DynamicPPL.jl/blob/9d4280cf4b0a31e812c87acaad8a27d57b84db0e/src/varinfo.jl#L507-L559), then all y[i] are put together in one y variable.

The major annoyance is that currently everything is linearized, hence in addition VarInfo keeps track of indices of the variables and ranges of the vectors they correspond to.

phipsgabler commented 3 years ago

I was speaking sloppily, you're right of course. My primary issue is that AFAIK the indexing by VarName is not as flexible as it should be regarding "sub- and super-entries". Or at least the interface for that is not clean enough.

cscherrer commented 3 years ago

This hasn't come up in Soss, but a few related things have:

Transforming variables (say, for HMC) requires knowing all of the types. But Distributions aren't parameterized by these, so I have to fake it. Currently I run rand on everything once and use that to get the types, but it's very much a hack. This is one big reason I want type information in MeasureTheory.jl. I've also been considered using FunctionWrappers.jl so I can really have the types all the way through, but I need to look into the implications of that some more first.
I had been using SymPy's AST for symbolic manipulations (though I wrote the manipulations myself in Julia). SymPy had a lot of trouble with tensors, so this was kind of a mess.

I'm still not really clear on the difficulty, but I wouldn't think the combinators thing has much to do with it. It reminds me a little of problems with shapes I always hear the PyMC3 people talking about, could it be related to that?

Anyway, combinators aren't really a central part of Soss. It's more about the static modeling and code generation. For and iid were a good place to start because they're easy to reason about, so that lets me focus on other things for a while. I'd like to make the language more general at some point, though it will always be generative, so y is not an argument.

devmotion commented 3 years ago

Transforming variables (say, for HMC) requires knowing all of the types. But Distributions aren't parameterized by these, so I have to fake it. Currently I run rand on everything once and use that to get the types, but it's very much a hack. This is one big reason I want type information in MeasureTheory.jl. I've also been considered using FunctionWrappers.jl so I can really have the types all the way through, but I need to look into the implications of that some more first.

This leads a bit away from the initial question, but I just wanted to share my experiences here. In my opinion, often the best solution is to not compute output types at all - in many cases it requires heuristics, custom traits, and it will break eventually. Instead, the more reliable solution is to work with values and just compute stuff; the results can then be used to create caches or allocate containers dynamically.

For instance, we switched the main design of AbstractMCMC to immutable samplers with separate states (it is still possible to mutate samplers, which you had to do before, but it is not encouraged anymore) and this allowed us to remove all weird heuristics in Turing for allocating the internal HMC caches before sampling is started. Instead now we just compute everything that is needed in the first sampling step.

In my opinion, other approaches, such as heuristics, constraining types, or adding output types as type parameters, also impact the generality and dynamic nature of Julia unnecessarily. Additionally, it can be quite difficult to know the return types of functions (or samples) in advance for users, they might want to use custom number types for some parameters etc. but would like to not have to reason about the exact type of the samples when they are mainly interested in sampling. More concretely, instead of reasoning in advance about the output type of m + s * randn(rng, ::Type) for parameters m and s of custom types it is often sufficient to compute these samples.

cscherrer commented 3 years ago

Say you have a prior that includes mu ~ Normal(). You need to know what type of values to use for mu, which must depend on some function foo(::Normal). I'm not sure what "just compute stuff" could mean in this context, since the value used for mu has to come from somewhere.

foo = typeof ∘ rand is one possibility, but you really could use anything <: Real. Well, that's for Distributions, really Symbolics should work as well, another reason for MeasureTheory.jl :)

So I'd usually agree with you that it's good to avoid constraining output types in Julia (I think that's the point you're making here). But for many (most?) samplers, samples from the posterior are input types, not output. Or maybe I'm missing something?

And I'm all for immutability, but immutable arrays are tricky in Julia (large ones, anyway), and allocations kill performance. A typical way to solve this is for the programming model to give an immutable interface, and the compiler (or PPL, in our case) transforms to mutating code. That's the direction I was thinking of taking things with Soss. But maybe you've found a way to get good performance with immutable arrays?

devmotion commented 3 years ago

I'm not sure what "just compute stuff" could mean in this context, since the value used for mu has to come from somewhere.

It would be implied by the default mean and standard deviation parameters of Normal (probably just 0 and 1) and the default type of random numbers generated by randn (which is Float64). So probably the default would be Float64.

So I'd usually agree with you that it's good to avoid constraining output types in Julia (I think that's the point you're making here). But for many (most?) samplers, samples from the posterior are input types, not output. Or maybe I'm missing something?

I'd say, usually they are the output of some algorithm, i.e., some (possibly complex) function of the prior and the likelihood function (and some algorithm-specific hyperparameters).

Our discussion inspired me to the following improvement of EllipticalSliceSampling that illustrates what I had in mind: https://github.com/TuringLang/EllipticalSliceSampling.jl/pull/13/ The current implementation added some internal functions to figure out the type of the samples from the prior, that were defined for Distribution and could be extended by users. This was needed since some cache for sampling was constructed as part of a model struct. With the PR, however, this cache is constructed in the first sampling step after the first sample is obtained - thus users only have to define rand (and rand! for mutable samples) and everything will work automatically.

And I'm all for immutability, but immutable arrays are tricky in Julia (large ones, anyway), and allocations kill performance. A typical way to solve this is for the programming model to give an immutable interface, and the compiler (or PPL, in our case) transforms to mutating code. That's the direction I was thinking of taking things with Soss. But maybe you've found a way to get good performance with immutable arrays?

You don't need immutable arrays in this context. You just need the first element to be able to create out = Vector{typeof(firstelement)}(undef, n) without any type heuristics.

cscherrer commented 3 years ago

Thanks @devmotion , this all makes sense, and it very close to what I'm currently doing. But it does seem to overly constrain the types, right? Maybe a distribution defaults to Float64 but you want it to use Float32 or even Float16, say for GPU computation. Or for symbolic simplification of the logpdf, that sort of thing.

devmotion commented 3 years ago

Maybe a distribution defaults to Float64 but you want it to use Float32 or even Float16, say for GPU computation.

I thought about this, since the GPU question came up in DistributionsAD a while ago (and was addressed for multivariate normal distributions by basing the random number types on the parameters). However, I'm not really satisfied with this approach. I think it would be more natural to use a signature similar to the Base.rand such as rand(rng, ::Type{T}, distribution) where T does not constrain the output types but just the types of the random numbers that are sampled by rng. Such an approach would separate both aspects, of the distribution-dependent parameters and of the distribution-independent random numbers, clearly and allow to adjust one without considering the other or having to adjust the distribution parameters. It seems this would solve the issue you mention if the default distribution parameters are of non-invasive types such as Int.

cscherrer commented 3 years ago

Yep, that's my thought too. Now, we don't want to limit ourselves to just a few distributions. It should be very easy to mix and match them. When we call rand on one of these, we'll need to generate code containing other rand calls. And those calls will need to have a type associated with them, as you suggest. Where does this type come from?

If these "composite distributions" were hand-coded, we'd think this through each time. But we want them to be easily built from a collection of constructors (again, MeasureTheory.jl). The easiest way I can see is to include the output type in the type signature. Otherwise even having a sensible default can become very difficult.

devmotion commented 3 years ago

You would just use the same random number type (be it Float32, Float64, or some other more special type) in all these calls. Then e.g. sampling from a normal distribution d whose mean is a RV whose law is some other distribution d_mean could be done similar to:

function rand(rng::AbstractRNG, ::Type{T}, d::typeof(d)) where T
    m = rand(rng, T, d.d_mean)
    return m + d.s * randn(rng, T)
end

So all the raw random numbers would be generated with the same number type, but the type of the resulting sample would be implied by the parameters of d_mean and d without having to reason about it in advance. Or maybe you had something else in mind when talking about mixing and matching distributions?

TuringLang / AbstractPPL.jl

Soss compatibility #2