TuringLang / AbstractMCMC.jl

Abstract types and interfaces for Markov chain Monte Carlo methods
https://turinglang.org/AbstractMCMC.jl
MIT License
87 stars 18 forks source link

Provide standard API to get **one sample** in a "structured" way #65

Open theogf opened 3 years ago

theogf commented 3 years ago

Hey, This might be me wanting crazy things again but here is my proposal: Should we provide an abstract function (to overload), taking state, sample and model to return samples in a predefined way. Let me give an example. Given a Turing model

@model function mymodel(x, y)
    s ~ Gamma(1, 1)
    w ~ Normal(0, s)
    y ~ MvNormal(x .* w, 1)
end
x = rand(5)
y = rand(5)

there would be a function which given the current sample and model returns (s=0.354123, w=0.154658)

The motivation behind this is to make more "universal" callback functions (the callback mafia is here again). Right now defining a callback function needs to be tailored for each model. However having a to_named_tuple (which would be optional to implement), would allow to build many more generic functions built upon AbstractMCMC.

So I am pretty sure I am missing out why this is not making sense or that I am missing the bigger picture and I would be happy to hear why I am terribly wrong about this :)

PS: The NamedTuple is just a proposition, my proposal is more about having a convention about the output.

devmotion commented 3 years ago

IMO AbstractMCMC should not make assumptions on or enforce any specific structure of the outputs. One of the design principles is that the interface and the default implementations should work regardless if your algorithm works best with Arrays, NamedTuples, StructArrays, etc. So I think we want to avoid any restrictions or conventions on the type of the samples and states.

However, I think it would be useful to have (optional?) conventions on how the samples should behave. I am not sure though if a separate interface is needed or if developers could e.g. just use the well-established Tables.jl interface. IIRC some implementations in Turing and DynamicPPL assume that Base.keys and Base.values are implemented but of course the Tables.jl interface would be more advanced (and provides default implementations for converting to arrays of NamedTuples).

In general, if you implement a callback, it seems you could just enforce that the samples implement the Tables.jl interface and throw an error otherwise?

theogf commented 3 years ago

I am sorry I also meant that this would be optional, in the sense that you don't have to implement it but a whole new world comes to you if you do!

Using a convention would be great but having the guarantee that this convention is followed if a function exists is even better :)

I don't think Tables.jl is the right tool here cause I am really speaking about 1 sample here!

devmotion commented 3 years ago

One sample would just correspond to a single row of a table. Of course, one could just rely on keys and values for a single sample but AFAICT the Tables.jl interface provides even more structure (BTW in their documentation it is even stated that "This allows the custom row type to behave as close as possible to a builtin NamedTuple object." :wink:).

theogf commented 3 years ago

I hereby withdraw my objection!

cpfiffer commented 3 years ago

I'm 100% in favor, as long as the interface to do so is sufficiently general and unrestrictive.