femtomc / Jaynes.jl

E.T. Jaynes home phone.
https://femtomc.github.io/Jaynes.jl
Apache License 2.0

Scope setting - open directions #52

Open femtomc opened 4 years ago

femtomc commented 4 years ago

There are three large-scale directions which I want to commit to writing, all of which are relevant to trace-based PPLs in general, and not just the dynamo architecture which Jaynes uses to implement the GFI.

  1. Construct an advanced argdiffs system for dynamic tracing.

This requires a considerable amount of thought and work. The best place to begin would be to consider how one could replicate Gen's static DSL codegen as part of the dynamic language. From discussions with @phipsgabler and @georgematheos, one way to do this would be to create a dynamic computation graph which performs caching, represents calls, etc., at the IR level. The tracer can then rely on the computation graph when update operations don't modify structure - and when structure is modified, a combination of static and dynamic information can prevent a full fallback to pure dynamic tracing.

The real issue with this is that codegen likely carries a large amount of overhead if structure changes often. In the static DSL, you have a static guarantee that structure doesn't change - hence you can construct specialized choice map representations which are fixed no matter the dynamic path, because you know the set of addresses statically. In contrast, a dynamic computation graph is always an under-approximation of the exact call graph (or randomness flow graph) of the program. One thing which would be really interesting is to see whether you could combine a static IR representation with the dynamic computation graph, so that (when structure changes) you know exactly where it changes. When performing an update, with only the dynamic computation graph you can't tell where a change propagates to. So the system might consist of two parts: a robust dynamic computation graph constructor, and a way to combine information provided by a static SSA IR representation of a call with the dynamic computation graph.
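To make the dynamic half concrete, here is a minimal sketch (all names hypothetical - this is not Jaynes internals) of a graph which records address-level dependencies during tracing, and which an update can query to find everything downstream of a change:

```julia
struct DynamicGraph
    parents::Dict{Symbol, Set{Symbol}}   # addr => addresses it reads from
    children::Dict{Symbol, Set{Symbol}}  # addr => addresses that read it
end

DynamicGraph() = DynamicGraph(Dict{Symbol, Set{Symbol}}(), Dict{Symbol, Set{Symbol}}())

# Called by the tracer each time the choice at `addr` consumes the value at `dep`.
function record_dependency!(g::DynamicGraph, addr::Symbol, dep::Symbol)
    push!(get!(g.parents, addr, Set{Symbol}()), dep)
    push!(get!(g.children, dep, Set{Symbol}()), addr)
end

# Given the addresses touched by an update, compute everything downstream that
# must be re-visited. An address missing from the graph signals a structure
# change - the point where static information would have to take over.
function dirty_set(g::DynamicGraph, changed)
    dirty = Set{Symbol}()
    frontier = collect(changed)
    while !isempty(frontier)
        a = pop!(frontier)
        a in dirty && continue
        push!(dirty, a)
        append!(frontier, get(g.children, a, Set{Symbol}()))
    end
    dirty
end
```

E.g. after `record_dependency!(g, :x, :μ)`, calling `dirty_set(g, (:μ,))` returns `Set([:μ, :x])`, so an update to `:μ` knows it must re-visit `:x`.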

This would be a large-scale project for both Jaynes' current style of dynamo tracing and for Gen (and likely even useful to Turing!) - it makes sense to discuss a combined effort here. Would this be an extension of IRTracker? Would it be a general re-design, including the construction of a "probabilistic IR" as @phipsgabler has discussed?

  2. Integrate autodiff support with Keno's new AD

This should be easy when the time comes to do it. Currently I support forward-mode and reverse-mode AD (with reverse mode provided by Zygote). In the future, I will switch reverse mode to the new AD system.
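For reference, the current reverse-mode path looks like the following (a toy log-density standing in for a model's score function) - the idea is that swapping the backend shouldn't change this surface:

```julia
using Zygote  # current reverse-mode backend

# Toy log-density of Normal(μ, σ) at an observation x.
logdensity(μ, σ, x) = -0.5 * ((x - μ) / σ)^2 - log(σ) - 0.5 * log(2π)

# Reverse-mode gradients with respect to the parameters, the kind of
# computation the GFI's gradient interfaces need.
∇μ, ∇σ = Zygote.gradient((μ, σ) -> logdensity(μ, σ, 1.2), 0.0, 1.0)
```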

  3. Construct interop interfaces with Pyro.

This is something which should also be very easy, given the focus on the "call site abstraction" in Jaynes. Jaynes already provided easy interfaces to Gen and Soss in the past (currently broken). I can work through Pyrox.jl and represent Pyro call sites naturally.
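Schematically (the names here are illustrative, not the actual bridge), a foreign system only needs to expose sample-and-score behavior to participate as a call site:

```julia
# A hypothetical wrapper: any system that can sample choices and score them
# can be represented as a call site.
struct ForeignCallSite{S, A}
    simulate::S   # args -> trace/choices
    assess::A     # (choices, args) -> log probability
end

# e.g. wiring up a Gen generative function `my_gen_fn` (sketch only):
# gen_site = ForeignCallSite(
#     args -> Gen.simulate(my_gen_fn, args),
#     (choices, args) -> Gen.assess(my_gen_fn, args, choices),
# )
```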


One other direction is to explore what a version of the involution DSL looks like in the Jaynes architecture. Honestly, I don't really think this is necessary, given that it's very easy to support full interop with Gen (currently broken, unfortunately, after I changed a bunch of core data structure representations). Instead of committing to a large re-design of something which is already a nicely crafted piece of software, it makes more sense to ensure that everything is compatible with everything else - in an easy-to-use way.

phipsgabler commented 4 years ago

As to point 1: this sounds somewhat like what Venture does with those scaffolds, as far as I understand, with the difference that you single out the static graph as "base case" relative to which all others are analyzed -- is that somewhat right?

What I propose with the "probabilistic IR" kind of turns around the way these things are constructed right now in all the Julia approaches I've seen. Instead of starting from your sampling function/generative function/model, which is evaluated to get out graphs from it, you start from a representation of the model that already is "graphical", and derive evaluators from it. And if that representation looks like Julia IR, it doesn't matter whether the model is dynamic -- you always work on a fixed, full program.
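As a toy illustration of the shape of the idea (nothing like real Julia IR - just the model as data, with evaluators derived from it rather than from tracing an opaque function):

```julia
using Distributions

# One "sample statement" in the toy IR: an address, a distribution
# constructor, and the addresses it depends on.
struct SampleStmt
    addr::Symbol
    dist::Function            # values of parents -> Distribution
    parents::Vector{Symbol}
end

const ToyIR = Vector{SampleStmt}

# Derived evaluator #1: ancestral sampling over the fixed, full program.
function sample(ir::ToyIR)
    env = Dict{Symbol, Any}()
    for stmt in ir
        d = stmt.dist((env[p] for p in stmt.parents)...)
        env[stmt.addr] = rand(d)
    end
    env
end

# Derived evaluator #2: joint log density of a full assignment.
logjoint(ir::ToyIR, env) =
    sum(logpdf(stmt.dist((env[p] for p in stmt.parents)...), env[stmt.addr])
        for stmt in ir)

# μ ~ Normal(0, 1); x ~ Normal(μ, 0.5)
ir = [SampleStmt(:μ, () -> Normal(0, 1), Symbol[]),
      SampleStmt(:x, μ -> Normal(μ, 0.5), [:μ])]
```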

I think this is also feasible from the end-user perspective, because there is a difference between AD and PPL-DSLs: you can't expect every writer of a mathematical function to anticipate it being used in AD code by marking it as @differentiable or whatever. But you can expect that from the writer of a probabilistic model, because non-standard evaluation in some form is inherently part of the whole probabilistic programming approach.

My perspective is the following: I'd like to have the model given in some form that I can analyze statically, but in a complete way (the things I'm thinking about involve derivation of Gibbs conditionals, detecting conjugacies, etc.).
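As a standalone example of the kind of derivation I mean (not Turing or Jaynes code): if the representation exposes that μ ~ Normal(μ0, τ) with i.i.d. observations xᵢ ~ Normal(μ, σ), an analyzer can detect the Normal-Normal conjugacy and emit the exact Gibbs conditional:

```julia
using Distributions

# Closed-form Gibbs conditional p(μ | x) for the Normal-Normal model
# μ ~ Normal(μ0, τ), xᵢ ~ Normal(μ, σ).
function mu_conditional(x::Vector{Float64}; μ0 = 0.0, τ = 1.0, σ = 1.0)
    n = length(x)
    prec = 1 / τ^2 + n / σ^2                 # posterior precision
    m = (μ0 / τ^2 + sum(x) / σ^2) / prec     # posterior mean
    Normal(m, sqrt(1 / prec))
end
```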

This idea might be orthogonal to your re-evaluation/argdiff problems, or it might not -- I don't yet understand them well enough, as I have never done anything in that area.

femtomc commented 4 years ago

> What I propose with the "probabilistic IR" kind of turns around the way these things are constructed right now in all the Julia approaches I've seen. Instead of starting from your sampling function/generative function/model, which is evaluated to get out graphs from it, you start from a representation of the model that already is "graphical", and derive evaluators from it. And if that representation looks like Julia IR, it doesn't matter whether the model is dynamic -- you always work on a fixed, full program.

I think this closely matches what I had in mind - only that unrestricted codegen from the IR seems expensive? One relevant piece of information here is how this works in Gen’s static DSL. I would have to think more about this. I think this is basically what you are suggesting, right? You always work from the IR as the reference: if you were doing inference in the Gen style, you’d synthesize the GFI methods; if you were doing it in some other style, you’d synthesize other interfaces, etc., as the primary form of codegen.

In reference to the link I posted, I’m imagining that this “compilation by triple” is performed from the IR representation you’re describing. One thing which is already interesting is the possibility of constructing exact choice maps given the IR (with possibly heap-allocated structures for control flow which you can’t unroll). This would already be “more specialized” than the current representations.
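For instance (field names hypothetical), a statically known address set admits a fixed struct where the fully dynamic fallback needs a Dict:

```julia
# If the IR proves the address set is exactly {slope, intercept, noise},
# codegen can emit a concrete, fixed-layout choice map...
struct LinearModelChoices
    slope::Float64
    intercept::Float64
    noise::Float64
end

# ...whereas the fully dynamic fallback must stay address-agnostic.
dynamic_choices = Dict{Symbol, Float64}(:slope => 1.2, :intercept => 0.4, :noise => 0.1)
```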

On your second point, it makes a lot of sense to augment the IR representation so that the analysis that you are interested in is easy to achieve.

phipsgabler commented 4 years ago

Pretty much, yes. As far as I can tell, that sums it up in Gen terminology.

The "unknown choice map" issue is also one of the main points I have been thinking about. Interpreting the sampling statements as assignments to a dictionary, as it is done now in both Turing and Gen in some way, is always a trivial fallback possiblity, but there must exist something better. Cf. also my question here.