femtomc opened this issue 4 years ago
Core of the implementation:
This should provide the most benefit to dynamic programs - although it misses out on some of Gen's static trace specialization.
This ticket is evolving - it now seems the largest performance gains will come from implementing incremental re-computation for the dynamic language.
The above recommendations may not be ideal (especially the caching, which is proving tough to control).
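For concreteness, here's a minimal sketch (not the actual implementation - the model and arguments are made up for illustration) of the incremental re-computation hook Gen's GFI already exposes via `update` and argdiffs. The dynamic DSL currently re-runs the whole body on `update` regardless of the argdiffs, which is exactly the cost we'd like to avoid:

```julia
using Gen

# Illustrative model, made up for this sketch.
@gen function model(mu::Float64, n::Int)
    xs = Float64[]
    for i in 1:n
        push!(xs, @trace(normal(mu, 1.0), (:x, i)))
    end
    return xs
end

trace = simulate(model, (0.0, 10))

# Argdiffs declare which arguments changed. The dynamic DSL ignores them and
# re-visits every random choice; the static DSL uses them to skip choices
# whose inputs are unchanged - that is the specialization being chased here.
(new_trace, weight, retdiff, discard) =
    update(trace, (0.5, 10), (UnknownChange(), NoChange()), choicemap())
```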
Investigating the use of IRTracker.jl here (cc @phipsgabler).
This is also being tracked by the TuringLang team for performance: https://github.com/TuringLang/IRTracker.jl/issues/28.
Thinking about it, the "partial evaluation"/Futamura-projection approach could solve a couple of issues - like the one for tape compilation, too.
@phipsgabler On the tape compilation point, I was going to investigate how ReverseDiff compiles its tapes first, so I can get a feel for how it works in a setting I sort of understand :P
I didn't totally understand that issue - is the idea that the tape is flattened and stripped of control flow just because a particular branch has already been followed? So the branch outcome is just extra information the tape gets to exploit.
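For reference, this is roughly what the ReverseDiff tape-compilation workflow looks like (a hedged sketch; `f` and the inputs are made up). Note that the tape only records the branch actually taken on the example input, which I think is the flattening being described:

```julia
using ReverseDiff

# f contains a branch; the recorded tape keeps only the path taken on the
# example input - the control flow is "stripped off" by construction.
f(x) = sum(x) > 0 ? sum(abs2, x) : sum(x)

x0 = rand(3)                                  # example input (takes the sum(x) > 0 path)
tape = ReverseDiff.GradientTape(f, x0)        # record the operations once
ctape = ReverseDiff.compile(tape)             # compile the recorded tape

grad = similar(x0)
ReverseDiff.gradient!(grad, ctape, rand(3))   # reuse the compiled tape on new inputs
```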
One nice thing about the partial evaluation approach is that you can run it before you run the dynamic tape - so a lot of the optimizations you might want from tape compilation can happen before the tape is even constructed. A toy sketch of the idea is below.
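To make that concrete, here's a toy first-Futamura-projection sketch (nothing Gen- or IRTracker-specific; the mini expression language and all names are made up): specializing the interpreter to a fixed program resolves the tree-walking and control flow up front, before anything like a tape exists:

```julia
# Tiny expression language: literals, a single input variable, binary ops.
abstract type Expr_ end
struct Lit <: Expr_; v::Float64; end
struct Var <: Expr_; end
struct Bin <: Expr_; op::Symbol; a::Expr_; b::Expr_; end

# Plain interpreter: walks the tree on every call.
interpret(e::Lit, x) = e.v
interpret(e::Var, x) = x
function interpret(e::Bin, x)
    a, b = interpret(e.a, x), interpret(e.b, x)
    return e.op === :+ ? a + b : a * b
end

# Partial evaluator / specializer: does the tree walk once, up front, and
# returns a closure with the control flow already resolved (first Futamura
# projection in spirit: interpreter + fixed program = compiled program).
specialize(e::Lit) = x -> e.v
specialize(e::Var) = x -> x
function specialize(e::Bin)
    fa, fb = specialize(e.a), specialize(e.b)
    return e.op === :+ ? (x -> fa(x) + fb(x)) : (x -> fa(x) * fb(x))
end

prog = Bin(:+, Bin(:*, Lit(2.0), Var()), Lit(1.0))   # 2x + 1
compiled = specialize(prog)
@assert interpret(prog, 3.0) == compiled(3.0) == 7.0
```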
Ongoing conversations: https://github.com/probcomp/Gen.jl/issues/275
It would be nice to sketch out how to do this, and then catch the specialization at `overdub` time. This would allow the system to gradually optimize itself during inference.
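A rough sketch of what "catching it at `overdub` time" could look like with Cassette (purely illustrative - `RecordCtx` and `model` are made up): record which functions and argument types the model actually hits while running, and hand that to a later specialization pass:

```julia
using Cassette

Cassette.@context RecordCtx

# Prehook fires before every overdubbed call; here we just log the call site.
Cassette.prehook(ctx::RecordCtx, f, args...) =
    push!(ctx.metadata, (f, typeof.(args)))

model(x) = x > 0 ? sin(x) + 1.0 : cos(x)      # stand-in for a model body

calls = Any[]
Cassette.overdub(RecordCtx(metadata = calls), model, 0.3)
# `calls` now holds the observed (function, argument types) pairs - the kind
# of runtime information a gradual self-optimization pass could key on.
```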