TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

New Interface Design #55

Closed yebai closed 7 years ago

yebai commented 8 years ago

Several interface questions have remained undecided since the introduction of the Hamiltonian Monte Carlo (HMC) sampler. More specifically, we need an interface for telling the HMC sampler which variables it should handle. In addition, we also need a way to compose samplers (e.g. Gibbs sampling over subsets of variables with different samplers). To address these issues together, I propose the following new interface.

# Model definition
gaussdemo() = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  for i=1:length(x)
    x[i] ~ Normal(m, sqrt(s))
  end
  return(s, m, x)
end

# Standard sampler interface
res = sample(gaussdemo, data = Dict(:x=>[1 2]), HMC(..., :s, :m))

# Compositional inference interface
res = sample(gaussdemo, data = Dict(:x=>[1 2]), HMC(..., :m), PG(..., :s))

# Sampler's default behavior is to sample all random variables
res = sample(gaussdemo, data = Dict(:x=>[1 2]), HMC(...))

# Tagging parameters for samplers like PMC, SMC2
res = sample(gaussdemo, data = Dict(:x=>[1 2]), PMC(..., params=[:m]))

# Running program without a sampler produces a draw from the prior
s, m = sample(gaussdemo)
s, m = gaussdemo() 

In summary, this design first removes the macros @assume, @observe and @predict; instead, it introduces additional parameters on each sampler to distinguish the different types of variables. Second, it introduces an interface for composing samplers (e.g. HMC and PG).

yebai commented 7 years ago

Here is a summary of the naming scheme we've discussed. One simple solution is to use the following:

varname_idx1_idx2

Here varname is the name of a variable, e.g. x, m, or s; idx1, idx2, etc. are the indices for multidimensional variables. For scalar variables, the name simplifies to just varname, as the sketch below illustrates.
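
A minimal sketch of how such names could be constructed (flat_name is a hypothetical helper, not part of Turing):

flat_name(var::Symbol, idxs::Int...) =
    isempty(idxs) ? string(var) : string(var, "_", join(idxs, "_"))

flat_name(:s)        # "s"      (scalar: just the variable name)
flat_name(:x, 1)     # "x_1"    (vector element)
flat_name(:x, 1, 2)  # "x_1_2"  (matrix element)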

There is one potential drawback with this naming scheme. To illustrate it, let's start with an (arbitrary) example:

gaussdemo() = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  for i=1:length(x)
    x[i] ~ Normal(m, sqrt(s))
    w    ~ Normal(m, sqrt(s))  # definition 1 of var w
  end
  w ~ InverseGamma(2, 3)       # definition 2 of var w
  return(s, m, x)
end

In the above example, the variable w is declared twice, in different places. Because Julia is a dynamic language, this is fine. However, it makes it difficult for us to distinguish the two w's. In order to implement the compositional sampler interface and report results to the user, we need a consistent naming scheme. There are two possible solutions:

  1. Detect such variable redefinitions and report an error. This explicitly requires the user to avoid using the same name for multiple different random variables. However, using arrays is still fine.

  2. Add a special notation (e.g. through calling gensym) to the naming scheme. More specifically, a variable name would then look like varname_uniqueid_idx1_idx2 (see the gensym sketch after this list). The difficulty, however, is making the notation meaningful enough that the user can tell which variable is which by looking at the names.
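
For reference, Julia's built-in gensym already produces such unique per-site symbols. A tiny illustration (the generated suffixes shown are only examples; the actual numbers vary per session):

name1 = gensym(:w)   # e.g. Symbol("##w#361"), unique id for the first `~` site
name2 = gensym(:w)   # e.g. Symbol("##w#362"), a distinct id for the second site
name1 == name2       # false, even though both originate from the name `w`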

EDIT (24 Jan 2017)

After some discussion, we've decided to pick option 2. The solution to clashed variable names is:

1) Rename variables internally (e.g. storing w under distinct names such as w1 and w2 in VarInfo), following Wingate (2011).

2) When assigning variables to different samplers, the symbol w selects all variables sharing the base name w (i.e. w1 and w2 are both sampled by the same sampler); see the sketch below.
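
A minimal sketch of point 2, assuming internal names of the form varname_uniqueid (the helpers base and select below are hypothetical, not Turing functions):

internal_names = [:w_1, :w_2, :m_1]   # hypothetical VarInfo contents

base(sym::Symbol) = Symbol(first(split(String(sym), '_')))
select(names, s::Symbol) = filter(n -> base(n) == s, names)

select(internal_names, :w)   # [:w_1, :w_2]: both w's go to the same sampler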

EDIT (14 Mar 2017)

It's probably better to use the following updated internal variable naming scheme (extracted from issue #96):

xukai92 commented 7 years ago

Meeting summary:

  1. Agreed on option 1 for handling duplicated names:

Detect such variable redefinitions and report an error. This explicitly requires the user to avoid using the same name for multiple different random variables. However, using arrays is still fine.

Plan for next step:

translate a call with multiple samplers passed

varInfo = sample(gaussdemo, data = Dict(:x=>[1 2]), varInfo, HMC(..., :m), PG(..., :s))

to

varInfo = sample(gaussdemo, data = Dict(:x=>[1 2]), varInfo, HMC(..., :m))
varInfo = sample(gaussdemo, data = Dict(:x=>[1 2]), varInfo, PG(..., :s))

xukai92 commented 7 years ago

Regarding the new naming scheme, would it be good if we just named variables as below?

:(x[1])
:(x[1,1])
:((x[1])[1])

It looks more straightforward and easily distinguishes an array of arrays from a matrix.
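
Since these names are ordinary quoted Julia expressions, they can be built and compared programmatically; a small sketch:

name_mat = :(x[1, 1])   # entry of a matrix
name_aoa = :(x[1][1])   # entry of an array of arrays
name_mat == name_aoa    # false: the two container shapes stay distinct

Expr(:ref, :x, 1, 2)    # builds :(x[1, 2]) from a symbol and indices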

Update

This is done in https://github.com/yebai/Turing.jl/pull/70.

xukai92 commented 7 years ago

Regarding the new interface, I feel we still need the @model macro, because in the case below

gaussdemo(data=Dict(:x=>Float64[1 2])) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  for i=1:length(x)
    x[i] ~ Normal(m, sqrt(s))
  end
  return(s, m, x)
end

we will need to add a statement at the beginning of the function saying

x = data[:x]

Otherwise there will be no local x to use.

Alternatively, we could ask the user to define data globally, but I think that's not what we want. A sketch of the macro idea follows.
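
A minimal sketch of what such a macro could do, assuming for simplicity that the only data key is :x (@with_data is a hypothetical stand-in, not Turing's actual @model):

macro with_data(fdef)
    sig, body = fdef.args[1], fdef.args[2]
    new_body = quote
        x = data[:x]    # injected binding; a real macro would loop over all keys
        $body
    end
    esc(Expr(:(=), sig, new_body))
end

@with_data gaussdemo(data) = begin
    sum(x)              # `x` now refers to data[:x]
end

gaussdemo(Dict(:x => [1.0, 2.0]))   # returns 3.0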

yebai commented 7 years ago

Regarding the new interface, I feel we still need the @model macro because for the case below

I agree. It makes sense to keep it.

Regarding the new naming scheme, would it be good if we just named variables as below? :(x[1]) :(x[1,1]) :((x[1])[1])

This makes sense. One minor question: can we use :(x[1][1]) instead of :((x[1])[1])?

xukai92 commented 7 years ago

Yes, sure, we could use :(x[1][1])!

xukai92 commented 7 years ago

When we compose two samplers, e.g. HMC(...) and PG(...), is it true that we do Gibbs sampling by 1 HMC step, 1 PG step, iteratively?

yebai commented 7 years ago

When we compose two samplers, e.g. HMC(...) and PG(...), is it true that we do Gibbs sampling by 1 HMC step, 1 PG step, iteratively?

We can do that by default. It is also possible to add an optional parameter iter to allow the user to indicate the number of iterations inside a Gibbs scan:

res = sample(gaussdemo, data = Dict(:x=>[1 2]), HMC(..., iter = 1, :m), PG(..., iter = 1, :s))
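
A minimal sketch of what such a Gibbs scan could look like internally (the step! functions are hypothetical; Turing's actual samplers carry more state):

function gibbs_scan!(state, samplers, n_scans)
    for _ in 1:n_scans
        for (step!, iter) in samplers   # e.g. [(hmc_step!, 1), (pg_step!, 1)]
            for _ in 1:iter
                step!(state)            # update only this sampler's variables
            end
        end
    end
    return state
end
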
xukai92 commented 7 years ago

I've nearly finished the Gibbs sampling feature, with a few things unsure. One of them is about using HMC and PG together: when an HMC step is rejected, should I reject the whole Gibbs step, or just the HMC part?

I guess it will be better for me to make a PR first so that we can deal with existing issues together. I will make a PR this afternoon.

yebai commented 7 years ago

Some updates since PR #73. The new bits are:

@model gdemo(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  for i=1:length(x)
    x[i] ~ Normal(m, sqrt(s)) # Note: we need to fetch x from gdemo(; data = Data(...)).
  end
  return(s, m, x)
end

x = [2.0, 3.0]
sampler = Gibbs(HMC(..., :m), PG(..., :s))
chn     = @sample(gdemo(x), sampler)

NOTE: updated sampling interface using a macro (23 Feb 2017), see #82.

xukai92 commented 7 years ago

Regarding the introduction of the new Data type, I'm thinking of doing this job together with unifying the dictionary (used by @predict) and VarInfo, because what these three data structures basically do is relate a symbol (a variable) to a value (a scalar, vector, matrix, or array). Does that make sense? A sketch of the idea is below.
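
A minimal sketch of such a unified structure (VarMap is a hypothetical name, not Turing's actual VarInfo):

struct VarMap
    vals::Dict{Symbol,Any}   # scalar, vector, matrix, or array values
end
VarMap() = VarMap(Dict{Symbol,Any}())

Base.getindex(v::VarMap, s::Symbol) = v.vals[s]
Base.setindex!(v::VarMap, x, s::Symbol) = (v.vals[s] = x)

vm = VarMap()
vm[:m] = 0.5   # one interface for data, predictions, and sampler state
vm[:m]         # 0.5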

yebai commented 7 years ago

Close in favour of #104