blei-lab / edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
http://edwardlib.org

Time-series models in edward #211

Open fmpr opened 8 years ago

fmpr commented 8 years ago

Hi,

I'm interested in doing some time-series state-space modelling in edward, so I started implementing a simple autoregressive AR(k) model on a toy dataset. I started out with fixed values for the observation noise and for the variance of the latent process, and everything seemed to work great and fast (code for this example is available here: ARk_fixed_noises.py).
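For readers following along, here is a minimal sketch of the kind of AR(k) model described above. It is an illustration, not the contents of the linked ARk_fixed_noises.py; the order K, length T, and noise scales are made-up values.

from edward.models import Normal

K = 2             # AR order (assumed for illustration)
T = 100           # number of time steps (assumed)
sigma_proc = 0.1  # fixed process-noise scale
sigma_obs = 0.1   # fixed observation-noise scale

# Priors on the AR coefficients.
beta = [Normal(mu=0.0, sigma=1.0) for _ in range(K)]

# Latent AR(k) process with fixed noise scales.
x = [0] * T
for t in range(T):
    if t < K:
        x[t] = Normal(mu=0.0, sigma=10.0)  # diffuse prior on initial states
    else:
        mean = sum(beta[j] * x[t-1-j] for j in range(K))
        x[t] = Normal(mu=mean, sigma=sigma_proc)

# Noisy observations of each latent state.
y = [Normal(mu=x[t], sigma=sigma_obs) for t in range(T)]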

However, if I place Inverse-Gamma priors on these two variances and estimate their distributions, then I need to increase the number of samples significantly to obtain any decent results for the inferred states, and MFVI (naturally) becomes quite slow even for this very small toy example (code with InvGamma priors: ARk_estimate_obs_noise.py and ARk_estimate_both.py).

So my questions are:

akucukelbir commented 8 years ago

hi @fmpr,

this is great! we've been meaning to add some time-series models to our examples and website. we would love to work with you on these.

a quick question before we begin to dig into your examples: does ADVI in Stan work well for the Inverse Gamma prior case? (my intuition says that it shouldn't work very well in that case.)

fmpr commented 8 years ago

Hi @akucukelbir,

Actually, in my Stan code I had a half-Cauchy prior on the variances, but I just changed that to an Inverse-Gamma and it still works great. Once the Stan model is compiled, it converges in just a few seconds (for a dataset that is pretty much the same as the one I'm trying in edward), and it gives good results.

Here are the Stan model and the R script that uses it, in case you want to have a look: kf_simple_invgamma.stan and kf_simple_invgamma.R. I was pretty much trying to get something similar working in edward as a starting point.

maedoc commented 7 years ago

@fmpr thanks for posting this code, exactly what I was looking for. What's your end goal? Just AR? I recently added an Euler-Maruyama scheme to PyMC3 and will try to do something similar here, if possible. Would there be interest in contributing such support for SDEs in Edward?
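For reference, an Euler-Maruyama discretization maps naturally onto the same chain-of-Normals construction used elsewhere in this thread. A minimal sketch, using an Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dW as a stand-in; every value here is illustrative and none of it comes from PyMC3 or existing Edward code.

from edward.models import Normal

theta, sigma_sde = 1.0, 0.5  # illustrative SDE parameters
dt, T = 0.01, 100            # step size and number of steps (assumed)

x = [0] * T
x[0] = Normal(mu=0.0, sigma=1.0)
for t in range(1, T):
    # Euler-Maruyama step: x_t ~ N(x_{t-1} - theta*x_{t-1}*dt, sigma_sde^2*dt)
    x[t] = Normal(mu=x[t-1] - theta * x[t-1] * dt,
                  sigma=sigma_sde * dt ** 0.5)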

dustinvtran commented 7 years ago

@fmpr: i think this works, no?

import tensorflow as tf
from edward.models import InverseGamma, Normal

N = 100  # number of time steps (assumed; not specified in the original snippet)

# Priors on the AR(1) intercept and coefficient.
mu = Normal(mu=1.0, sigma=10.0)
beta = Normal(mu=1.0, sigma=10.0)

# Inverse-Gamma priors on the process and observation noises.
noise_proc = InverseGamma(alpha=1.0, beta=1.0)
noise_obs = InverseGamma(alpha=1.0, beta=1.0)

# Latent AR(1) process.
x = [0] * N
x[0] = Normal(mu=mu, sigma=10.0)  # fat prior on x
for n in range(1, N):
  x[n] = Normal(mu=mu + beta * x[n-1], sigma=noise_proc)

# Observations; stack the list of scalar states into one tensor.
y = Normal(mu=tf.stack(x), sigma=noise_obs)

@maedoc: yes, SDEs would be awesome. would be very interesting to explore it especially in the realm of ABC, where the likelihood is intractable (due to a complex high-dimensional SDE).

rw commented 7 years ago

@dustinvtran, are there any updates on the native modeling language that can provide what ADVI needs?

dustinvtran commented 7 years ago

essentially we need an attribute on all random variables, with a default constrained-to-unconstrained transformation. we can then do ADVI by leveraging this attribute to perform automated transformations.
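as a minimal sketch of what such a transformation does for a positivity-constrained scale (hand-rolled for illustration; not an existing edward or tf.contrib.distributions API):

import tensorflow as tf

# unconstrained variational parameter; map it to the constrained
# (positive) space with exp, and account for the change of variables
# by adding the log-Jacobian to the variational objective.
z = tf.Variable(0.0)   # lives on the whole real line
sigma = tf.exp(z)      # constrained value, always positive
log_jacobian = z       # log |d(exp z)/dz| = z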

i thought tf.contrib.distributions implemented this, although now that i look i no longer see it(?).

following this comment by @jvdillon (https://github.com/tensorflow/tensorflow/issues/4965#issuecomment-266596499), it looks like it's close but not quite available yet.

nfoti commented 7 years ago

I would like to model time series in edward so I tried out a modified version of the code provided by @dustinvtran above. Specifically,

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal, PointMass

# Simulate a zero-mean AR(1) series.
mu_true = 0.
beta_true = 0.9
noise_obs = 0.1
T = 64

x_true = np.random.randn(T) * noise_obs
for t in range(1, T):
    x_true[t] += beta_true * x_true[t-1]

# Priors on the mean and AR coefficient.
mu = Normal(mu=0., sigma=10.0)
beta = Normal(mu=0., sigma=2.0)

noise_proc = tf.constant(0.1)  # InverseGamma(alpha=1.0, beta=1.0)

x = [0] * T
x[0] = Normal(mu=mu, sigma=10.0)  # fat prior on x
for n in range(1, T):
    x[n] = Normal(mu=mu + beta * x[n-1], sigma=noise_proc)

qmu = PointMass(params=tf.Variable(0.))
qbeta = PointMass(params=tf.Variable(0.))
# The next line raises the TypeError discussed below: x is a Python list.
inference = ed.MAP({beta: qbeta, mu: qmu}, {x: x_true})

however, I get the error TypeError: unhashable type: 'list', which I assume comes from x being a list of Normal variables.

Is there something obvious I'm just missing? This should just be a large hierarchical model; do I have to feed each x[i] separately?

dustinvtran commented 7 years ago

Since x is a list of random variables, I think you'd have to, e.g., data={xt: xt_true for xt, xt_true in zip(x, x_true)}.
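For concreteness, that dictionary would plug into the MAP call from the snippet above like so (illustrative; as the follow-up comments note, plain float values tripped a type check in the Inference constructor at the time):

data = {xt: xt_true for xt, xt_true in zip(x, x_true)}
inference = ed.MAP({beta: qbeta, mu: qmu}, data)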

We could think of ways to vectorize this. It's easy to vectorize across data points: each x[t] would hold N data points, so the list x would represent N series of T steps each. The dependence of x[t] on x[t-1] makes it harder to vectorize across time steps.
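As a sketch of the easy direction, reusing the names from the snippet above: each x[t] becomes a batch of N replicate series while the time loop stays sequential (N is an illustrative value).

N = 10  # number of replicate series (illustrative)
x = [0] * T
x[0] = Normal(mu=tf.ones(N) * mu, sigma=10.0)  # batch of N initial states
for t in range(1, T):
    # each x[t] is a batch of N scalars; time still unrolls in Python
    x[t] = Normal(mu=mu + beta * x[t-1], sigma=noise_proc)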

nfoti commented 7 years ago

Thanks @dustinvtran. I'll try feeding in each time step separately for now. This will be fine for univariate series; I'm wondering whether it'll scale to the series I'm interested in analyzing.

I agree that vectorizing over replicates of a series is easy but the temporal dependency makes vectorizing over time difficult. It may be that for higher-dimensional series it'd be better to just make an explicit Distribution that handles the joint distribution of all the x[t]. I'll let you know how things work out with this solution in the meantime.
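For the record, a hypothetical sketch of that idea for a stationary AR(1), whose implied joint is multivariate normal with cov(x_s, x_t) = sigma^2 * beta^|s-t| / (1 - beta^2); the multivariate normal class name varies across Edward/TF versions, so treat this purely as an illustration.

import numpy as np
from edward.models import MultivariateNormalFull  # class name is version-dependent

T = 64
beta_val, sigma_val = 0.9, 0.1  # fixed values purely for illustration
lags = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :])
cov = (sigma_val ** 2) * (beta_val ** lags) / (1 - beta_val ** 2)
x = MultivariateNormalFull(mu=np.zeros(T, dtype=np.float32),
                           sigma=cov.astype(np.float32))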

dustinvtran commented 7 years ago

Do let me know if that works! If building up a new random variable turns out to be necessary for very long time series, then we should definitely make note of it.

nfoti commented 7 years ago

The code you suggested generates the following error: TypeError: Data value has an invalid type.

It looks like the Inference constructor doesn't handle the case where the value being fed into a node is of type numpy.float (or even just float). I tried making each x_true[t] a 1-d numpy array, but that generated a different error saying the sizes were incompatible; I'm assuming this is because each x[t] is a Normal, which is probably a scalar.
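For illustration, the missing handling might amount to a coercion along these lines (a hypothetical sketch, not Edward's actual code; the function name is made up):

import numpy as np

def coerce_data_value(value):
    # Promote bare Python/NumPy scalars to 0-d float32 arrays so they can
    # be checked against a scalar random variable's shape and dtype.
    if isinstance(value, (float, int, np.floating, np.integer)):
        return np.asarray(value, dtype=np.float32)
    return value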

I'm happy to try to add handling for the float case; I just wanted to check that I wasn't duplicating effort and that this isn't part of a larger issue.

Thanks.

dustinvtran commented 7 years ago

nice catch—that would be a fine contribution. it is not duplicating effort anywhere.

nfoti commented 7 years ago

cool. I think I have it working. Will submit a PR shortly.