charles-river-analytics / figaro

Figaro Programming Language and Core Libraries

Sampling for factored algorithms #733

Open bruttenberg opened 6 years ago

bruttenberg commented 6 years ago

Working with Alex Ihler and Rina Dechter, we've come up with a new way to do sampling of continuous elements for factored algorithms. Previously we would take N samples from the prior of each continuous element, then propagate those samples through the model. Since there was no consistency of samples between elements, this would often result in factor explosion as sampled elements were combined together (e.g., a mixture of Gaussians would have support k*N, where k is the number of Gaussians).

We changed this procedure to always enforce that no more than N samples are propagated through the model. This means that, in the mixture case, out of k*N samples, we squeeze it back down to N samples. We now do this by uniformly sampling from the support of the variable that needs paring down. This is equivalent to particle BP under a uniform proposal.
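Schematically, the squeeze step is just a uniform subsample of the current support (a standalone Scala sketch of the idea, not the actual Figaro code; all names here are illustrative):

```scala
import scala.util.Random

object SupportCap {
  // Uniformly subsample a support down to at most maxSize values.
  // Out of the k*N values produced by combining sampled elements,
  // we keep only N, which is particle BP under a uniform proposal.
  def cap[T](support: List[T], maxSize: Int, rng: Random = new Random): List[T] =
    if (support.size <= maxSize) support
    else rng.shuffle(support).take(maxSize)
}
```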

We're only doing this if a continuous element is detected along the dependency path of an element. When there is a mix of continuous and discrete, we still do this, and it's OK because we still retain the density factor for the discrete values (i.e., if the discrete values are subsampled, they will be renormalized according to the selected samples).
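The detection itself can be thought of as a walk up the dependency path (a hypothetical sketch over a minimal element graph; Figaro's real Element classes look different):

```scala
object PathCheck {
  // Hypothetical minimal element graph, for illustration only.
  trait Elem {
    def parents: List[Elem]
    def isContinuous: Boolean
  }

  // True if e itself, or any element on its dependency path, is continuous.
  // Element graphs are acyclic, so the recursion terminates.
  def continuousOnPath(e: Elem): Boolean =
    e.isContinuous || e.parents.exists(continuousOnPath)
}
```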

So, do we want to push this change back into the main branch? It seems like a much better way to do sampling than before.

@apfeffer

gtakata commented 6 years ago

I think the size grows as N^k, which is faster than N*k. More importantly, what do you mean by "the support of the variable that needs paring down", and how do you "squeeze it back down"?

Does the procedure only work on Gaussians or any mixture of continuous variables?

bruttenberg commented 6 years ago

The size of the factor grows exponentially in k, yes; I was just saying the support of the variable grows linearly in k. This (in theory) works on any mixture of continuous and discrete variables. I have not fully tested it, though.

To your question about squeezing, say you have this code:

If(Flip(0.1), Normal(0, 1), Normal(10, 1))

The way we do it now, we'd take, say, 10 samples from each Normal. The support of the If statement would then be 10*2 = 20, since the probability that any two samples from the two Normals coincide is ~0. Now, we fix the support of the If statement to 10 samples. Since there are 20 possible values from the two Normals, we subsample those 20 values down to 10. This process repeats itself any time the number of samples grows beyond 10 (in this example).
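Concretely, for the example above (a self-contained sketch of the numbers involved, not Figaro internals):

```scala
import scala.util.Random

object SqueezeExample {
  val rng = new Random(42)

  // Draw n samples from a Normal(mean, sd) prior.
  def normalSamples(n: Int, mean: Double, sd: Double): List[Double] =
    List.fill(n)(mean + sd * rng.nextGaussian())

  def main(args: Array[String]): Unit = {
    val n = 10
    val x = normalSamples(n, 0.0, 1.0)   // support of Normal(0, 1)
    val y = normalSamples(n, 10.0, 1.0)  // support of Normal(10, 1)

    // Without the cap, the If element's support is the union: 2 * n values.
    val union = x ++ y

    // With the cap, uniformly subsample back down to n values, so no more
    // than n samples are ever propagated through the model.
    val capped = rng.shuffle(union).take(n)

    println(s"union size = ${union.size}, capped size = ${capped.size}")
  }
}
```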

gtakata commented 6 years ago

Still not sure what you are doing. Are you:

1) Sampling 10 from N(0,1) and 10 from N(10, 1), then sampling 10 from the result? or
2) Sampling 1 from N(0,1) and 9 from N(10,1), because of the Flip?

The latter makes more intuitive sense, but I'm not certain either is correct.

apfeffer commented 6 years ago

Hi Brian,

I like this! If it’s tested (i.e. doesn’t degrade performance on any of our tests and examples), let’s incorporate it into the main branch.

One question: How will this interact with LSFI?

Avi

bruttenberg commented 6 years ago

Glenn: Let X be 10 samples from N(0,1) and Y be 10 samples from N(10,1). The support of the If is now 10 samples taken from X ∪ Y.

Avi: Good question on LSFI. I can't give you an answer right now; I have to look into it. For continuous elements themselves, this will have no effect, as this change is only implemented at Chains.

gtakata commented 6 years ago

It still seems to me that this is ignoring the Flip(0.1). If the 20 are treated equally, aren't you likely to get a sample from N(5, 2), whereas it should be closer to N(9, ..)?


bruttenberg commented 6 years ago

No, you're not going to ignore the Flip, because you'll still make a factor for the Flip, which will then weight the output samples accordingly (i.e., samples that came from N(10,1) will have more weight).
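One way to see it: after summing out the Flip, a retained value v of the If carries weight proportional to the mixture density, so values near 10 keep most of the mass even though the subsampling itself was uniform (illustrative sketch only, not the actual factor code):

```scala
object Reweight {
  // Density of Normal(mean, sd) at x.
  def normalPdf(x: Double, mean: Double, sd: Double): Double =
    math.exp(-0.5 * math.pow((x - mean) / sd, 2)) / (sd * math.sqrt(2 * math.Pi))

  // Unnormalized weight of a retained value v: the Flip factor contributes
  // 0.1 on the Normal(0,1) branch and 0.9 on the Normal(10,1) branch.
  def weight(v: Double): Double =
    0.1 * normalPdf(v, 0.0, 1.0) + 0.9 * normalPdf(v, 10.0, 1.0)
}
```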

bruttenberg commented 6 years ago

Note, I'm also adding this for Applys as well. The point Alex made was that this way we have total control over the size of the model that is sampled. (Again, only if a continuous variable is detected.)

wkretschmer commented 6 years ago

In your example, does the support of the Normals change? Or, do e.g. the 10 values originally sampled from Normal(0,1) get mapped onto the 10 values of the Chain by some sort of approximation? If it's the latter, you might vaguely recall that I once implemented something like this for Gibbs sampling (see here), but I never had the chance to thoroughly test it.

bruttenberg commented 6 years ago

No, the support of the Normals does not change; they are sampled once. The values of the Chain are subsampled from the union of the two Normals, so the output of the Chain is always consistent with some of those samples.