Generalized EM crashes or produces wrong answer

apfeffer commented 8 years ago

I am finding severe problems with generalized EM. Again, this is in 3.3 – please check if this is fixed in 4.0.

The following program using EM with VE runs correctly:

package exercises

import com.cra.figaro.patterns.learning. import com.cra.figaro.library.atomic.continuous.Beta import com.cra.figaro.patterns.learning.ParameterCollection import com.cra.figaro.language.Flip import com.cra.figaro.library.compound.If import com.cra.figaro.algorithm.learning. import com.cra.figaro.algorithm.factored.VariableElimination import com.cra.figaro.algorithm.sampling.ProposalScheme

object ex133 { val params = ModelParameters()

val xParam = Beta(1, 1)("x", params) val yGivenXParam = Beta(2, 1)("yGivenX", params) val yGivenNotXParam = Beta(1, 2)("yGivenNotX", params) val zGivenYParam = Beta(1, 1)("zGivenY", params) val zGivenNotYParam = Beta(1, 1)("zGivenNotY", params)

class Model(pc: ParameterCollection) { val x = Flip(pc.get("x")) val y = If(x, Flip(pc.get("yGivenX")), Flip(pc.get("yGivenNotX"))) val z = If(y, Flip(pc.get("zGivenY")), Flip(pc.get("zGivenNotY"))) }

for { i <- 1 to 10 } { val xz = scala.util.Random.nextBoolean() val model = new Model(params.priorParameters) model.x.observe(xz) model.z.observe(xz) }

val time0 = System.currentTimeMillis()
val learningAlg = EMWithMH(10, 1000, ProposalScheme.default, params) learningAlg.start() val time1 = System.currentTimeMillis()
println("Time: " + ((time1 - time0) / 1000.0))
val futureModel = new Model(params.posteriorParameters) futureModel.x.observe(true) println(VariableElimination.probability(futureModel.z, true))

def main(args: Array[String]) {

} }

However, importance sampling, MH, and BP all fail. MH and BP both crash with the exception below. Importance sampling takes an inordinately long time (about 500 times longer than one EM iteration) and returns the incorrect answer of 0.5, which is the prior probability. Does anyone have an explanation for this?

Exception in thread "main" java.lang.ExceptionInInitializerError at exercises.ex133.main(ex133.scala) Caused by: com.cra.figaro.algorithm.BaseProbQueryAlgorithm$NotATargetException at com.cra.figaro.algorithm.BaseProbQueryAlgorithm$class.check(ProbQueryAlgorithm.scala:101) at com.cra.figaro.algorithm.BaseProbQueryAlgorithm$class.distribution(ProbQueryAlgorithm.scala:113) at com.cra.figaro.algorithm.sampling.OneTimeMetropolisHastings.distribution(MetropolisHastings.scala:445) at com.cra.figaro.algorithm.learning.GeneralizedEM$$anonfun$doExpectationStep$1$$anonfun$apply$1.apply(GeneralizedEM.scala:167) at com.cra.figaro.algorithm.learning.GeneralizedEM$$anonfun$doExpectationStep$1$$anonfun$apply$1.apply(GeneralizedEM.scala:163) at scala.collection.mutable.HashSet.foreach(HashSet.scala:78) at com.cra.figaro.algorithm.learning.GeneralizedEM$$anonfun$doExpectationStep$1.apply(GeneralizedEM.scala:163) at com.cra.figaro.algorithm.learning.GeneralizedEM$$anonfun$doExpectationStep$1.apply(GeneralizedEM.scala:160) at scala.collection.immutable.List.foreach(List.scala:381) at com.cra.figaro.algorithm.learning.GeneralizedEM.doExpectationStep(GeneralizedEM.scala:160) at com.cra.figaro.algorithm.learning.ExpectationMaximization$class.iteration(GeneralizedEM.scala:75) at com.cra.figaro.algorithm.learning.GeneralizedEM.iteration(GeneralizedEM.scala:148) at com.cra.figaro.algorithm.learning.ExpectationMaximization$class.em(GeneralizedEM.scala:64) at com.cra.figaro.algorithm.learning.GeneralizedEM.em(GeneralizedEM.scala:148) at com.cra.figaro.algorithm.learning.ExpectationMaximization$class.doStart(GeneralizedEM.scala:36) at com.cra.figaro.algorithm.learning.GeneralizedEM.doStart(GeneralizedEM.scala:148) at com.cra.figaro.algorithm.Algorithm$class.start(Algorithm.scala:83) at com.cra.figaro.algorithm.learning.GeneralizedEM.start(GeneralizedEM.scala:148) at exercises.ex133$.(ex133.scala:39) at exercises.ex133$.(ex133.scala) ... 1 more

mreposa commented 8 years ago

I have validated this exception is still thrown in the current version of Figaro 4.0.

bruttenberg commented 8 years ago

Ok, I took a look at this. There are several problems going on here:

With regard to MH, As @mhoward2718 pointed out, the problem is that the parameterized elements are declared within an If, which does call by value. Hence the parameterized elements are grabbed before the expectation step as targets in the expectation algorithm, but some parameterized elements have not been generated yet. There is no way around this, as we can never fully know if there are more parameterized elements until we run the model. The fix for this is to ignore the sufficient statistics for any parameterized element that we find during maximization that was not added as a target. There's not much you can do here. This fix works for MH.
Importance sampling takes forever because you have a ton of hard evidence and it's constantly rejecting.
This fix does not fix BP, and I'm not sure what the problem there is yet.

bruttenberg commented 8 years ago

Actually, I think the BP issue is the same problem, but since this is not a sampling algorithm it never generates the parameterized elements. I'm not 100% sure though.

Edit: Confirmed, same problem. When I make the parameterized elements permanent (ie, not inside the If) it passes. Still not sure how to fix this yet for BP.

bruttenberg commented 8 years ago

So, this is not fixable for BP. In generalized EM, we select the targets (parameterizable elements) from the active elements before the algorithm starts. Since, in this case, these elements live inside a chain, they do not exist yet. And since they don't exist, they are not added to the targets list of BP, and since BP expands outward from the target, nothing is expanded and no learning is performed.

Basically, this is almost a violation of Figaro semantics - you can't have the target of an algorithm be a temporary variable (this technically does not violate it because it's part of the If, which does call by name).

I see no way to fix this besides moving the elements out of the If.

apfeffer commented 8 years ago

Hmm. I think the problem is that we’re conflating the concept of a target (as in a variable we query) and an element that contributes to sufficient statistics. EM is implemented in such a way that we make these elements targets, but that’s not really correct. There’s no semantic reason not to use a parameter inside a chain – it’s not a violation of Figaro semantics. Rather, our insistence that such parameters be treated as targets is the violation.

I suggest we put a rethink of generalized EM on the agenda. In the meantime, perhaps we should remove EMWithBP until we can get this right.

+ + + + + + + + + + + + + Avi Pfeffer, PhD Principal Scientist Charles River Analytics 617.491.3474 x513 ww.cra.com

From: bruttenberg [mailto:notifications@github.com] Sent: Friday, February 5, 2016 2:52 PM To: p2t2/figaro figaro@noreply.github.com Cc: Avi Pfeffer apfeffer@cra.com Subject: Re: [figaro] Generalized EM crashes or produces wrong answer (#540)

So, this is not fixable for BP. In generalized EM, we select the targets (parameterizable elements) from the active elements before the algorithm starts. Since, in this case, these elements live inside a chain, they do not exist yet. And since they don't exist, they are not added to the targets list of BP, and since BP expands outward from the target, nothing is expanded and no learning is performed.

Basically, this is almost a violation of Figaro semantics - you can't have the target of an algorithm be a temporary variable (this technically does not violate it because it's part of the If, which does call by name).

I see no way to fix this besides moving the elements out of the If.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/540#issuecomment-180526096.

bruttenberg commented 8 years ago

Yes, I agree that treating the parameters as targets is the problem here. For the sampling algorithms, this works because on successive iterations you are likely to generate the "targets", though could still be a problem if your targets have very small priors.

I'm not sure removing EMWithBP completely is a good idea, since it still works fine for most things. But maybe we should discourage its use for the time being?

mhoward2718 commented 8 years ago

Well, VE works because we made a special factor type for tracking sufficient statistics. We don't have a version of BP that uses those factors currently, though. Maybe creating one is the solution, and probably easier after the factor unification Brian did.

mhoward2718 commented 8 years ago

By the way, we also have issue #543, which is to create a trait for Marginal MAP algorithms. Alex's presentations and paper mention that EM is a marginal MAP algorithm. Maybe when we define this trait, it will help us find a better way of specifying parameters//MAP target variables

gtakata commented 8 years ago

You have pretty much seen the marginal map interface. Except for the .map file it looks like the other solvers, with a .uai and an .evi file. There is nothing special in the parameters for EM.

From: mhoward2718 [mailto:notifications@github.com] Sent: Monday, February 8, 2016 1:13 PM To: p2t2/figaro figaro@noreply.github.com Subject: Re: [figaro] Generalized EM crashes or produces wrong answer (#540)

By the way, we also have issue #543https://github.com/p2t2/figaro/issues/543, which is to create a trait for Marginal MAP algorithms. Alex's presentations and paper mention that EM is a marginal MAP algorithm. Maybe when we define this trait, it will help us find a better way of specifying parameters//MAP target variables

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/540#issuecomment-181442683.

apfeffer commented 8 years ago

I wonder if it’s possible to use sufficient statistics factors in sampling algorithms over factors. Put differently, can we make all our factored algorithms work with any Semiring? If so, we could make EM work in a SFI framework.

+ + + + + + + + + + + + + Avi Pfeffer, PhD Principal Scientist Charles River Analytics 617.491.3474 x513 ww.cra.com

From: mhoward2718 [mailto:notifications@github.com] Sent: Monday, February 8, 2016 12:53 PM To: p2t2/figaro figaro@noreply.github.com Cc: Avi Pfeffer apfeffer@cra.com Subject: Re: [figaro] Generalized EM crashes or produces wrong answer (#540)

Well, VE works because we made a special factor type for tracking sufficient statistics. We don't have a version of BP that uses those factors currently, though. Maybe creating one is the solution, and probably easier after the factor unification Brian did.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/540#issuecomment-181438787.

gtakata commented 8 years ago

It should be possible since factor operations are all in terms of sum and product (however defined). If we can delegate these to the semiring and insure that only the semiring handles them then using a Semiring trait with suitable class instantiations should work. I may be misremembering but it seemed that, at one time, semirings also carried supplemental information, eg semirings for decisions, that were used differently. This might be problematic.

From: apfeffer [mailto:notifications@github.com] Sent: Monday, February 8, 2016 3:10 PM To: p2t2/figaro figaro@noreply.github.com Cc: Glenn Takata gtakata@cra.com Subject: Re: [figaro] Generalized EM crashes or produces wrong answer (#540)

I wonder if it’s possible to use sufficient statistics factors in sampling algorithms over factors. Put differently, can we make all our factored algorithms work with any Semiring? If so, we could make EM work in a SFI framework.

+ + + + + + + + + + + + + Avi Pfeffer, PhD Principal Scientist Charles River Analytics 617.491.3474 x513 ww.cra.com

From: mhoward2718 [mailto:notifications@github.com] Sent: Monday, February 8, 2016 12:53 PM To: p2t2/figaro figaro@noreply.github.com<mailto:figaro@noreply.github.com> Cc: Avi Pfeffer apfeffer@cra.com<mailto:apfeffer@cra.com> Subject: Re: [figaro] Generalized EM crashes or produces wrong answer (#540)

Well, VE works because we made a special factor type for tracking sufficient statistics. We don't have a version of BP that uses those factors currently, though. Maybe creating one is the solution, and probably easier after the factor unification Brian did.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/540#issuecomment-181438787.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/540#issuecomment-181510247.

bruttenberg commented 8 years ago

Added a warning when running BP and EM

charles-river-analytics / figaro

Generalized EM crashes or produces wrong answer #540