charles-river-analytics / figaro

Figaro Programming Language and Core Libraries
Other
757 stars 153 forks source link

Running figaro in parallel on single jvm fails #413

Open francisdb opened 9 years ago

francisdb commented 9 years ago

I'm evaluating figaro and it seems that the framework is not ready for running multiple algorithms concurrently. My guess is the globally static Factory.factorCache is causing problems

below is a stack trace of exceptions occurring in our parallel spes2 tests (forcing to run the tests sequentially fixes the issue) (I am using a new universe for each test)

[error]    key not found: List(0) (Factor.scala:95)
[error] com.cra.figaro.algorithm.factored.factors.Factor$class.get(Factor.scala:95)
[error] com.cra.figaro.algorithm.factored.factors.BasicFactor.get(BasicFactor.scala:26)
[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination$$anonfun$14.apply(VariableElimination.scala:257)
[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination$$anonfun$14.apply(VariableElimination.scala:257)
[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.computeDistribution(VariableElimination.scala:257)
[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.computeExpectation(VariableElimination.scala:267)
[error] com.cra.figaro.algorithm.ProbQueryAlgorithm$class.computeProbability(ProbQueryAlgorithm.scala:51)
[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.computeProbability(VariableElimination.scala:221)
[error] com.cra.figaro.algorithm.OneTimeProbQuery$class.doProbability(OneTimeProbQuery.scala:30)
[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.doProbability(VariableElimination.scala:221)
[error] com.cra.figaro.algorithm.ProbQueryAlgorithm$class.probability(ProbQueryAlgorithm.scala:128)
[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.probability(VariableElimination.scala:221)
bruttenberg commented 9 years ago

Hi Francis,

First, thanks for your interest in Figaro. We really appreciate you giving it a try.

Second, thanks for letting us know. We’ve actually just run into this issue already (issue #391) – it’s not when running in parallel but it’s the same type of problem. And you are correct, the issue is in the factor cache. When a new algorithm is run, it wipes the factor cache because the factor may have changed, but at the end of VE when you query the algorithm, it does need to retrieve the factors, and that can cause a mismatch. We haven’t gotten around to fixing it yet, but the most likely fix will be that each algorithm keeps a local copy of the factors it needs so that other algorithms are free to wipe the cache. We hope to have this fixed soon.

I’m not sure Belief Propagation will have this problem, so you can try that if you want.

Brian

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Thursday, March 26, 2015 5:37 PM To: p2t2/figaro Subject: [figaro] Running figaro in parallel on single jvm fails (#413)

I'm evaluating figaro and it seems that the framework is not ready for running multiple algorithms in parallel. My guess is the globally static Factory.factorCache is causing problems

below is a stack trace of exceptions occurring in our parallel spes2 tests (forcing to run the tests sequentially fixes the issue) (I am using a new universe for each test)

[error] key not found: List(0) (Factor.scala:95)

[error] com.cra.figaro.algorithm.factored.factors.Factor$class.get(Factor.scala:95)

[error] com.cra.figaro.algorithm.factored.factors.BasicFactor.get(BasicFactor.scala:26)

[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination$$anonfun$14.apply(VariableElimination.scala:257)

[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination$$anonfun$14.apply(VariableElimination.scala:257)

[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.computeDistribution(VariableElimination.scala:257)

[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.computeExpectation(VariableElimination.scala:267)

[error] com.cra.figaro.algorithm.ProbQueryAlgorithm$class.computeProbability(ProbQueryAlgorithm.scala:51)

[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.computeProbability(VariableElimination.scala:221)

[error] com.cra.figaro.algorithm.OneTimeProbQuery$class.doProbability(OneTimeProbQuery.scala:30)

[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.doProbability(VariableElimination.scala:221)

[error] com.cra.figaro.algorithm.ProbQueryAlgorithm$class.probability(ProbQueryAlgorithm.scala:128)

[error] com.cra.figaro.algorithm.factored.ProbQueryVariableElimination.probability(VariableElimination.scala:221)

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413.

francisdb commented 9 years ago

Hi @bruttenberg, thanks for looking into this. Is that global factor cache still needed once this is fixed? Not too fond of having a possibly ever-growing static map inside our server...

I'll have a go at the Belief Propagation algorithm

bruttenberg commented 9 years ago

Hi Francis,

I’m not sure if we will fix this immediately. The reason is that we have a major redesign of the factored algorithms coming down the pipeline in a month or so, which makes this problem moot (and won’t grow a static map either). Are you using the jar version of Figaro or the source from GitHub? If you’re using the source, I may be able to push a temporary fix.

Brian

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, March 27, 2015 10:02 AM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

Hi @bruttenberghttps://github.com/bruttenberg, thanks for looking into this. Is that global factor cache still needed once this is fixed? Not too fond of having a possibly ever-growing static map inside our server...

I'll have a go at the Belief Propagation algorithm

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-86948172.

francisdb commented 9 years ago

I'm currently using the maven central artifacts. Just checked the BeliefPropagation and has similar issue:

java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:347)
    at scala.None$.get(Option.scala:345)
    at com.cra.figaro.algorithm.factored.beliefpropagation.ProbabilisticBeliefPropagation$class.getFinalFactorForElement(BeliefPropagation.scala:250)
    at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.getFinalFactorForElement(BeliefPropagation.scala:301)
    at com.cra.figaro.algorithm.factored.beliefpropagation.ProbabilisticBeliefPropagation$class.getBeliefsForElement(BeliefPropagation.scala:228)
    at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.getBeliefsForElement(BeliefPropagation.scala:301)
    at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.computeDistribution(BeliefPropagation.scala:337)
    at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.computeExpectation(BeliefPropagation.scala:340)
    at com.cra.figaro.algorithm.ProbQueryAlgorithm$class.computeProbability(ProbQueryAlgorithm.scala:51)
    at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.computeProbability(BeliefPropagation.scala:301)
    at com.cra.figaro.algorithm.AnytimeProbQuery$class.handle(AnytimeProbQuery.scala:61)
    at com.cra.figaro.algorithm.factored.beliefpropagation.BeliefPropagation$$anon$2.handle(BeliefPropagation.scala:426)
    at com.cra.figaro.algorithm.Anytime$Runner$$anonfun$active$1.applyOrElse(Anytime.scala:71)

On a side note, I arrived here by reading the "Practical Probabilistic Programming" book (eap).

francisdb commented 9 years ago

This is clearly a blocker for integrating figaro...

bruttenberg commented 9 years ago

Hi Francis,

I quickly made a fix for this issue. At the very least, it is a fix for the issue I am seeing. Since I don’t have your code, I can’t say with certainty that this will fix your problem, and there may be other problems with running in parallel that I am not aware of. If you can post a small version of your code that isolates the error, that could help.

In any case, I can push this change through GitHub, but it won’t make it to Maven till Figaro 3.1 release, which won’t be for a couple of weeks. If you have access to GitHub, you can certainly pull down the source from the development build (warning – Figaro 3.1 is still in development, so we can’t guarantee everything will work fine).

Brian

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, March 27, 2015 4:00 PM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

I'm currently using the maven central artifacts. Just checked the BeliefPropagation and has similar issue:

ava.util.NoSuchElementException: None.get

at scala.None$.get(Option.scala:347)

at scala.None$.get(Option.scala:345)

at com.cra.figaro.algorithm.factored.beliefpropagation.ProbabilisticBeliefPropagation$class.getFinalFactorForElement(BeliefPropagation.scala:250)

at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.getFinalFactorForElement(BeliefPropagation.scala:301)

at com.cra.figaro.algorithm.factored.beliefpropagation.ProbabilisticBeliefPropagation$class.getBeliefsForElement(BeliefPropagation.scala:228)

at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.getBeliefsForElement(BeliefPropagation.scala:301)

at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.computeDistribution(BeliefPropagation.scala:337)

at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.computeExpectation(BeliefPropagation.scala:340)

at com.cra.figaro.algorithm.ProbQueryAlgorithm$class.computeProbability(ProbQueryAlgorithm.scala:51)

at com.cra.figaro.algorithm.factored.beliefpropagation.ProbQueryBeliefPropagation.computeProbability(BeliefPropagation.scala:301)

at com.cra.figaro.algorithm.AnytimeProbQuery$class.handle(AnytimeProbQuery.scala:61)

at com.cra.figaro.algorithm.factored.beliefpropagation.BeliefPropagation$$anon$2.handle(BeliefPropagation.scala:426)

at com.cra.figaro.algorithm.Anytime$Runner$$anonfun$active$1.applyOrElse(Anytime.scala:71)

On a side note, I arrived here by reading the "Practical Probabilistic Programming" book (eap).

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87072572.

francisdb commented 9 years ago

Do you have snapshot builds available at some maven repo?

bruttenberg commented 9 years ago

No we don’t unfortunately. We haven’t automated syncing out github development with Maven yet.

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, March 27, 2015 4:52 PM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

Do you have snapshot builds available at some maven repo?

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87088317.

apfeffer commented 9 years ago

Brian,

Is this something where you could send Francis a patch?

From: bruttenberg [mailto:notifications@github.com] Sent: Friday, March 27, 2015 4:59 PM To: p2t2/figaro Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

No we don’t unfortunately. We haven’t automated syncing out github development with Maven yet.

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, March 27, 2015 4:52 PM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

Do you have snapshot builds available at some maven repo?

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87088317.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87090398.

bruttenberg commented 9 years ago

I’m pretty sure. I can ask Mike to make a jar with the patch.

From: apfeffer [mailto:notifications@github.com] Sent: Friday, March 27, 2015 5:00 PM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

Brian,

Is this something where you could send Francis a patch?

From: bruttenberg [mailto:notifications@github.com] Sent: Friday, March 27, 2015 4:59 PM To: p2t2/figaro Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

No we don’t unfortunately. We haven’t automated syncing out github development with Maven yet.

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, March 27, 2015 4:52 PM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

Do you have snapshot builds available at some maven repo?

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87088317.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87090398.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87090573.

francisdb commented 9 years ago

No need, I'm not that much in a hurry. Can always do a local build. I'll just wait for a final fix. Not messing with anything that is not in a maven/ivy repo. You should set up a travis or codeship build.

bruttenberg commented 9 years ago

Ok Francis. We’ve looked into setting up travis to do builds but haven’t had a chance to do it yet. I agree, it would be nice to have nightly builds posted to maven as snapshots.

In any case, the fix will be in Figaro 3.1, which we’re hoping to have out no later than mid April. The new factored algorithms that make this issue moot (and should improve in many other respects) will be sometime after that though, as we still work out the bugs. Sorry we couldn’t get it out to you sooner!

Please let us know if you have any other questions as well.

Brian

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, March 27, 2015 5:16 PM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

No need, I'm not that much in a hurry. Can always do a local build. I'll just wait for a final fix. Not messing with anything that is not in a maven/ivy repo. You should set up a travis or codeship build.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87094468.

bruttenberg commented 9 years ago

Hi Francis,

I just pushed a fix to the 3.1 development branch. If you’re interested in trying it, let me know and we can send you a jar (or you can pull in from GitHub). If you don’t want mess around with Jars and just wait for the release, that’s fine, but just be aware that I can’t test your code, so there’s a chance this fix may not fix you issue (it does fix the original one I found).

Brian

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, March 27, 2015 5:16 PM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

No need, I'm not that much in a hurry. Can always do a local build. I'll just wait for a final fix. Not messing with anything that is not in a maven/ivy repo. You should set up a travis or codeship build.

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87094468.

francisdb commented 9 years ago

I'll come up with a small test

francisdb commented 9 years ago

You might have to run it a few times to trigger exceptions You might see these cases:

Just play with the code, enabling and disabling some of the commented things, playing with the iterations, setting observations (I'm testing on a macbook pro, scala 2.11.6)

Specs2 test framework (runs both tests in parallel unless sequential is uncommented)

class FigaroSpec extends Specification{

  //sequential

"figaro" should {

    val runs = 100

    "do VariableElimination correctly in multithreaded context" in {
      //skipped
      import scala.concurrent.ExecutionContext.Implicits.global

      val futures = (1 to runs).map { n =>
        //println("ve" + n)
        Future {
          val universe = new Universe
          val select = Select(0.3 -> "foo", 0.7 -> "bar")("node1", universe)
          val target = universe.getElementByReference[String]("node1")
          val alg = VariableElimination(target)(universe)
          alg.start()
          val result = alg.probability(target, "foo")
          alg.kill()
          result
        }
      }

      val future = Future.sequence(futures)
      //println("ve waiting for futures")

      val result = Await.result(future, 10.seconds)

      result forall(v => v must beCloseTo(0.3 +/- 0.001))
    }

    "do BeliefPropagation correctly in multithreaded context with observation" in {
      //skipped

      import scala.concurrent.ExecutionContext.Implicits.global

      val futures = (1 to runs).map { n =>
        //println("bp" + n)
        Future {
          val universe = new Universe
          val select = Select(0.3 -> "foo", 0.7 -> "bar")("node1", universe)
          select.observe("foo")
          val target = universe.getElementByReference[String]("node1")
          val alg = /*VariableElimination(target)(universe)*/ BeliefPropagation(target)(universe)
          alg.start()
          val result = alg.probability(target, "foo")
          alg.kill()
          result
        }
      }

      val future = Future.sequence(futures)
      //println("bp waiting for futures")

      val result = Await.result(future, 10.seconds)

      result forall(v => v must beCloseTo(1.0 +/- 0.001))
    }
  }
bruttenberg commented 9 years ago

Hi Francis,

Thanks for posting. I gave it a try, and it still fails. It appears that there must be more conflict between the static factory code and each running algorithm. This problem will be fixed with our new redesign of the factored algorithms, but since there is a not quick fix, I don’t think we’ll get a chance to fix this before that comes out. I apologize for that – we haven’t devoted many resources to parallel implementations at the moment, so this is definitely being reflected in these problems.

There might be some work arounds I can suggest if you want, but otherwise, this will have to wait until later. Sorry!

Brian

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Monday, March 30, 2015 10:20 AM To: p2t2/figaro Cc: Brian Ruttenberg Subject: Re: [figaro] Running figaro in parallel on single jvm fails (#413)

You might have to run it a few times to trigger exceptions You might see these cases:

Just play with the code, enabling and disabling some of the commented things, playing with the iterations, setting observations (I'm testing on a macbook pro, scala 2.11.6)

Specs2 test framework (runs both tests in parallel unless sequential is uncommented)

class FigaroSpec extends Specification{

//sequential

"figaro" should {

val runs = 100

"do VariableElimination correctly in multithreaded context" in {

  //skipped

  import scala.concurrent.ExecutionContext.Implicits.global

  val futures = (1 to runs).map { n =>

    //println("ve" + n)

    Future {

      val universe = new Universe

      val select = Select(0.3 -> "foo", 0.7 -> "bar")("node1", universe)

      val target = universe.getElementByReference[String]("node1")

      val alg = VariableElimination(target)(universe)

      alg.start()

      val result = alg.probability(target, "foo")

      alg.kill()

      result

    }

  }

  val future = Future.sequence(futures)

  //println("ve waiting for futures")

  val result = Await.result(future, 10.seconds)

  result forall(v => v must beCloseTo(0.3 +/- 0.001))

}

"do BeliefPropagation correctly in multithreaded context with observation" in {

  //skipped

  import scala.concurrent.ExecutionContext.Implicits.global

  val futures = (1 to runs).map { n =>

    //println("bp" + n)

    Future {

      val universe = new Universe

      val select = Select(0.3 -> "foo", 0.7 -> "bar")("node1", universe)

      select.observe("foo")

      val target = universe.getElementByReference[String]("node1")

      val alg = /*VariableElimination(target)(universe)*/ BeliefPropagation(target)(universe)

      alg.start()

      val result = alg.probability(target, "foo")

      alg.kill()

      result

    }

  }

  val future = Future.sequence(futures)

  //println("bp waiting for futures")

  val result = Await.result(future, 10.seconds)

  result forall(v => v must beCloseTo(1.0 +/- 0.001))

}

}

— Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-87697280.

francisdb commented 9 years ago

It's more about concurrency than parallelism. Thanks for looking into this and I'll wait for a release.

francisdb commented 9 years ago

In any case, the fix will be in Figaro 3.1, which we’re hoping to have out no later than mid April

How is that coming along?

apfeffer commented 8 years ago

@mreposa Can you look into this issue and see if it is fixed by SFI?

francisdb commented 8 years ago

What do you mean by fixed by SFI? I just checked 4.0.0.0 and the tests provided above still fail

apfeffer commented 8 years ago

Francis,

It’s possible that running a different algorithm will solve this issue. Or possible not. We’re looking into it to see which is the case.

Avi

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Wednesday, March 30, 2016 6:06 PM To: p2t2/figaro figaro@noreply.github.com Cc: Avi Pfeffer apfeffer@cra.com Subject: Re: [p2t2/figaro] Running figaro in parallel on single jvm fails (#413)

What do you mean by fixed by SFI? I just checked 4.0.0.0 and the tests provided above still fail

— You are receiving this because you commented. Reply to this email directly or view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-203659399

bkhsieh commented 7 years ago

Hi I am currently running into the same problem, and I was wondering whats the progress on this bug.

much appreciated

kevin

francisdb commented 7 years ago

I wonder if this is related to #596

francisdb commented 7 years ago

Anyway the main culprit must be https://github.com/p2t2/figaro/blob/master/Figaro/src/main/scala/com/cra/figaro/language/Universe.scala#L346

This static implicitly available mutable singleton could be picked up anywhere causing havoc

bruttenberg commented 7 years ago

In general, we haven't validated that Figaro is concurrency- or parallel-safe. We have gotten it to work before in those environments, but those cases are more the exception than the rule. The file that Francis mentions above has been the culprit in a lot of cases, but any easy workaround there is the create a universe for each copy of the model you are running, and ensure that all elements created reference a specific universe.

That being said, I can't guarantee that doing that will solve the problem. We'd probably need to take a look at the code to help any further.

We do, however, welcome ideas on how to make Figaro more robust so it can run in concurrent and parallel environments!

francisdb commented 7 years ago

I do create a new universe for each alg and I pass the universe along in most places in the test project that I provided but that does not seem to help. https://github.com/p2t2/figaro/issues/413#issuecomment-87697280

I might take the time to look a bit deeper, get rid of that global singleton and see what breaks.

bruttenberg commented 7 years ago

Ah sorry, I was getting confused by your comments and Kevin's. So you're saying the test program you wrote above still does not pass?

francisdb commented 7 years ago

yep, still fails as mentioned above https://github.com/p2t2/figaro/issues/413#issuecomment-203659399

bruttenberg commented 7 years ago

Ok, I tried your test program with VE from above. As is, the program still crashes. However, I tried it using our new Structured Variable Elimination algorithm and it seems to work (at least I can't get it to crash after multiple tries).

Structured VE is a new type of factored algorithm that decomposes the model and solves different parts separately. Since each part is independently solved, factor creation is done completely independently. I suspect the reason that regular VE fails is that there is still some legacy code in there that shares data structures. I can't say this problem is 100% fixed, but it's at least hopeful. Structured VE is released in Figaro 4.0, but I ran this test on our just about to be released 4.1. please let me know if it fails on 4.0 though.

If you'd like to read more about structured VE, you can see our paper on Arxiv: https://arxiv.org/abs/1606.03298

francisdb commented 7 years ago

That sounds interesting, let me try Structured VE with 4.0 and come back to you.

francisdb commented 7 years ago

On first sight it indeed runs correctly now 👍 , will do some more integration later. Meanwhile I created some more tickets #665 #664 #663 I guess we keep this ticket open for the VariableElimination and BeliefPropagation implementations?

bruttenberg commented 7 years ago

Good, glad to hear that! We can leave it open for now. The intention is to move all existing algorithms into the SFI framework, so hopefully this bug will just disappear altogether.

From: Francis De Brabandere [mailto:notifications@github.com] Sent: Friday, January 6, 2017 9:44 AM To: p2t2/figaro figaro@noreply.github.com Cc: Brian Ruttenberg bruttenberg@cra.com; Assign assign@noreply.github.com Subject: Re: [p2t2/figaro] Running figaro in parallel on single jvm fails (#413)

On first sight it indeed runs correctly now 👍 , will do some more integration later. Meanwhile I created some more tickets #665https://github.com/p2t2/figaro/issues/665 #664https://github.com/p2t2/figaro/issues/664 #663https://github.com/p2t2/figaro/issues/663 I guess we keep this ticket open for the VariableElimination and BeliefPropagation implementations?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-270916894, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AFJOJW-79ndkNVQ1r7WHhM5pzrrGAN8bks5rPlNCgaJpZM4D1g35.

bkhsieh commented 7 years ago

I still have problems even with StructuredVE. I am running on Spark and I have made sure to create a new universe for each copy of the model. I feel like something is being shared between each concurrent call and as a result observations are not set correctly.

bruttenberg commented 7 years ago

Hi Kevin,

It's really hard for me to say what's wrong without looking at the model. We'd like to help as much as we can, but please keep in mind that concurrency is not officially supported in Figaro. So while on a case-by-case basis we've gotten it to work, I can't guarantee that any algorithm that works on one concurrent implementation will work on another. If you can provide more information on what's wrong, we can still try our best to help.

bkhsieh commented 7 years ago

The following is my model which is the "john calls" example with a slight modification:

import com.cra.figaro.language._
import com.cra.figaro.algorithm.factored._
import com.cra.figaro.library.compound._
import com.cra.figaro.algorithm.structured.algorithm.structured._

object BayesNet{

  case class BayesNetwork(elements:Seq[Double]){

    val burglary = Flip(0.01)
    val earthquake = Flip(0.0001)
    val falseAlarm = Select(
        0.001 -> "trueAlarm",
        0.999 -> "falseAlarm"
      )

    val kidPrank = Select(
        0.08 -> "notPrank",
        0.5 -> "probablyPrank",
        0.42 -> "isPrank"
      )
    val alarm = CPD( burglary,earthquake,falseAlarm,
      (false, false, "falseAlarm") -> Flip(0.001),
      (false, true, "trueAlarm") -> Flip(0.1),
      (true, false, "falseAlarm") -> Flip(0.9),
      (true, true, "trueAlarm") -> Flip(0.99),
      (false, false, "trueAlarm") -> Flip(0.001),
      (false, true, "falseAlarm") -> Flip(0.1),
      (true, false, "trueAlarm") -> Flip(0.9),
      (true, true, "falseAlarm") -> Flip(0.99))

    val johnCalls = CPD(alarm,kidPrank,
      (false, "notPrank") -> Flip(0.01),
      (false, "probablyPrank") -> Flip(0.01),
      (false, "isPrank") -> Flip(0.01),
      (true, "notPrank") -> Flip(0.01),
      (true, "probablyPrank") -> Flip(0.01),
      (true, "isPrank") -> Flip(0.01))
    def decision(input:Double)={
      if(input>.5) true
      else false
    }
    def fAlarmDecision(input:Double)={
      if(input>.5) "trueAlarm"
      else "falseAlarm"
    }
    def kidPrankDecision(input:Double)={
      if(input>.3) "notPrank"
      else if(input>.7) "probablyPrank"
      else "isPrank"
   }
    def probabilityCalc[T](target: Element[T]): List[(Double,T)]={
        this.burglary.observe(decision(elements(0)))
        this.earthquake.observe(decision(elements(1)))
        this.falseAlarm.observe(fAlarmDecision(elements(2)))
        this.kidPrank.observe(kidPrankDecision(elements(3)))
        this.alarm.observe(decision(elements(4)))
      val uni= new Universe
      val alg = new StructuredVE(uni,target)
      alg.start()
      val result = alg.distribution(target)
      alg.kill()
      uni.deregisterAlgorithm(alg)
      result.toList
    }
  }
}

The following is the main function that I used to call the bayes network model:

import scala.collection.parallel._
import util.Random
import BayesNet.BayesNetwork

object SimpleBayesTest{
  def main(args:Array[String]){

    /// 1000 repetitions
    val n = 1000
    println("starting")
    case class Observables(fiveDoubles: Seq[Double])
    // input data

    // 10 Seq of Observables, each Observables is 5 Doubles

    val rowsOfSeqOfObservables:Seq[Seq[Observables]] = (1 to 10).map{
      x=> Seq.fill(n){
        val u: Random = new Random(x)
        Observables(Seq.fill(5)(u.nextDouble))
      }
    }

    //// Does not work if we use par

    //val output = rowsOfSeqOfObservables.map{row =>
    val output = rowsOfSeqOfObservables.par.map{row =>
      // Each iterable is a Sequence of 5 Doubles
      row.map{
        observables =>
        // Each observables is an Observables
        val homeCell = new BayesNetwork(observables.fiveDoubles)
        // A probability distribution is given for each set of Observables
        val out = homeCell.probabilityCalc(homeCell.johnCalls)
        println(out)
        out
      }
    }
    println("run Complete")
  }
} 

As stated in the comment, this works fine when I take out the "par" in the map. However, when I include the "par", I run into issues similar to what Francis observed:

For example:

[error] (run-main-0) java.util.NoSuchElementException: key not found: Flip(1.0E-4)
java.util.NoSuchElementException: key not found: Flip(1.0E-4)
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:59)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
    at com.cra.figaro.language.Universe.registerUses(Universe.scala:155)
    at com.cra.figaro.language.Universe$$anonfun$activate$3.apply(Universe.scala:181)
    at com.cra.figaro.language.Universe$$anonfun$activate$3.apply(Universe.scala:181)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at com.cra.figaro.language.Universe.activate(Universe.scala:181)
    at com.cra.figaro.language.Element.<init>(Element.scala:480)
    at com.cra.figaro.language.Deterministic.<init>(Deterministic.scala:21)
    at com.cra.figaro.language.Apply.<init>(Apply.scala:23)
    at com.cra.figaro.language.Apply3.<init>(Apply.scala:68)
    at com.cra.figaro.language.Apply$.apply(Apply.scala:154)
    at com.cra.figaro.library.compound.$up$up$.apply(Tuple.scala:33)
    at com.cra.figaro.library.compound.CPD3.<init>(CPD.scala:42)
    at com.cra.figaro.library.compound.CPD$.apply(CPD.scala:89)
    ...

Thanks for any help on this issue if possible. I really appreciate all your efforts on helping with this issue especially when it is not officially supported.

Kevin

bruttenberg commented 7 years ago

Hi Kevin,

Not a problem to look at this, we try to support as much as we can.

In this case, I can see the problem pretty easily. When you create an element, it “lives” so to speak in a Universe. If you don’t explicitly say which Universe, the element is placed in a default universe. So even though you create a new Universe in the case class and pass it to SFI, all of the elements are still created without specifying a Universe. So you should be doing something like this:

case class BayesNetwork(elements:Seq[Double]){

val universe = new Universe

val burglary = Flip(0.01)(“”, universe) // the “” is just the name of the element, but not neededhere

val earthquake = Flip(0.0001)(“”, universe)

    ….

val alg = new StructuredVE(universe,target)

… }

SO you need to add the Universe to all elements you create in the case class (including the Flips you created inside the CPD). Give that a try and see if it helps.

From: bkhsieh [mailto:notifications@github.com] Sent: Monday, January 9, 2017 8:25 PM To: p2t2/figaro figaro@noreply.github.com Cc: Brian Ruttenberg bruttenberg@cra.com; Assign assign@noreply.github.com Subject: Re: [p2t2/figaro] Running figaro in parallel on single jvm fails (#413)

The following is my model which is the "john calls" example with a slight modification:

import com.cra.figaro.language._

import com.cra.figaro.algorithm.factored._

import com.cra.figaro.library.compound._

import com.cra.figaro.algorithm.structured.algorithm.structured._

object BayesNet{

case class BayesNetwork(elements:Seq[Double]){

val burglary = Flip(0.01)

val earthquake = Flip(0.0001)

val falseAlarm = Select(

    0.001 -> "trueAlarm",

    0.999 -> "falseAlarm"

  )

val kidPrank = Select(

    0.08 -> "notPrank",

    0.5 -> "probablyPrank",

    0.42 -> "isPrank"

  )

val alarm = CPD( burglary,earthquake,falseAlarm,

  (false, false, "falseAlarm") -> Flip(0.001),

  (false, true, "trueAlarm") -> Flip(0.1),

  (true, false, "falseAlarm") -> Flip(0.9),

  (true, true, "trueAlarm") -> Flip(0.99),

  (false, false, "trueAlarm") -> Flip(0.001),

  (false, true, "falseAlarm") -> Flip(0.1),

  (true, false, "trueAlarm") -> Flip(0.9),

  (true, true, "falseAlarm") -> Flip(0.99))

val johnCalls = CPD(alarm,kidPrank,

  (false, "notPrank") -> Flip(0.01),

  (false, "probablyPrank") -> Flip(0.01),

  (false, "isPrank") -> Flip(0.01),

  (true, "notPrank") -> Flip(0.01),

  (true, "probablyPrank") -> Flip(0.01),

  (true, "isPrank") -> Flip(0.01))

def decision(input:Double)={

  if(input>.5) true

  else false

}

def fAlarmDecision(input:Double)={

  if(input>.5) "trueAlarm"

  else "falseAlarm"

}

def kidPrankDecision(input:Double)={

  if(input>.3) "notPrank"

  else if(input>.7) "probablyPrank"

  else "isPrank"

}

def probabilityCalc[T](target: Element[T]): List[(Double,T)]={

    this.burglary.observe(decision(elements(0)))

    this.earthquake.observe(decision(elements(1)))

    this.falseAlarm.observe(fAlarmDecision(elements(2)))

    this.kidPrank.observe(kidPrankDecision(elements(3)))

    this.alarm.observe(decision(elements(4)))

  val uni= new Universe

  val alg = new StructuredVE(uni,target)

  alg.start()

  val result = alg.distribution(target)

  alg.kill()

  uni.deregisterAlgorithm(alg)

  result.toList

}

}

}

The following is the main function that I used to call the bayes network model:

import scala.collection.parallel._

import util.Random

import BayesNet.BayesNetwork

object SimpleBayesTest{

def main(args:Array[String]){

/// 1000 repetitions

val n = 1000

println("starting")

case class Observables(fiveDoubles: Seq[Double])

// input data

// 10 Seq of Observables, each Observables is 5 Doubles

val rowsOfSeqOfObservables:Seq[Seq[Observables]] = (1 to 10).map{

  x=> Seq.fill(n){

    val u: Random = new Random(x)

    Observables(Seq.fill(5)(u.nextDouble))

  }

}

//// Does not work if we use par

//val output = rowsOfSeqOfObservables.map{row =>

val output = rowsOfSeqOfObservables.par.map{row =>

  // Each iterable is a Sequence of 5 Doubles

  row.map{

    observables =>

    // Each observables is an Observables

    val homeCell = new BayesNetwork(observables.fiveDoubles)

    // A probability distribution is given for each set of Observables

    val out = homeCell.probabilityCalc(homeCell.johnCalls)

    println(out)

    out

  }

}

println("run Complete")

}

}

As stated in the comment, this works fine when I take out the "par" in the map. However, when I include the "par", I run into issues similar to what Francis observed:

For example:

[error] (run-main-0) java.util.NoSuchElementException: key not found: Flip(1.0E-4)

java.util.NoSuchElementException: key not found: Flip(1.0E-4)

    at scala.collection.MapLike$class.default(MapLike.scala:228)

    at scala.collection.AbstractMap.default(Map.scala:59)

    at scala.collection.mutable.HashMap.apply(HashMap.scala:65)

    at com.cra.figaro.language.Universe.registerUses(Universe.scala:155)

    at com.cra.figaro.language.Universe$$anonfun$activate$3.apply(Universe.scala:181)

    at com.cra.figaro.language.Universe$$anonfun$activate$3.apply(Universe.scala:181)

    at scala.collection.immutable.List.foreach(List.scala:381)

    at com.cra.figaro.language.Universe.activate(Universe.scala:181)

    at com.cra.figaro.language.Element.<init>(Element.scala:480)

    at com.cra.figaro.language.Deterministic.<init>(Deterministic.scala:21)

    at com.cra.figaro.language.Apply.<init>(Apply.scala:23)

    at com.cra.figaro.language.Apply3.<init>(Apply.scala:68)

    at com.cra.figaro.language.Apply$.apply(Apply.scala:154)

    at com.cra.figaro.library.compound.$up$up$.apply(Tuple.scala:33)

    at com.cra.figaro.library.compound.CPD3.<init>(CPD.scala:42)

    at com.cra.figaro.library.compound.CPD$.apply(CPD.scala:89)

    ...

Thanks for any help on this issue if possible. I really appreciate all your efforts on helping with this issue especially when it is not officially supported.

Kevin

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHubhttps://github.com/p2t2/figaro/issues/413#issuecomment-271459863, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AFJOJVQpJEwnlS9_T2VvzpwXFSBJKyG2ks5rQt3YgaJpZM4D1g35.