charles-river-analytics / figaro

Figaro Programming Language and Core Libraries

Performance in Filtering Algorithms #633

Closed ssbanerje closed 7 years ago

ssbanerje commented 7 years ago

I am trying to figure out how to efficiently implement a factored filtering algorithm in Figaro. Please have a look at the following code:

import com.cra.figaro.language._
import com.cra.figaro.library.collection._
import com.cra.figaro.algorithm.filtering._

object test {

  lazy val init = () => {
    val u = Universe.createNew()
    val a = new FixedSizeArray[Int](100, (i) => Constant[Int](0)("val"+i, u))
    compute_sum(a, u)
    u
  }

  lazy val transition = (u: Universe) => {
    val next = Universe.createNew()
    val a = new FixedSizeArray[Int](100, (i) => Apply(u.getElementByReference[Int]("val"+i), Flip(0.5), (v: Int, f: Boolean) => {
      if (f) v + 1 else v - 1
    })("val"+i, next))
    compute_sum(a, next)
    next
  }

  def compute_sum(a: FixedSizeArray[Int], u:Universe) = {
    // CASE 1
    a.map{(i) => i>0}.count(_)
    // CASE 2
    a.map{(i) => i>0}.count{(b) => b}("ANSWER1", u)
  }

  def main(args: Array[String]): Unit = {
    val inf = FactoredFrontier(init(), transition, 50)
    inf.start

    for (i <- 1 to 1000) {
      inf.advanceTime()
      println(i)
      println(inf.computeCurrentExpectation[Int]("ANSWER1", (i:Int) => i))
    }

    inf.kill
  }
}

When I use CASE 1 in compute_sum, the program finishes in a few seconds. However, when I use CASE 2 it takes much, much longer (~30 mins, roughly 100x?). I don't quite understand why these two differ so much in performance.

bruttenberg commented 7 years ago

Hi Subho,

This is an interesting one. It seems that in CASE 1, Scala is doing something odd. When I bring this up in the debugger, the type returned by compute_sum in CASE 1 is ((Boolean) => Boolean) => Element[Int]. I'm not sure why that is the case; I think it should just be an Element[Int], so I'm not sure why it's returning a function. In CASE 2, it does correctly return an Element[Int] (a Fold[Int], which amounts to the same thing). To summarize, CASE 1 is doing nothing, though I'm not sure why.
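For what it's worth, the CASE 1 behavior looks like plain Scala rather than anything Figaro-specific: `count(_)` is placeholder syntax for `p => count(p)`, so the expression evaluates to an unapplied function value instead of an element, and no computation is ever triggered. A minimal stdlib-only sketch of the same pattern (the names here are hypothetical, just shaped like the container's count):

```scala
object EtaDemo {
  // A method shaped like a container's count: it takes a predicate.
  def count(pred: Boolean => Boolean): Int =
    List(true, false, true).count(pred)

  // `count(_)` is placeholder syntax for `p => count(p)`: a function
  // value of type (Boolean => Boolean) => Int. Nothing is counted yet.
  val notCalled: (Boolean => Boolean) => Int = count(_)

  def main(args: Array[String]): Unit = {
    // Only applying the function actually evaluates count.
    println(notCalled(b => b)) // prints 2
  }
}
```

That matches the debugger observation: the discarded value has function type, so CASE 1 costs essentially nothing.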

Now, for CASE 2, you're summing a large number of random values. The reason it's slow is that this will probably produce some very large factors, since there are many output combinations across 100 random numbers. However, since you're not operating on the sum (just using it for output), it should still be feasible to run (albeit slowly, especially as time increases).
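To get a feel for the blow-up, here is a back-of-envelope sketch (plain Scala, not Figaro; the row formula is a simplified model of a fold's intermediate factors, not Figaro's actual factor layout): folding 100 Booleans into a running count means the intermediate sum after i inputs can take i+1 values, so the factor linking one step to the next grows quadratically in i.

```scala
object FactorGrowth {
  // Simplified model: the factor linking fold step i to step i+1 spans
  // (i+1) current states * 2 Boolean inputs * (i+2) next states.
  def factorRows(n: Int): Seq[Int] =
    (0 until n).map(i => 2 * (i + 1) * (i + 2))

  def main(args: Array[String]): Unit = {
    val rows = factorRows(100)
    println(rows.take(3)) // the first few factors are tiny
    println(rows.last)    // the last factor alone has 20200 rows
    println(rows.sum)     // total table entries across the whole fold
  }
}
```

Under this rough model the final factor alone has 20,200 entries and the chain has several hundred thousand in total, which is consistent with CASE 2 being orders of magnitude slower than a no-op.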

Brian


ssbanerje commented 7 years ago

Thanks @bruttenberg . That definitely helps...

Are there any "best-practices" guides available for getting the best performance out of a model? I am trying to model a system that computes a factored frontier over 20k-30k variables (~100k-value state space). Figaro doesn't seem to scale to a model of this size.

As far as I can gather from the web, Figaro can recognize repeated (and independent) sub-models and optimize the overall computation by memoizing the results of these sub-models. How do I ensure these optimizations are always applied successfully?

bruttenberg commented 7 years ago

Hi Subho,

We don't have an explicit best-practices guide, though that is actually a great idea. A lot of the best practices for probabilistic programming languages and probabilistic models in general apply here, but I admit those practices are probably not written down anywhere; much of the time the knowledge just sits in people's heads as experience.

For your specific case, yes, this is quite a large model. The number of variables isn't so much the problem as the state space, which will cause trouble when you're multiplying and summing factors. In general, techniques like abstracting states or compacting the state space will help. We also know some tricks that can be applied based on what we know about the underlying factors (e.g., some elements create sparse factors), but I don't have any specific recommendations without knowing more about the problem.
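As one illustration of compacting a state space, here is a minimal stdlib-only sketch (the bucketing scheme is hypothetical, not a Figaro API): collapsing a 201-value running sum into 5 coarse buckets before it would feed any downstream factors shrinks every factor that mentions it by a factor of ~40.

```scala
object Abstraction {
  // Hypothetical compaction: map a sum in -100..100 (201 values)
  // onto 5 coarse buckets, shrinking the support that downstream
  // factors have to enumerate.
  def bucket(sum: Int): Int = {
    val clamped = math.max(-100, math.min(100, sum))
    (clamped + 100) * 5 / 201 // bucket index 0..4
  }

  def main(args: Array[String]): Unit = {
    // Only 5 distinct values survive the abstraction.
    println((-100 to 100).map(bucket).distinct.size)
  }
}
```

The trade-off, of course, is precision: queries can only distinguish buckets, so the granularity has to match what the model actually needs to answer.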

As for your comment on memoization: yes, we have an experimental algorithm that might help in this case. It's called structured factored inference (SFI), and it decomposes a model into smaller problems that are solved independently and rolled up. In general, we see improvements using SFI over non-SFI inference (even without memoization). Memoization might help here, but keep in mind this is not true lifted inference (if you're familiar with that), so its utility really depends on how the model is coded. We don't have SFI working with Factored Frontier yet, but that is a fairly simple extension.
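The memoization idea can be illustrated with a toy stdlib-only sketch (all names hypothetical; this is not the SFI API): sub-problems that are structurally identical share one cached solution, so 100 repeated sub-models cost one solve instead of 100.

```scala
object MemoSketch {
  import scala.collection.mutable

  // Toy stand-in for solving a sub-model: sub-problems with the same
  // structural key are solved once and the result is reused.
  val cache = mutable.Map.empty[String, Double]
  var solves = 0

  def solveSubproblem(key: String): Double =
    cache.getOrElseUpdate(key, { solves += 1; key.length * 0.5 })

  def main(args: Array[String]): Unit = {
    // 100 structurally identical sub-models collapse to a single solve.
    val results = (1 to 100).map(_ => solveSubproblem("flip-chain"))
    println(solves)      // solved only once
    println(results.size)
  }
}
```

The caveat from above applies here too: whether repeated structure is actually detected depends on how the model is coded, since this is not true lifted inference.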
