BP inference is slow: Profile and fix

AnantLabs / factorie

Automatically exported from code.google.com/p/factorie

0 stars 0 forks source link

BP inference is slow: Profile and fix #21

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Anecdotal evidence of running time puts it as at least twice, but perhaps 
more than 5 times slower than the old BP implementation
2. >10 times slower than a customized beam-search approach for chains

What is the expected output? What do you see instead?
Although comparison with customized beam-search is unfair, we should narrow the 
gap to the old BP code. Further, we should compare against the Mallet 
implementation to achieve similar performance.

Original issue reported on code.google.com by sameeersingh on 25 Feb 2012 at 11:21

GoogleCodeExporter commented 9 years ago

The first step is to write a "test" that runs multiple implementations (new BP, 
old BP and beam search) on the same set of models multiple times to get a 
running time estimate. We can use that test to report better numbers as the BP 
implementation is improved.

Original comment by sameeersingh on 25 Feb 2012 at 11:23

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Original comment by sameeersingh on 25 Feb 2012 at 11:24

Changed state: New
Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

I'm writing a small such test as we speak.

Original comment by alexandr...@gmail.com on 25 Feb 2012 at 11:29

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Commited. It's in examples/TimingBP. Has lots of parameters to tweak to put us 
in a reasonable state. Right now it crashed BP sometimes. Any idea why?

Original comment by alexandr...@gmail.com on 26 Feb 2012 at 12:05

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

I've pushed a change that splits SumFactor.marginalize into subfunctions to 
aind profiling. The most expensive subfunctions are sumNeighbors and computeZ

Some things which I've tested and don't help:
  1. optimizing away the map()s in Factor2.valuesIterator
  2. caching the results of Factor2.valuesIterator in SumFactor

One oddity which I can neither explain nor do nothing about: in the profiler 
and cpu sampler calls to SumFactor.varyingNeighbors show up with a high 
percentage of "self time". This makes no sense, however, as there is no such 
method, only an object-specific val (which is implemented by the scala compiler 
as a method that returns the proper field). Even the initialization code for 
varyingNeighbors looks cheap to me. If I make sure to cache varyingNeighbors as 
a local variable of marginalize or its subfunctions it still makes no 
difference.

Original comment by alexandr...@gmail.com on 26 Feb 2012 at 3:29

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

The checking that makes it 15% faster does so by avoiding going through 
valuesIterator a second time for the sole purpose of getting the index of an 
assignment. We can bypass that by using a bp-specific index-computation 
function that orders the variables by their order in discreteVarying, which 
allows us to inspect an assignmnent's index to extract the values of the 
variables. MaxFactor is also faster because of this.

Original comment by alexandr...@gmail.com on 27 Feb 2012 at 4:12

Added labels: ****
Removed labels: ****