goete111 / factorie

Automatically exported from code.google.com/p/factorie

Profile both undirected and directed models for possible speed enhancements #4

Closed. GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
It has been a while since this was done, and I know there is some fruitful tweaking to be done.

Original issue reported on code.google.com by andrew.k.mccallum on 19 Nov 2009 at 9:20

GoogleCodeExporter commented 8 years ago
I've started working on this, and hopefully will find something that gives us a boost. If it's something major, I'll run it by you.

Original comment by sameeersingh on 21 Nov 2009 at 1:01

GoogleCodeExporter commented 8 years ago
Currently looking at WordSegDemo

Around 50% of the time is spent taking the dot product of the weights with the statistics. This is because the scalala implementation is optimized for fast enumeration of values, storing everything as arrays, so taking the dot product is linear in the number of non-zero values of the SparseVector holding the weights: if the weights have m non-zero values and the statistics have n, the time taken is O(m).

To test this theory quickly, I used an implementation which went through all values of the statistics vector and binary-searched for the corresponding value in the weights vector. This is an O(n log m) solution, which ended up being faster by about 20%.
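For illustration, here is a minimal sketch of that binary-search variant; it is not the actual scalala/factorie code, and the parallel index/value arrays are an assumed representation:

```scala
// Sketch of the O(n log m) dot product: walk the n stored entries of the
// statistics vector and binary-search each index among the m sorted indices
// of the weights. The array-pair representation is assumed, not scalala's API.
object BinarySearchDot {
  def dot(statIdx: Array[Int], statVal: Array[Double],
          wIdx: Array[Int], wVal: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < statIdx.length) {
      val j = java.util.Arrays.binarySearch(wIdx, statIdx(i)) // O(log m) per entry
      if (j >= 0) sum += statVal(i) * wVal(j)
      i += 1
    }
    sum
  }
}
```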

Finally I tried the hash-based implementation instead, which required more changes to the scalala code and the introduction of SparseHashWeights in factorie. This corresponds to a ~O(n) algorithm. The final running time dropped by 40%, resulting in only ~10% of the time being spent in dot products.
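A rough sketch of the hash-based idea (again an assumed representation, not the actual SparseHashWeights code) looks like this:

```scala
import scala.collection.mutable

// Sketch of the ~O(n) variant: weights keyed by feature index in a hash map,
// so each of the n statistics entries costs an expected O(1) lookup,
// independent of the number of weights m. Illustrative only.
class HashWeights {
  private val weights = mutable.HashMap[Int, Double]().withDefaultValue(0.0)

  def update(index: Int, value: Double): Unit = weights(index) = value

  def dot(statIdx: Array[Int], statVal: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < statIdx.length) {
      sum += statVal(i) * weights(statIdx(i)) // expected O(1) per lookup
      i += 1
    }
    sum
  }
}
```

The trade-off is that a general-purpose hash map boxes its keys and values and has a larger constant factor than sorted arrays, so how much it wins depends on how sparse the statistics are relative to the weights.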

I've emailed a request to check my changes into scalala. I've also checked in SparseHashWeights, although without the scalala changes it will not give us much of a benefit yet. This analysis is based only on the WordSeg example, and the improvement may not generalize to all use cases; however, the fact remains that scalala, as it is being used, is not optimized for dot products.

I'll continue to study WordSeg a little while more before moving to a different use case.

Original comment by sameeersingh on 23 Nov 2009 at 7:18

GoogleCodeExporter commented 8 years ago

Original comment by sameeersingh on 23 Nov 2009 at 7:24

GoogleCodeExporter commented 8 years ago
Does the amount of time spent performing vector operations warrant the overhead of parallelization on the GPU? Are the scalala folks still considering incorporating the Scala GPU toolkit? Should we nudge them in this direction?

Original comment by thebiasedestimator@gmail.com on 23 Nov 2009 at 3:38

GoogleCodeExporter commented 8 years ago
I saw some activity on the scalala list over summer regarding GPUs, but have not
heard about it since.

For now the dot product takes only ~10% of the time for WordSeg. This may be worse for other examples, which use conjunctions with larger domains; however, with SparseHashWeights the dot product becomes almost independent of the size of the weight vector. I don't know whether using a GPU can reduce this much further.

I am still in the process of looking at where the time is spent now, but I imagine it is in unrolling the factors and computing the statistics, which cannot be parallelized easily using GPUs.

Original comment by sameeersingh on 23 Nov 2009 at 5:42

GoogleCodeExporter commented 8 years ago
I've seen some boxing and unboxing for Vector(i) += x. This might be another place to look.
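For illustration, a rough sketch of where that boxing can come from, assuming the vector's apply/update are defined on a generic, non-@specialized trait (not the actual scalala code):

```scala
// Sketch only: boxing appears when Double values pass through erased generic methods.
trait GenVec[T] {
  def apply(i: Int): T
  def update(i: Int, v: T): Unit
}

class DoubleVec(n: Int) extends GenVec[Double] {
  private val data = new Array[Double](n)
  def apply(i: Int): Double = data(i)
  def update(i: Int, v: Double): Unit = data(i) = v
}

object BoxingDemo {
  // Called through the erased trait: each read boxes the returned Double and
  // each write boxes the argument before it is unboxed again inside the class.
  def incGeneric(v: GenVec[Double], i: Int, x: Double): Unit = v(i) = v(i) + x

  // Called on the concrete class (or on the backing Array[Double] directly):
  // stays on primitive doubles throughout.
  def incDirect(v: DoubleVec, i: Int, x: Double): Unit = v(i) = v(i) + x
}
```

Marking such a trait @specialized(Double), or doing hot-loop updates directly against the backing Array[Double], would avoid the boxed path.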

Also note that we are using our own version of scalala from a UMass maven repository. You could ask Sebastian about getting your patch into that.

I wonder if we might see further efficiencies if we update to the latest version.

Original comment by andrew.k.mccallum on 23 Nov 2009 at 7:10

GoogleCodeExporter commented 8 years ago

Original comment by sameeersingh on 21 Feb 2012 at 6:35