dalab / dissolve-struct

Distributed solver library for large-scale structured output prediction, based on Spark. Project website:
http://dalab.github.io/dissolve-struct/
Apache License 2.0

avoid dense vector operations in local solver #29

Open martinjaggi opened 8 years ago

martinjaggi commented 8 years ago

For good performance on sparse data, we shouldn't use any dense vector operations in a single coordinate update. Such an update should only cost O(number of non-zeros of the feature vector returned by the oracle), which can be very sparse.

For example, we should remove the dense temp variables here: https://github.com/dalab/dissolve-struct/blob/master/dissolve-struct-lib/src/main/scala/ch/ethz/dalab/dissolve/optimization/DBCFWSolverTuned.scala#L961

We can then check how that affects performance on a large, sparse binarySVM dataset.
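A minimal sketch of such an O(nnz) update, using plain Scala arrays rather than the Breeze vectors the solver actually uses (the names and the sparse representation here are illustrative assumptions, not the project's API):

```scala
// Hypothetical sketch: an O(nnz) in-place update of a dense weight vector,
// touching only the non-zero entries of the sparse oracle output.
object SparseUpdateSketch {
  // Sparse vector as parallel (index, value) arrays -- illustrative only.
  final case class SparseVec(indices: Array[Int], values: Array[Double])

  // w += gamma * s, costing O(nnz(s)) instead of O(dim(w)).
  def axpyInPlace(gamma: Double, s: SparseVec, w: Array[Double]): Unit = {
    var k = 0
    while (k < s.indices.length) {
      w(s.indices(k)) += gamma * s.values(k)
      k += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val w = Array(1.0, 0.0, 2.0, 0.0)
    val s = SparseVec(Array(0, 2), Array(0.5, -1.0))
    axpyInPlace(2.0, s, w)
    println(w.mkString(","))  // 2.0,0.0,0.0,0.0
  }
}
```

The point is that the loop never visits the zero coordinates of the oracle's feature vector, and w stays dense throughout.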

jderiu commented 8 years ago

I think the problem is not the performance of the operation itself, but rather the fact that a + b returns a new vector. The old weight vector object is then replaced by the new one, which over time triggers garbage collection.

I replaced updateWeights in the StructSVMModel class by a method which adds a vector to the weights without creating a new vector object. It seems to execute faster.

Earlier, I noticed that the localModel.updateWeights(tempWeights2) operation took 16ms every 5 or 6 iterations, which indicated that garbage collection was involved.

When using the new methods I created (one for addition and one for subtraction), this part of the code takes 0ms.

I would like to test my methods some more; if they seem OK, I could do a pull request sometime tomorrow.
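To illustrate the idea (a hypothetical sketch with made-up names, not the actual StructSVMModel code): instead of rebuilding the weight vector via a + b and swapping the reference, the update mutates the existing dense array, so no new vector object is allocated per update:

```scala
// Hypothetical sketch: in-place add/subtract on a dense weight array,
// avoiding the per-update allocation (and the eventual GC pauses) caused
// by replacing the weight vector with the result of `a + b`.
class LocalModelSketch(dim: Int) {
  val weights: Array[Double] = new Array[Double](dim)

  // weights += delta, mutating in place; no new vector object is created.
  def addWeightsInPlace(delta: Array[Double]): Unit = {
    require(delta.length == dim)
    var i = 0
    while (i < dim) { weights(i) += delta(i); i += 1 }
  }

  // weights -= delta, the subtraction counterpart.
  def subtractWeightsInPlace(delta: Array[Double]): Unit = {
    require(delta.length == dim)
    var i = 0
    while (i < dim) { weights(i) -= delta(i); i += 1 }
  }
}
```

If I recall Breeze's API correctly, the analogous in-place form there would be the `+=`/`-=` operators on DenseVector rather than `+`/`-`, which allocate a fresh result.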

martinjaggi commented 8 years ago

thanks!

Can you check how it affects the performance of the binarySVM example? The RCV1 dataset would be recommended for the check, since it's very sparse.


jderiu commented 8 years ago

I checked the performance on the RCV1 dataset. The old implementation takes approx. 1200ms per round; the new implementation takes only 750ms per round.

What is striking, however, is that each iteration in the mapper takes longer than the previous one: the first iteration takes 1ms and the last iteration (after passing through all 16k datapoints) takes 500ms. I think it is the same kind of problem, since each vector addition and subtraction creates new objects which have to be garbage collected. In these cases, however, I don't quite know how to adapt the code yet. Maybe it would make sense to optimize the code to avoid the creation of new objects.

martinjaggi commented 8 years ago

Yes, one has to be careful which kind of addition operation will actually result when adding a sparse vector (the feature vector) to a dense one (w). In your experiments, did you keep w dense (and w_i sparse)?

jderiu commented 8 years ago

Yes, w is a dense vector and w_i is sparse.

The increase in duration per iteration was due to a small bug I introduced. After rerunning the experiments, the old implementation, which replaces the (dense) weight vector by a new one, takes 1200ms per round. Looking at the mapper iterations, one can see that most take 0ms and the others about 15ms.

The new implementation, which adds to and subtracts from the weight vector without creating a new object, takes 400ms per round. Here, all iterations in the mapper take 0ms.

jderiu commented 8 years ago

I also ran it on my data, which has 9600 datapoints and feature vectors with 220k dimensions. The old implementation takes 22000ms per round, the new one 1200ms. If you want, I can do a pull request.

martinjaggi commented 8 years ago

Yes, please make a PR.