Parallelize result computation

cocreature commented 7 years ago

Currently we only parallelize the aggregation phase; the result computation, in particular the calculation of E, is not parallelized. For small precisions this is probably not worth it, but for a precision of 16, for example, we call std::pow and sum the results 2^16 times, so it probably makes sense to parallelize this.

I’m not quite sure what the best way to accomplish this is. Do we need two nodes for this?
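
For reference, here is a minimal sketch of the sequential computation discussed above, written in plain C++ without Thrill. The names HllPartial, SumRegisters and RawEstimate are hypothetical and not taken from the repository. The estimate uses the standard HyperLogLog formula E = alpha_m * m^2 / sum(2^-register) with the usual constant for m >= 128, which may differ from what the actual code does.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the harmonic sum and the number of zero registers (V).
struct HllPartial {
    double sum;
    unsigned int zeros;
};

// One pass over all m = 2^precision registers: one std::pow call per register.
HllPartial SumRegisters(const std::vector<std::uint8_t>& registers) {
    HllPartial partial{0.0, 0u};
    for (std::uint8_t r : registers) {
        partial.sum += std::pow(2.0, -static_cast<double>(r));
        if (r == 0) ++partial.zeros;
    }
    return partial;
}

// Raw HyperLogLog estimate, assuming m >= 128 so the standard alpha_m applies.
double RawEstimate(const std::vector<std::uint8_t>& registers) {
    const double m = static_cast<double>(registers.size());
    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    return alpha * m * m / SumRegisters(registers).sum;
}
```

With precision 16 the loop in SumRegisters runs 65536 times, which is the part the issue proposes to split across workers.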

TiFu commented 7 years ago

I am a bit confused about what's going on when you call HyperLogLog().

The call to DIA::HyperLogLog() creates a HyperLogLogNode.
The HyperLogLogNode adds itself to the execution graph (as a subnode of HyperLogLog in the execution graph).
Then the HyperLogLogNode is executed and returns the reduced registers as a result.
After this step, everything is done only on one worker and only one worker has the reducedRegisters.

Is my understanding correct?

In that case what we need to do to parallelize the result calculation is

Scatter the entries of the reducedRegisters
Calculate 2^-entry on each node
AllReduce
- returning a pair<double, unsigned int>
- double is the result value (variable E)
- unsigned int is V

In thrill terms:

Create DIA
Map: registerEntry => pair<double, unsigned int>
AllReduce
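
The steps above can be sketched roughly as follows. This is not Thrill code: the workers are simulated by a plain loop over contiguous chunks, and Partial, MapRegister, Combine and SimulateDistributedSum are hypothetical names chosen for illustration. It only shows that mapping each register entry to a pair and reducing the pairs gives the same sum and V regardless of how the registers are split.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// (sum of 2^-entry, number of zero registers V)
using Partial = std::pair<double, unsigned int>;

// Map step: registerEntry => pair<double, unsigned int>.
Partial MapRegister(std::uint8_t entry) {
    return {std::pow(2.0, -static_cast<double>(entry)), entry == 0 ? 1u : 0u};
}

// Reduce step: component-wise addition, associative up to floating-point rounding.
Partial Combine(const Partial& a, const Partial& b) {
    return {a.first + b.first, a.second + b.second};
}

// Simulates scatter + local reduce + global reduce for num_workers >= 1.
Partial SimulateDistributedSum(const std::vector<std::uint8_t>& registers,
                               std::size_t num_workers) {
    const std::size_t n = registers.size();
    std::vector<Partial> locals(num_workers, Partial{0.0, 0u});
    for (std::size_t w = 0; w < num_workers; ++w) {
        // Each simulated worker gets one contiguous slice of the registers.
        const std::size_t begin = n * w / num_workers;
        const std::size_t end = n * (w + 1) / num_workers;
        for (std::size_t i = begin; i < end; ++i) {
            locals[w] = Combine(locals[w], MapRegister(registers[i]));
        }
    }
    // Stand-in for the final reduction: every worker would end up with this pair.
    Partial total{0.0, 0u};
    for (const Partial& p : locals) total = Combine(total, p);
    return total;
}
```

For the registers {0, 1, 2, 0} the combined pair is (2.75, 2) for any number of simulated workers, since 1 + 0.5 + 0.25 + 1 = 2.75 and two entries are zero.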

cocreature commented 7 years ago

I don’t think your explanation is correct. After the call to context_.net.AllReduce each worker has the same result and they all calculate the result. This is the reason for the results being printed once for each worker.

The steps for parallelizing this look correct. The operation for scattering the vector is called Distribute in Thrill. It probably also makes sense to make this optional since I would expect the overhead to be too large for small precisions.
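
One way to make this optional is a simple precision cutoff. The sketch below is only an assumption about how such a switch could look: RegisterCount and ShouldParallelize are hypothetical names, and the cutoff is passed in by the caller because no concrete value has been settled on here.

```cpp
#include <cstddef>

// Number of registers for a given precision: m = 2^precision.
std::size_t RegisterCount(unsigned int precision) {
    return std::size_t{1} << precision;
}

// Hypothetical switch: use the distributed path only from a caller-chosen
// precision upwards, so small sketches avoid the communication overhead.
bool ShouldParallelize(unsigned int precision, unsigned int min_parallel_precision) {
    return precision >= min_parallel_precision;
}
```

A suitable value for min_parallel_precision would have to come from measurements on the target cluster.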

TiFu commented 7 years ago

Implemented in 54f0cd5256

We still need to figure out at which size we want to enable the parallelization.

cocreature / thrill

Parallelize result computation #7