kloudkl closed this issue 10 years ago
The code does not have to be changed. Thanks to Michael Rutter, a multi-threaded OpenBLAS package is available for every Ubuntu release since Precise (12.04); see "Multi-threaded OpenBLAS backported to recent Ubuntu releases". Follow these steps to take advantage of the acceleration:
sudo add-apt-repository ppa:marutter/rdev
sudo apt-get update
sudo apt-get install libopenblas-base
Benchmark results are reported in the related issue #16.
ComputeUpdatedValue() is not a bottleneck for large networks. Most of the time is spent in the ForwardBackward() function and the individual layers it calls, so parallelizing ComputeUpdatedValue() will not gain us much.
Within the ForwardBackward() computation, the convolutional layers take most of the time (see #83), so parallelizing the loops there will be the most effective.
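To make the idea of parallelizing a convolutional loop concrete, here is a minimal sketch. It is not Caffe's implementation: `ConvForwardBatch` is a hypothetical helper that slides one shared kernel over each single-channel image in a batch (a direct sliding-window sum, with no kernel flip). The assumption being illustrated is only that the images in a batch are independent, so the outer loop over them can be given an OpenMP directive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Caffe: "valid" sliding-window filtering of
// every image in a batch with one shared square kernel. Images are stored
// row-major as height x width.
std::vector<std::vector<float>> ConvForwardBatch(
    const std::vector<std::vector<float>>& images, int height, int width,
    const std::vector<float>& kernel, int ksize) {
  assert(ksize <= height && ksize <= width);
  assert(kernel.size() == static_cast<std::size_t>(ksize) * ksize);
  const int out_h = height - ksize + 1;
  const int out_w = width - ksize + 1;
  const int num = static_cast<int>(images.size());
  std::vector<std::vector<float>> outputs(
      num, std::vector<float>(static_cast<std::size_t>(out_h) * out_w, 0.0f));

  // Each iteration reads one image and writes only its own output buffer,
  // so iterations are independent. Built without OpenMP support, the pragma
  // is ignored and the loop simply runs serially.
  #pragma omp parallel for
  for (int n = 0; n < num; ++n) {
    for (int y = 0; y < out_h; ++y) {
      for (int x = 0; x < out_w; ++x) {
        float sum = 0.0f;
        for (int ky = 0; ky < ksize; ++ky) {
          for (int kx = 0; kx < ksize; ++kx) {
            sum += images[n][(y + ky) * width + (x + kx)] *
                   kernel[ky * ksize + kx];
          }
        }
        outputs[n][y * out_w + x] = sum;
      }
    }
  }
  return outputs;
}
```

Compile with `-fopenmp` (GCC/Clang) to enable the threads. Parallelizing over the batch dimension is one choice among several; whether it beats a multi-threaded BLAS call depends on batch size and layer shape.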
In each iteration of Solver::Solve, there are four chances to accelerate the computation. The first opportunity is the most complex one since Net::ForwardBackward invokes the Forward and Backward of all the layers that comprise a net.
The second chance is more straightforward. An OpenMP directive is enough to parallelize the independent computation for each param_id.
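A rough sketch of what such a directive could look like follows. The function `ComputeUpdatesSketch` and its momentum-style update rule are assumptions made for illustration, not the solver's actual code; the point is only that each `param_id` touches its own buffers, so the outer loop can be parallelized as-is.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-parameter update step; not Caffe's API.
// For every parameter blob: history = momentum * history + lr * diff.
void ComputeUpdatesSketch(const std::vector<std::vector<float>>& diffs,
                          float learning_rate, float momentum,
                          std::vector<std::vector<float>>* history) {
  assert(diffs.size() == history->size());
  const int num_params = static_cast<int>(diffs.size());

  // One iteration per param_id; no iteration reads or writes another
  // parameter's buffers, so a single directive is sufficient.
  #pragma omp parallel for
  for (int param_id = 0; param_id < num_params; ++param_id) {
    const std::vector<float>& diff = diffs[param_id];
    std::vector<float>& hist = (*history)[param_id];
    assert(diff.size() == hist.size());
    for (std::size_t i = 0; i < diff.size(); ++i) {
      hist[i] = momentum * hist[i] + learning_rate * diff[i];
    }
  }
}
```

With only a handful of parameter blobs of very different sizes, the load may be unbalanced across threads; `schedule(dynamic)` is one option worth trying in that case.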
The third opportunity needs only one extra trick: distinguishing CPU mode from GPU mode.
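One way to picture the CPU/GPU distinction is sketched below. The `Mode` enum and `UpdateSketch` are hypothetical names for this example (Caffe exposes its own mode switch, which is not reproduced here). The assumption is that the OpenMP loop should run only on the CPU path, while the GPU path is left to a device-side routine that this sketch does not implement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mode flag for this sketch only.
enum class Mode { CPU, GPU };

// Applies data -= diff on the CPU path and returns true. On the GPU path it
// does nothing and returns false, signalling that a device-side routine
// (not shown here) would have to perform the update instead.
bool UpdateSketch(Mode mode, const std::vector<float>& diff,
                  std::vector<float>* data) {
  assert(diff.size() == data->size());
  if (mode != Mode::CPU) {
    return false;  // OpenMP threads are of no use for device memory.
  }
  const int count = static_cast<int>(diff.size());
  #pragma omp parallel for
  for (int i = 0; i < count; ++i) {
    (*data)[i] -= diff[i];
  }
  return true;
}
```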
The last one involves a plain old OpenMP-friendly nested for loop.
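The loop in question is not shown here, so the following is only a generic sketch of how a nested for loop is typically handed to OpenMP. `SubtractInPlace` is a made-up example over a row-major `rows x cols` buffer; the `collapse(2)` clause is a real OpenMP clause that merges both loop levels into one iteration space.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: data[r][c] -= diff[r][c] over a row-major buffer.
void SubtractInPlace(int rows, int cols, const std::vector<float>& diff,
                     std::vector<float>* data) {
  assert(diff.size() == static_cast<std::size_t>(rows) * cols);
  assert(data->size() == diff.size());

  // collapse(2) requires the two loops to be perfectly nested, with no
  // statements between them. Every (r, c) pair writes a distinct element.
  #pragma omp parallel for collapse(2)
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      (*data)[r * cols + c] -= diff[r * cols + c];
    }
  }
}
```

Collapsing helps mainly when the outer loop alone has too few iterations to keep all threads busy; otherwise a plain `parallel for` on the outer loop is usually enough.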