BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Support multithreading in the CPU mode of Solver::Solve #79

Closed kloudkl closed 10 years ago

kloudkl commented 10 years ago

In each iteration of Solver::Solve, there are four opportunities to accelerate the computation. The first is the most complex, since Net::ForwardBackward invokes Forward and Backward on every layer in the net.

      Dtype loss = net_->ForwardBackward(bottom_vec);

The second chance is more straightforward. An OpenMP directive is enough to parallelize the independent computation for each param_id.

      ComputeUpdateValue();
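
As a rough illustration of the OpenMP directive mentioned above, here is a minimal sketch of a per-parameter momentum update. It is not Caffe's actual solver code: `ParamBlob` and `ComputeUpdateValueSketch` are hypothetical names, and the update rule (history = momentum * history + rate * diff) is assumed for the example. The point is only that each param_id touches its own blob, so the outer loop can be split across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one learnable parameter blob: the weights,
// their gradients, and the momentum history kept by the solver.
struct ParamBlob {
  std::vector<float> data;
  std::vector<float> diff;
  std::vector<float> history;
};

// Assumed momentum-SGD update value, computed for every blob.
// Each param_id reads and writes only its own blob, so the iterations
// are independent. Build with -fopenmp to enable the directive;
// without that flag the pragma is ignored and the loop runs serially.
void ComputeUpdateValueSketch(std::vector<ParamBlob>& params,
                              float rate, float momentum) {
  const int num_params = static_cast<int>(params.size());
  #pragma omp parallel for
  for (int param_id = 0; param_id < num_params; ++param_id) {
    ParamBlob& p = params[param_id];
    assert(p.diff.size() == p.history.size());
    for (std::size_t i = 0; i < p.diff.size(); ++i) {
      p.history[i] = momentum * p.history[i] + rate * p.diff[i];
      p.diff[i] = p.history[i];  // the update value to apply afterwards
    }
  }
}
```

With only a handful of parameter blobs of very different sizes, the speedup from this loop is limited by the largest blob, so a `schedule(dynamic)` clause may be worth trying.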

The next one needs only one extra step: distinguishing between CPU and GPU mode.

      net_->Update();

The last one involves a plain old OpenMP-friendly nested for loop.

      Test();
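
A nested accumulation loop of this kind could be parallelized with an OpenMP reduction. The sketch below assumes the per-iteration test scores have already been computed and stored; `MeanScoreSketch` and its argument layout are hypothetical and are not taken from Caffe's Test().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages all scores from all test iterations.
// results[i][j] is assumed to hold score j of test iteration i.
// The reduction clause gives each thread a private partial sum and
// combines them at the end, so there is no race on `total`.
double MeanScoreSketch(const std::vector<std::vector<float> >& results) {
  double total = 0.0;
  long count = 0;
  const int num_iters = static_cast<int>(results.size());
  #pragma omp parallel for reduction(+:total, count)
  for (int i = 0; i < num_iters; ++i) {
    for (std::size_t j = 0; j < results[i].size(); ++j) {
      total += results[i][j];
      ++count;
    }
  }
  assert(count >= 0);
  return count > 0 ? total / count : 0.0;
}
```

This only covers the accumulation step. Whether the forward passes that produce the scores can run concurrently depends on how much state the net shares between iterations.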

kloudkl commented 10 years ago

The code does not have to be changed. Thanks to Michael Rutter, a multi-threaded OpenBLAS package is available for all versions of Ubuntu since Precise (12.04); see "Multi-threaded OpenBLAS backported to recent Ubuntu releases". Follow these steps to take advantage of the acceleration:

sudo add-apt-repository ppa:marutter/rdev
sudo apt-get update
sudo apt-get install libopenblas-base

Benchmark results are demonstrated in the related issue: #16

Yangqing commented 10 years ago

ComputeUpdateValue() is not a big issue for large networks. It is the ForwardBackward() function and the individual layers that take the most time. Thus, parallelizing ComputeUpdateValue() will not give us much gain.

sguada commented 10 years ago

Within the ForwardBackward() computation, the convolutional layers are the ones that take most of the time (see #83), so parallelizing the loops there will be the most effective.
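
To make the idea concrete, here is a minimal sketch of parallelizing a convolution loop over the images in a batch. It is a naive single-channel "valid" convolution (computed as cross-correlation), not Caffe's actual layer code; `ConvForwardSketch` and its argument layout are assumptions made for the example. Each image writes to its own slice of the output, so the outer loop needs no synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// input:  num x height x width (row-major, single channel)
// kernel: k x k
// returns num x (height-k+1) x (width-k+1)
std::vector<float> ConvForwardSketch(const std::vector<float>& input,
                                     int num, int height, int width,
                                     const std::vector<float>& kernel,
                                     int k) {
  assert(input.size() == static_cast<std::size_t>(num) * height * width);
  assert(kernel.size() == static_cast<std::size_t>(k) * k);
  assert(k >= 1 && k <= height && k <= width);
  const int out_h = height - k + 1;
  const int out_w = width - k + 1;
  std::vector<float> output(static_cast<std::size_t>(num) * out_h * out_w, 0.0f);

  // One image per iteration; images are independent of each other.
  // Build with -fopenmp to enable the directive.
  #pragma omp parallel for
  for (int n = 0; n < num; ++n) {
    const float* in = input.data() + static_cast<std::size_t>(n) * height * width;
    float* out = output.data() + static_cast<std::size_t>(n) * out_h * out_w;
    for (int y = 0; y < out_h; ++y) {
      for (int x = 0; x < out_w; ++x) {
        float sum = 0.0f;
        for (int ky = 0; ky < k; ++ky) {
          for (int kx = 0; kx < k; ++kx) {
            sum += in[(y + ky) * width + (x + kx)] * kernel[ky * k + kx];
          }
        }
        out[y * out_w + x] = sum;
      }
    }
  }
  return output;
}
```

If the BLAS library underneath is already multi-threaded, combining it with an outer OpenMP loop like this one can oversubscribe the cores, so the two levels of threading would need to be balanced.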