Decaf has already used MPI in a few places: https://github.com/UCB-ICSI-Vision-Group/decaf-release/search?q=mpi&ref=cmdform
Just a precautionary note: I used MPI in my earlier projects that never got open-sourced (parallel linear models over a reasonably sized cluster; see e.g. my ICCV 2013 task adaptation paper). I don't recall making MPI completely runnable under either decaf or caffe, though...
Yangqing
The first open source large-scale machine learning projects that I encountered were Vowpal Wabbit [1] and Edward Y. Chang's PSVM, PLDA, and Parallel Spectral Clustering, all of which used MPI but none of which were based on CUDA. Nor did they train deep nonlinear models. But the achievements of industry, such as those of Baidu IDL, should motivate academia towards a comparable large-scale distributed training framework. A progressive roadmap may be to implement a CPU version first and add GPU capability after the initial success.
[1] Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford, "A Reliable Effective Terascale Linear Learning System," 2011.
I am interested in working on this. Is there still work ongoing?
I am thinking of something like [1], which uses MPI + cuda-convnet. There is also an interesting write-up by Netflix [2] where they use distributed computing for hyperparameter tuning.
[2] http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html
@Yangqing, would you please recover the related commits?
```sh
for commit in 64e28ba 591c36b a3eb62a a48147c; do git cherry-pick $commit; done
```
Microsoft Project Adam sounds very promising [1].
[1] Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, Karthik Kalyanaraman, "Project Adam: Building an Efficient and Scalable Deep Learning Training System," to appear in the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), Oct. 2014.
Unfortunately, the paper won't be public until the conference is held in October. Did anyone register for OSDI 2014 and get access to the paper?
The answer is only in the paper.
@kloudkl Some other preliminary info on Project Adam.
The paper became public this Monday.
This is in progress through #1148, so this placeholder issue is no longer needed.
@Yangqing started the work to implement the distributed solver in a series of commits: 64e28ba, 591c36b, a3eb62a, a48147c, 3385a14, 7c6835d, 04f5224. In the area of high performance computing, MPI is commonly used for inter-node communication and has been integrated with deep learning algorithms [1]. Last year, Kai Yu, the executive vice president of the Baidu Institute of Deep Learning, announced PADDLE, their GPU counterpart to Google's DistBelief [2]. Therefore, we should continue the development to enable large-scale training, such as on the complete ImageNet dataset rather than the smaller subset used for the challenge.
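To make the MPI approach concrete, below is a minimal sketch of synchronous data-parallel SGD; it is illustrative only, not taken from the commits above and not Caffe's actual API. Every rank computes gradients on its own shard of the data, the gradients are averaged across nodes with MPI_Allreduce, and each worker then applies the identical weight update. The parameter count, learning rate, and gradient computation are placeholders.

```cpp
// Sketch of synchronous data-parallel SGD over MPI (illustrative, not Caffe).
#include <mpi.h>

#include <cstddef>
#include <vector>

// Sum the local gradients element-wise across all ranks, in place,
// then divide by the number of workers to get the average gradient.
void AllreduceGradients(std::vector<float>* grad, int world_size) {
  MPI_Allreduce(MPI_IN_PLACE, grad->data(), static_cast<int>(grad->size()),
                MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
  for (float& g : *grad) {
    g /= world_size;
  }
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int world_size = 1;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  const std::size_t dim = 1000;       // hypothetical parameter count
  const float learning_rate = 0.01f;  // hypothetical learning rate
  std::vector<float> weights(dim, 0.0f);
  std::vector<float> grad(dim, 0.0f);

  for (int iter = 0; iter < 100; ++iter) {
    // ... in a real setup, each rank (via MPI_Comm_rank) would fill `grad`
    // from its own shard of the training data ...
    AllreduceGradients(&grad, world_size);
    for (std::size_t i = 0; i < dim; ++i) {
      weights[i] -= learning_rate * grad[i];  // identical update on every rank
    }
  }

  MPI_Finalize();
  return 0;
}
```

The allreduce keeps all workers in lockstep; asynchronous parameter-server designs like DistBelief and Project Adam instead apply updates as they arrive, trading that determinism for tolerance of slow nodes.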
The commits to revive the efforts are 206dc98 and c204fa9; they can be applied with the same git cherry-pick loop as above. I suggest that one of the BVLC members check out a feature branch devoted to this issue, because it would probably involve a long period of implementation, debugging, testing, performance benchmarking, and even some research work.
[1] Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Y. Ng, Bryan Catanzaro, "Deep Learning with COTS HPC," ICML 2013.
[2] Large-scale Deep Learning at Baidu.