Decaf has already used MPI in a few places: https://github.com/UCB-ICSI-Vision-Group/decaf-release/search?q=mpi&ref=cmdform
Just a precautionary note: I used MPI in my earlier projects that never got open-sourced (parallel linear models over a reasonably sized cluster; see e.g. my ICCV 2013 task adaptation paper). I don't recall making MPI completely runnable under either decaf or caffe, though...
Yangqing
The first open source large-scale machine learning projects that I encountered were Vowpal Wabbit [1] and Edward Y. Chang's PSVM, PLDA, and Parallel Spectral Clustering, all of which used MPI but none of which were based on CUDA. Nor did they train deep nonlinear models. But the achievements of industry, such as those of Baidu IDL, should motivate academia towards a comparable large-scale distributed training framework. A progressive roadmap may be to implement a CPU version first and add GPU capability after the initial success.
[1] Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford, "A Reliable Effective Terascale Linear Learning System," 2011.
I am interested in working on this. Is there still work ongoing?
I am thinking of something like [1], which uses MPI + cuda-convnet. There is also an interesting write-up by Netflix [2] where they use distributed computing for hyperparameter tuning.
[2] http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html
@Yangqing, would you please recover the related commits?
```sh
for commit in 64e28ba 591c36b a3eb62a a48147c; do git cherry-pick $commit; done
```
Microsoft Project Adam sounds very promising [1].
[1] Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, Karthik Kalyanaraman, "Project Adam: Building an Efficient and Scalable Deep Learning Training System," to appear in the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), Oct. 2014.
Unfortunately, the paper won't be public until the conference is held in October. Did anyone register for OSDI 2014 and get access to the paper?
The answer is only in the paper.
@kloudkl Some other preliminary info on Project Adam.
The paper became public this Monday.
This is in progress through #1148, so this placeholder issue is no longer needed.
@Yangqing started the work to implement the distributed solver in a series of commits: 64e28ba, 591c36b, a3eb62a, a48147c, 3385a14, 7c6835d, 04f5224. In the area of high performance computing, MPI is commonly used for inter-node communication and has been integrated with deep learning algorithms [1]. Last year, Kai Yu, the executive vice president of the Baidu Institute of Deep Learning, announced PADDLE, their GPU counterpart to Google's DistBelief [2]. Therefore, we should continue the development to enable large-scale training, such as on the complete ImageNet dataset rather than the smaller subset used for the challenge.
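To make the MPI approach concrete, below is a minimal sketch of synchronous data-parallel SGD; it is illustrative only, not taken from the commits above and not Caffe's actual API. Every rank computes gradients on its own shard of the data, the gradients are averaged across nodes with MPI_Allreduce, and each worker then applies the identical weight update. The parameter count, learning rate, and gradient computation are placeholders.

```cpp
// Sketch of synchronous data-parallel SGD over MPI (illustrative, not Caffe).
#include <mpi.h>

#include <cstddef>
#include <vector>

// Sum the local gradients element-wise across all ranks, in place,
// then divide by the number of workers to get the average gradient.
void AllreduceGradients(std::vector<float>* grad, int world_size) {
  MPI_Allreduce(MPI_IN_PLACE, grad->data(), static_cast<int>(grad->size()),
                MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
  for (float& g : *grad) {
    g /= world_size;
  }
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int world_size = 1;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  const std::size_t dim = 1000;       // hypothetical parameter count
  const float learning_rate = 0.01f;  // hypothetical learning rate
  std::vector<float> weights(dim, 0.0f);
  std::vector<float> grad(dim, 0.0f);

  for (int iter = 0; iter < 100; ++iter) {
    // ... in a real setup, each rank (via MPI_Comm_rank) would fill `grad`
    // from its own shard of the training data ...
    AllreduceGradients(&grad, world_size);
    for (std::size_t i = 0; i < dim; ++i) {
      weights[i] -= learning_rate * grad[i];  // identical update on every rank
    }
  }

  MPI_Finalize();
  return 0;
}
```

The allreduce keeps all workers in lockstep; asynchronous parameter-server designs like DistBelief and Project Adam instead apply updates as they arrive, trading that determinism for tolerance of slow nodes.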
The commits to revive the efforts are 206dc98 and c204fa9; they can be applied with the same git cherry-pick loop as above. I suggest that one of the BVLC members check out a feature branch devoted to this issue, because it would probably involve a long period of implementation, debugging, testing, performance benchmarking, and even some research work.
[1] Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Y. Ng, Bryan Catanzaro, "Deep Learning with COTS HPC," ICML 2013.
[2] Large-scale Deep Learning at Baidu.