
DyNet: The Dynamic Neural Network Toolkit

Any plan or roadmap to support distributed version of dynet? #921

chunyang-wen opened this issue 6 years ago

chunyang-wen commented 6 years ago

Data is increasing dramatically. Distributed training is a trend. I wonder if there is any plan to support this.

xunzhang commented 6 years ago

Hi, we have actually been working on a distributed version recently, but we don't want to make it public before it is stable enough to use, so it may still take some time to get there. BTW, could you share the details of your use case (dataset size, model, and training time) so that we can prioritize its support accordingly? Thanks.

xunzhang commented 6 years ago

Another related piece of work is single-node training with multiple GPUs; that shouldn't be too far away. I think it might appear in one of the next few releases.

chunyang-wen commented 6 years ago

Thanks for your quick response. The speed of dynamically constructing the graph is really handy. We are currently trying to train a model on billions of instances. Each instance has a timestamp, so we cannot simply do data parallelism. There are ways to divide the data into mutually independent batches, but the instances share common weights. Since distributed training is not supported, training is rather slow: for about 20 million instances with batch size = 10000, it takes at least 4 hours. The model size depends on the unique IDs in the instances, and the model itself is just a variation of simple logistic regression.

Is there any upcoming timeline for distributed training support? I noticed your answer on Zhihu, for your reference.

xunzhang commented 6 years ago

Good to know about your case. Currently, though, we don't have a specific plan to release it. I will leave a comment in this thread when we have updates. BTW, which Zhihu post are you referring to?

chunyang-wen commented 6 years ago

I am making a little progress. I have decided to use ps-lite, a distributed KV store; I am adding it as a submodule of DyNet, and it now compiles.

When should values be updated (PUSH to or PULL from the server)?

It's best done in the trainer's update function.
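For concreteness, here is a minimal sketch of what that PUSH/PULL step could look like, assuming ps-lite's `KVWorker` Push/Pull/Wait interface. The key layout and the flattening of DyNet gradients/weights into plain float vectors are placeholders, not the actual integration.

```cpp
// Hypothetical sketch of the server round-trip inside the trainer's update step.
// Assumes the ps-lite scheduler/servers are already running and that the caller
// has flattened DyNet gradients and weights into plain float vectors.
#include <vector>
#include "ps/ps.h"

void sync_with_server(ps::KVWorker<float>& kv,
                      const std::vector<ps::Key>& keys,
                      const std::vector<float>& local_grads,
                      std::vector<float>* fresh_weights) {
  // PUSH: send the locally accumulated gradients to the parameter server.
  int push_ts = kv.Push(keys, local_grads);
  kv.Wait(push_ts);  // block until the server has received them

  // PULL: fetch the latest weights so the next forward pass uses fresh values.
  int pull_ts = kv.Pull(keys, fresh_weights);
  kv.Wait(pull_ts);
}
```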

A new function will be added to the trainer to mark the end of training and pull the latest values from the server. There are two things that I still need to solve:

redpony commented 6 years ago

One request: if possible, make this configurable. There are a number of algorithms for distributed training (distributed synchronous SGD, distributed async SGD a.k.a. HogWild, etc.), and there are a number of transport layers that could be used here (MPI, custom shared-memory schemes on a single machine, zillions of variants of parameter servers). Let's try to design this so we stay forward-compatible with variations that are likely to be tried.
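One way to leave room for that, as a purely illustrative sketch (none of these class names exist in DyNet): hide the synchronization strategy behind a small abstract interface so that parameter-server, allreduce, or shared-memory implementations can be plugged in without touching the trainers.

```cpp
// Illustrative only: an abstract synchronization backend that trainers could
// call into, so the algorithm (sync/async SGD) and the transport (MPI,
// parameter server, shared memory) stay swappable.
#include <vector>

class DistributedBackend {
 public:
  virtual ~DistributedBackend() = default;
  // Combine this worker's gradients with the other workers' gradients.
  virtual void synchronize_gradients(std::vector<float>& grads) = 0;
  // Bring local weights up to date with the shared state, if the strategy needs it.
  virtual void refresh_weights(std::vector<float>& weights) = 0;
};

// Concrete strategies that could live behind the same interface:
//   class ParameterServerBackend : public DistributedBackend { ... };  // ps-lite
//   class AllreduceBackend       : public DistributedBackend { ... };  // MPI / Horovod-style
//   class SharedMemoryBackend    : public DistributedBackend { ... };  // single machine
```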

dnbaker commented 5 years ago

I would be interested in this as well. For example, Caffe-MPI demonstrates nearly perfect scaling in this paper, significantly better than the other frameworks (TensorFlow, MXNet, CNTK), but DyNet seems better suited to my interests.

bskaggs commented 5 years ago

@xunzhang Any chance there have been updates on your distributed version?

I was looking at https://github.com/horovod/horovod/ and it seems to have potential.
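Horovod has no DyNet binding, but the core operation it performs is a gradient allreduce after each backward pass. Here is a rough sketch of that step written against plain MPI for illustration; the gradient buffer stands in for DyNet's flattened parameter gradients.

```cpp
// Sketch of the Horovod-style averaging step, using plain MPI.
// Assumes MPI_Init has already been called and that grads holds this worker's
// flattened gradients; afterwards, every worker holds the averaged gradients.
#include <mpi.h>
#include <vector>

void allreduce_average(std::vector<float>& grads) {
  int world_size = 1;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Sum the gradient buffers of all workers, element-wise, in place.
  MPI_Allreduce(MPI_IN_PLACE, grads.data(), static_cast<int>(grads.size()),
                MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

  // Divide by the number of workers to turn the sum into an average.
  for (float& g : grads) g /= static_cast<float>(world_size);
}
```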