apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Distributed training use mxnet #8224

Open chunyang-wen opened 7 years ago

chunyang-wen commented 7 years ago

mxnet uses ps-lite as its parameter server in distributed environments. Currently, ps-lite only supports integer keys. People have asked for string key support, but there has been no response yet.

mxnet works around this in its own way. I noticed that kvstore_local.h keeps two dicts to convert between string keys and int keys. But every worker starts from index zero when converting strings to ints. KVStoreDist inherits from KVStoreLocal, so it handles string keys the same way.

So is it possible that different workers will map different string keys to the same int, and corrupt the parameters?

If we do data parallelism, every worker runs the same logic, so it seems OK. But mxnet supports dynamic computation graphs, and each training sample may have a different computation graph. So the order in which string keys are pushed to the server may differ between workers.
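
To make the concern concrete, here is a toy sketch in Python. It is not mxnet's code: `StrKeyMapper` and `register` are hypothetical names, and the only assumption is that each worker hands out int ids in first-seen order, starting from zero, as described above.

```python
class StrKeyMapper:
    """Toy stand-in for a per-worker string/int key table (hypothetical)."""

    def __init__(self):
        self.str_to_int = {}  # string key -> int key
        self.int_to_str = {}  # int key -> string key
        self.next_index = 0   # assumption: every worker starts from zero

    def lookup(self, key):
        # Assign the next free int the first time a string key is seen.
        if key not in self.str_to_int:
            self.str_to_int[key] = self.next_index
            self.int_to_str[self.next_index] = key
            self.next_index += 1
        return self.str_to_int[key]


def register(keys):
    """Simulate one worker seeing string keys in the given order."""
    mapper = StrKeyMapper()
    return {k: mapper.lookup(k) for k in keys}


# Two workers encounter the same keys in a different order.
worker0 = register(["fc1_weight", "fc2_weight"])
worker1 = register(["fc2_weight", "fc1_weight"])

# Keys whose int id differs between the two workers.
conflicts = sorted(k for k in worker0 if worker0[k] != worker1[k])
print(conflicts)  # ['fc1_weight', 'fc2_weight']
```

In this sketch, int key 0 means fc1_weight on one worker and fc2_weight on the other, so a push to key 0 would update different parameters depending on which worker sent it. When both workers see the keys in the same order, the tables match and no conflict appears.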

eric-haibin-lin commented 7 years ago

The current str key support in dist-kvstore is very limited: