Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
mxnet uses ps-lite as its parameter server in distributed environments. Currently ps-lite only supports integer keys. People have asked for string key support, but there has been no response yet.
mxnet has its own way to handle this. I noticed that kvstore_local.h keeps two dicts to convert between string keys and int keys, but every worker starts from index zero when converting a string to an int. KVStoreDist inherits from KVStoreLocal, so it handles string keys the same way.
So is it possible that different workers will assign different string keys to the same int and cause errors in the parameters?
With data parallelism, every worker runs the same logic, so this seems fine. But mxnet supports dynamic graph computation, and each training sample may have a different computation graph, so the string keys pushed to the server may be initialized in a different order on each worker.
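To make the concern concrete, here is a minimal sketch of the failure mode. It does not use mxnet at all: `LocalKeyMap` is a hypothetical stand-in for the two per-worker dicts described above, and the parameter names are made up for illustration.

```python
class LocalKeyMap:
    """Hypothetical stand-in for the per-worker str <-> int dictionaries."""

    def __init__(self):
        self.str_to_int = {}
        self.int_to_str = {}
        self.next_id = 0  # every worker starts counting from zero

    def lookup(self, name):
        # Assign the next free int the first time a string key is seen.
        if name not in self.str_to_int:
            self.str_to_int[name] = self.next_id
            self.int_to_str[self.next_id] = name
            self.next_id += 1
        return self.str_to_int[name]


worker0 = LocalKeyMap()
worker1 = LocalKeyMap()

# Same key set on both workers, but initialized in a different order.
ids0 = {name: worker0.lookup(name) for name in ["fc1_weight", "fc2_weight"]}
ids1 = {name: worker1.lookup(name) for name in ["fc2_weight", "fc1_weight"]}

# Int key 0 now refers to a different parameter on each worker.
collision = worker0.int_to_str[0] != worker1.int_to_str[0]
print(ids0, ids1, collision)
```

If both workers pushed to int key 0 on a shared server, updates for two different parameters would be merged into one entry, which is the error I am worried about.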
The current str key support in dist-kvstore is very limited:
it assumes all workers have the same key set
it assumes the keys are initialized in the same order
To truly support str keys with dist-kvstore, we also need to record an int key -> str key mapping on the server side, which is not implemented for now. One way to implement this is to use the rank-0 server as a coordinator that serializes all key requests when keys are initialized. Workers then have to send an extra message to the coordinator to get the int key during str key initialization. You're welcome to try to fix it and post a PR for this.
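A rough sketch of the coordinator idea, as an in-process simulation only. `KeyCoordinator` and `Worker` are hypothetical names, not mxnet or ps-lite classes, and a direct method call stands in for the extra message a worker would send to the rank-0 server.

```python
import threading


class KeyCoordinator:
    """Hypothetical registry that would live on the rank-0 server."""

    def __init__(self):
        self._lock = threading.Lock()  # serializes all key requests
        self._str_to_int = {}
        self._int_to_str = []          # server-side int key -> str key mapping

    def register(self, name):
        # The first request for a name allocates the int key;
        # later requests from any worker get the same int back.
        with self._lock:
            if name not in self._str_to_int:
                self._str_to_int[name] = len(self._int_to_str)
                self._int_to_str.append(name)
            return self._str_to_int[name]

    def name_of(self, key):
        return self._int_to_str[key]


class Worker:
    """Hypothetical worker that asks the coordinator instead of counting locally."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self._cache = {}

    def init_key(self, name):
        # Only the first use of a key costs a round trip to the coordinator.
        if name not in self._cache:
            self._cache[name] = self._coordinator.register(name)
        return self._cache[name]


coordinator = KeyCoordinator()
w0, w1 = Worker(coordinator), Worker(coordinator)

# Different initialization order and different key sets per worker.
keys0 = [w0.init_key(n) for n in ["fc1_weight", "fc2_weight"]]
keys1 = [w1.init_key(n) for n in ["fc2_weight", "fc1_weight", "embed_weight"]]
print(keys0, keys1)
```

Because the int keys are allocated in one place, neither the initialization order nor the key set has to match across workers. The cost is the extra round trip per new key, which a real implementation would have to carry over the network.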