amithr1 opened this issue 8 years ago
Is there support for EASGD?
@amithr1 Do you mean Elastic Averaging SGD (EASGD)? We can implement that using the current KVStore API and low-level executor APIs. @peterzcc and I have done this but it's not ready for a PR.
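For reference, the update rule from the EASGD paper (Zhang, Choromanska & LeCun, 2015), in its synchronous form with p workers, learning rate eta and moving rate alpha = eta * rho, is:

```latex
\begin{aligned}
x_i^{t+1}       &= x_i^{t} - \eta\, g_i^{t} - \alpha \left(x_i^{t} - \tilde{x}^{t}\right)
                   && \text{(local worker } i\text{)} \\
\tilde{x}^{t+1} &= \tilde{x}^{t} + \alpha \sum_{i=1}^{p} \left(x_i^{t} - \tilde{x}^{t}\right)
                   && \text{(center variable on the server)}
\end{aligned}
```

where g_i^t is the stochastic gradient computed by worker i. The snippets later in this thread follow these two updates in an asynchronous, per-push form: each worker applies the elastic term locally after a `pull`, and a server-side optimizer moves the center variable on each `push`.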
@sxjscience I did it too, but my solution is not clean. Can you submit a PR later?
@Godricly We have written a separate server optimizer for EASGD (https://github.com/sxjscience/mxnet/blob/master/python/mxnet/optimizer.py#L849-L867). The local workers update the weights using the local optimizer and communicate with the server using the `pull` and `push` primitives, which are handled by the server optimizer.
According to our experiments, EASGD is more stable than DownpourSGD (we can communicate at larger intervals). However, adding EASGD support to the high-level API requires additional work (performance comparisons and compatibility checks), and we haven't found time to do this.
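As a rough sketch of the server optimizer mentioned above (not the code in the linked file; the class name `ServerEASGD` and the reuse of `learning_rate` as the moving rate are assumptions chosen to match the workflow shown below), a server-side EASGD rule can be registered as a custom `mx.optimizer.Optimizer` whose `update()` treats the pushed value as a worker's local weight rather than a gradient:

```python
import mxnet as mx

@mx.optimizer.Optimizer.register
class ServerEASGD(mx.optimizer.Optimizer):
    """Hypothetical server-side EASGD rule: the value a worker pushes is its
    local weight, and the server moves the center variable toward it."""

    def create_state(self, index, weight):
        return None  # the elastic center update needs no per-key state

    def update(self, index, weight, grad, state):
        # `weight` is the center variable kept on the server; `grad` is the
        # local weight pushed by a worker (not an actual gradient).
        # Center update: x~ <- x~ + alpha * (x_i - x~), where the moving rate
        # alpha is passed to the constructor as `learning_rate`.
        weight[:] += self.lr * (grad - weight)
```

Calling `kv.set_optimizer(server_optimizer)` then makes the kvstore server apply this rule to each key on every push.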
Wow... that's much better than what I did messing up model.py with kvstore. So each local worker is a node in EASGD?
@Godricly Yes. The workflow looks like this.
```python
server_optimizer = mx.optimizer.create(name="ServerEASGD", learning_rate=easgd_alpha)
kv.set_optimizer(server_optimizer)
local_updater = mx.optimizer.get_updater(local_optimizer)
while NOT_CONVERGE:
    net.forward(is_train=True, data=data)
    net.backward(label=label)
    if total_steps % kvstore_update_period == 0:
        for ind, k in enumerate(net.params.keys()):
            # pull the center variable, apply the elastic update locally,
            # then push the local weight so the server can move the center
            kv.pull(ind, central_weight[k], priority=-ind)
            net.params[k][:] -= easgd_alpha * (net.params[k] - central_weight[k])
            kv.push(ind, net.params[k], priority=-ind)
    net.update(updater=local_updater)
    total_steps += 1
```
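For completeness, the names the snippet above assumes (`kv`, `local_optimizer`, `central_weight`, `easgd_alpha`, `kvstore_update_period`, `total_steps`) could be set up roughly as follows; `net` is the author's executor wrapper, so treat this as a sketch under that assumption:

```python
import mxnet as mx

easgd_alpha = 0.001                    # moving rate between local and center weights
kvstore_update_period = 10             # talk to the server every N steps
total_steps = 0

kv = mx.kvstore.create('dist_async')   # asynchronous parameter server
local_optimizer = mx.optimizer.create('sgd', learning_rate=0.05, momentum=0.9)

# One buffer per parameter to receive the pulled center variables, and one
# kv.init() per key so the server starts from the current local weights.
central_weight = {k: mx.nd.zeros(v.shape, ctx=v.context)
                  for k, v in net.params.items()}
for ind, k in enumerate(net.params.keys()):
    kv.init(ind, net.params[k])
```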
Cool. Is this one based on the module interface?
It's mainly based on the low-level executor. I've written a simple wrapper for that, which is similar to the module interface.
@sxjscience Shouldn't you do:

```python
for ind, k in enumerate(net.params.keys()):
    kv.pull(ind, central_weight[k], priority=-ind)
    kv.push(ind, net.params[k], priority=-ind)
    net.params[k][:] -= easgd_alpha * (net.params[k] - central_weight[k])
net.update(updater=local_updater)
```

i.e. instead of pushing the newly updated local parameter value to the server, push the old one?
I have a few changes to make Elastic SGD work. The changes are to python/mxnet/model.py, in the local and remote updates. They look straightforward to me, but I may be missing something, as the learning I observe is very poor. Pasting what I did below:
""" Perform update of param_arrays from central variables on kvstore. Use grad arrays to pull in the center variables"""
`def _update_params_on_kvstore(param_arrays, grad_arrays, kvstore):
for index, pair in enumerate(zip(param_arrays, grad_arrays)):
arg_list, grad_list = pair
if grad_list[0] is None:
continue
# push back the weights
kvstore.push(index, arg_list, priority=-index)
# pull center variables into gradient, priority is negative index
kvstore.pull(index, grad_list, priority=-index)
for p in zip(arg_list, grad_list):
w, g = p
w -= alpha * (w-g)`
""" Perform update of param_arrays not on kvstore.""" def _update_params(param_arrays, grad_arrays, updater, num_device, kvstore=None):
` for index, pair in enumerate(zip(param_arrays, grad_arrays)):
arg_list, grad_list = pair
if grad_list[0] is None:
continue
for k, p in enumerate(zip(arg_list, grad_list)):
# faked an index here, to make optimizer create diff
# state for the same index but on diff devs, TODO(mli)
# use a better solution latter
w, g = p
updater(index*num_device+k, g, w)`
In the main iteration loop I do:
```python
_update_params_on_kvstore(executor_manager.param_arrays,
                          executor_manager.grad_arrays, kvstore)
executor_manager.forward(is_train=True)
executor_manager.backward()
_update_params(executor_manager.param_arrays, executor_manager.grad_arrays,
               updater=updater, num_device=len(ctx), kvstore=kvstore)
```
I'm interested in EA-SGD also. What work is left to be done here, and where is the latest code for this?
Closing it now. Feel free to reopen it.
@sxjscience Why did you close this? I don't see a PR to support EA-SGD. I understand that there is sample code posted in this GitHub issue, but that's extremely difficult to use for most people. I think it should be supported as a first-class algorithm in MXNet.
@leopd I've closed it because this issue has been inactive for more than 9 months. For now let me reopen it and see if people are interested in supporting EASGD.
@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.
For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.
This is a feature request.
IMHO it's a feature we should add to mxnet.
@sxjscience The EASGD algorithm is not supported in the released versions of MXNet. I want to implement it in MXNet to check the performance improvement; any suggestions or help would be appreciated. Thank you very much.
Hi All,
Is there support for elastic SGD?