apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

elastic SGD #2710

Open amithr1 opened 8 years ago

amithr1 commented 8 years ago

Hi All,

Is there support for elastic SGD?

Godricly commented 8 years ago

The EASGD?

sxjscience commented 8 years ago

@amithr1 Do you mean Elastic Averaging SGD (EASGD)? We can implement that using the current KVStore API and low-level executor APIs. @peterzcc and I have done this but it's not ready for a PR.
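
For readers less familiar with the algorithm: EASGD (Zhang, Choromanska & LeCun, 2015) couples each local worker's parameters to a shared center variable through an elastic term. Roughly, the coupled updates look like this (eta is the learning rate, g the local gradient, alpha the coupling strength, p the number of workers):

```latex
% Local worker i: gradient step plus an elastic pull toward the center variable
x^{i}_{t+1} = x^{i}_{t} - \eta\, g^{i}_{t} - \alpha \left( x^{i}_{t} - \tilde{x}_{t} \right)

% Center variable: moves toward the local workers
\tilde{x}_{t+1} = \tilde{x}_{t} + \alpha \sum_{i=1}^{p} \left( x^{i}_{t} - \tilde{x}_{t} \right)
```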

Godricly commented 8 years ago

@sxjscience I did it too, but my solution is not nice. Can you PR one later?

#1657 describes my experience with the implementation. Maybe it will be helpful for you.

sxjscience commented 8 years ago

@Godricly We have written a separate server optimizer for EASGD (https://github.com/sxjscience/mxnet/blob/master/python/mxnet/optimizer.py#L849-L867). The local workers update the weights using the local optimizer and communicate with the server using the pull/push primitives, which are handled by the server optimizer. According to our experiments, EASGD is more stable than Downpour SGD (we can communicate at larger intervals). However, adding EASGD support to the high-level API requires additional work (performance comparisons + compatibility checks), and we haven't found time to do this yet.
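
For anyone reading along, here is a minimal, hypothetical sketch of what such a server-side EASGD optimizer could look like. The class name `ServerEASGD` and the reuse of `learning_rate` as the coupling strength alpha are assumptions made for this example; the actual code in the fork linked above may differ.

```python
import mxnet as mx

@mx.optimizer.Optimizer.register
class ServerEASGD(mx.optimizer.Optimizer):
    """Hypothetical server-side EASGD optimizer for the KVStore.

    On the server, the value pushed by a worker arrives in the ``grad``
    slot, so ``grad`` is the worker's local weight and ``weight`` is the
    central variable maintained by the server.
    """

    def create_state(self, index, weight):
        # The elastic-averaging update needs no extra per-weight state.
        return None

    def update(self, index, weight, grad, state):
        alpha = self._get_lr(index)  # reuse learning_rate as the coupling strength
        # central <- central + alpha * (local - central)
        weight[:] += alpha * (grad - weight)
```

With this registered, `kv.set_optimizer(mx.optimizer.create('ServerEASGD', learning_rate=easgd_alpha))` would make every `kv.push` of a local weight nudge the central weight toward it, as in the workflow further down the thread.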

Godricly commented 8 years ago

Wow... that's much better than what I did, messing up model.py with kvstore. So each local worker is a node in EASGD?

sxjscience commented 8 years ago

@Godricly Yes. The workflow looks like this.

    server_optimizer = mx.optimizer.create(name="ServerEASGD", learning_rate=easgd_alpha)
    kv.set_optimizer(server_optimizer)
    local_updater = mx.optimizer.get_updater(local_optimizer)
    total_steps = 0
    while NOT_CONVERGE:
        net.forward(is_train=True, data=data)
        net.backward(label=label)
        # periodically synchronize with the central weights stored on the kvstore
        if total_steps % kvstore_update_period == 0:
            for ind, k in enumerate(net.params.keys()):
                kv.pull(ind, central_weight[k], priority=-ind)
                # elastic update: move the local weights toward the central weights
                net.params[k][:] -= easgd_alpha * (net.params[k] - central_weight[k])
                # push the moved local weights back to the server optimizer
                kv.push(ind, net.params[k], priority=-ind)
                net.update(updater=local_updater)
        total_steps += 1

Godricly commented 8 years ago

Cool. Is this one based on the module interface?

sxjscience commented 8 years ago

It's mainly based on the low-level executor. I've written a simple wrapper for that, which is similar to the module interface.
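
As a rough illustration of what "based on the low-level executor" means (the toy symbol, shapes, and names below are invented for this example; this is not the wrapper itself):

```python
import mxnet as mx

# Toy network just for illustration.
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc')
net = mx.sym.SoftmaxOutput(data=fc, name='softmax')

# Bind the symbol to a low-level Executor instead of using Module/FeedForward.
exe = net.simple_bind(ctx=mx.cpu(), data=(32, 100))

# Initialize parameters and feed in a dummy batch.
for name, arr in exe.arg_dict.items():
    if name not in ('data', 'softmax_label'):
        arr[:] = mx.random.uniform(-0.01, 0.01, arr.shape)
exe.arg_dict['data'][:] = mx.nd.ones((32, 100))
exe.arg_dict['softmax_label'][:] = mx.nd.zeros((32,))

# Drive the training step manually; exe.grad_dict then holds the gradients
# that a local updater (and the kvstore pull/push above) would operate on.
exe.forward(is_train=True)
exe.backward()
```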

amithr1 commented 8 years ago

@sxjscience Shouldn't you do:

    for ind, k in enumerate(net.params.keys()):
        kv.pull(ind, central_weight[k], priority=-ind)
        kv.push(ind, net.params[k], priority=-ind)
        net.params[k][:] -= easgd_alpha * (net.params[k] - central_weight[k])
        net.update(updater=local_updater)

i.e. instead of pushing the newly updated local parameter value to the server, push the old one?

amithr1 commented 8 years ago

I have made a few changes to get elastic SGD to work. The changes are to python/mxnet/model.py, on the local and remote updates. They look straightforward to me, but I may be missing something, as the learning I observe is very poor. Pasting what I did below:

""" Perform update of param_arrays from central variables on kvstore. Use grad arrays to pull in the center variables"""

`def _update_params_on_kvstore(param_arrays, grad_arrays, kvstore):

   for index, pair in enumerate(zip(param_arrays, grad_arrays)):
    arg_list, grad_list = pair
    if grad_list[0] is None:
        continue
    # push back the weights
    kvstore.push(index, arg_list, priority=-index)
    # pull center variables into gradient, priority is negative index
    kvstore.pull(index, grad_list, priority=-index)
    for p in zip(arg_list, grad_list):
        w, g = p
         w -= alpha * (w-g)`

""" Perform update of param_arrays not on kvstore.""" def _update_params(param_arrays, grad_arrays, updater, num_device, kvstore=None):

` for index, pair in enumerate(zip(param_arrays, grad_arrays)):

    arg_list, grad_list = pair
    if grad_list[0] is None:
        continue
    for k, p in enumerate(zip(arg_list, grad_list)):
        # faked an index here, to make optimizer create diff
        # state for the same index but on diff devs, TODO(mli)
        # use a better solution latter
        w, g = p
        updater(index*num_device+k, g, w)`

In the main iteration loop I do:

    _update_params_on_kvstore(executor_manager.param_arrays,
                              executor_manager.grad_arrays,
                              kvstore)
    executor_manager.forward(is_train=True)
    executor_manager.backward()
    _update_params(executor_manager.param_arrays,
                   executor_manager.grad_arrays,
                   updater=updater,
                   num_device=len(ctx),
                   kvstore=kvstore)

leopd commented 7 years ago

I'm interested in EA-SGD also. What work is left to be done here, and where is the latest code for this?

sxjscience commented 7 years ago

Closing it for now. Feel free to reopen it.

leopd commented 7 years ago

@sxjscience Why did you close this? I don't see a PR to support EA-SGD. I understand that there is sample code posted here in this GitHub issue, but that's extremely difficult for most people to use. I think it should be supported as a first-class algorithm in MXNet.

sxjscience commented 7 years ago

@leopd I've closed it because this issue has been inactive for more than 9 months. For now let me reopen it and see if people are interested in supporting EASGD.

szha commented 6 years ago

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

leopd commented 6 years ago

This is a feature request.

IMHO it's a feature we should add to mxnet.

CynthiaProtector commented 5 years ago

@sxjscience The EASGD algorithm is not supported in the released versions of MXNet, and I want to implement it in MXNet to check the performance improvement. Any suggestions or help would be appreciated. Thank you very much.