clab / dynet

DyNet: The Dynamic Neural Network Toolkit
Apache License 2.0

Question: how to get averaged parameters? #73

Open rasoolims opened 8 years ago

rasoolims commented 8 years ago

Hi,

Thanks for the great library.

I was wondering whether this library supports parameter averaging, i.e. maintaining an averaged copy of the parameters that is used during decoding.

Thanks

neubig commented 8 years ago

Hi @rasoolims . I think this feature might be something nice to have, as there have been some reports that this helps stabilize accuracy in a manner similar to ensembling. This could conceivably be done by adding capability to create copies of models, or add models together (with some weighting factor). However, this isn't implemented at the moment. If someone is interested in implementing it we'd be happy to accept a pull request.
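
The "create copies of models, or add models together (with some weighting factor)" idea could be sketched roughly as below. This is a minimal NumPy illustration, not DyNet's actual API; `average_models` and the list-of-arrays parameter layout are hypothetical.

```python
import numpy as np

def average_models(param_lists, weights=None):
    # Hypothetical helper (not DyNet API): combine several snapshots of a
    # model's parameters into one weighted average, parameter by parameter.
    n = len(param_lists)
    if weights is None:
        weights = [1.0 / n] * n  # uniform averaging by default
    averaged = []
    for params in zip(*param_lists):
        acc = np.zeros_like(params[0], dtype=np.float64)
        for w, p in zip(weights, params):
            acc += w * np.asarray(p, dtype=np.float64)
        averaged.append(acc)
    return averaged

# Two snapshots of a toy one-matrix model:
snap_a = [np.array([[1.0, 2.0]])]
snap_b = [np.array([[3.0, 4.0]])]
avg = average_models([snap_a, snap_b])
print(avg[0])  # [[2. 3.]]
```

The weighted form also covers ensembling-style interpolation between checkpoints, which is presumably what the weighting factor would be used for.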

honnibal commented 7 years ago

I've found averaging really useful. I've not run nearly as many neural network experiments as you guys, but what I've seen so far is that averaging removes the need for the really expensive part of the training — squeezing out that final 0.5% of accuracy.

It can be implemented very easily by modifying the optimizer. You just add another shadow parameter. I'll send a pull request if I have time, but in case I don't here are some quick notes if the next person needs this and is looking at this issue.


cdef void update_averages(float* ema,
        const float* weights, int nr_weight, float nr_update) nogil:
    # Warm-up schedule: decay ramps from 0.1 toward 1.0, capped at 0.9999,
    # so early updates track the live weights closely.
    cdef float decay = (1.0 + nr_update) / (10.0 + nr_update)
    cdef int i
    if decay > 0.9999:
        decay = 0.9999
    for i in range(nr_weight):
        # Move each shadow parameter a fraction (1 - decay) toward the weight.
        ema[i] -= (1 - decay) * (ema[i] - weights[i])

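For readers not using Cython, the same exponential-moving-average update can be sketched in plain NumPy. Here `nr_update` is the number of optimizer steps taken so far; at decode time you would copy `ema` over the live weights (keeping a backup if training continues). This is a sketch of the technique, not DyNet code.

```python
import numpy as np

def update_averages(ema, weights, nr_update):
    # EMA update mirroring the Cython snippet: decay grows from 0.1
    # toward a cap of 0.9999 as nr_update increases.
    decay = min((1.0 + nr_update) / (10.0 + nr_update), 0.9999)
    ema -= (1.0 - decay) * (ema - weights)
    return ema

# Tiny demo: at the very first update, decay = 0.1, so the shadow
# parameters move 90% of the way toward the current weights.
ema = np.zeros(3)
w = np.ones(3)
update_averages(ema, w, nr_update=0)
print(ema)  # [0.9 0.9 0.9]
```

Calling this once per optimizer step keeps `ema` a smoothed trajectory of the weights at negligible extra cost: one extra array per parameter and one fused multiply-subtract per element.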
neubig commented 7 years ago

We would certainly appreciate a pull request for this, with averaging exposed as an option passed to each optimizer.