clab / dynet

DyNet: The Dynamic Neural Network Toolkit
Apache License 2.0

Question: how to get averaged parameters? #73

Open rasoolims opened 8 years ago

rasoolims commented 8 years ago

Hi,

Thanks for the great library.

I was wondering whether this library supports parameter averaging, i.e. maintaining an averaged copy of the parameters that is used during decoding.

Thanks

neubig commented 8 years ago

Hi @rasoolims . I think this feature might be something nice to have, as there have been some reports that this helps stabilize accuracy in a manner similar to ensembling. This could conceivably be done by adding capability to create copies of models, or add models together (with some weighting factor). However, this isn't implemented at the moment. If someone is interested in implementing it we'd be happy to accept a pull request.
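
The "create copies of models, or add models together (with some weighting factor)" idea could be sketched roughly as below. This is a minimal NumPy illustration, not DyNet's actual API; `average_models` and the list-of-arrays parameter layout are hypothetical.

```python
import numpy as np

def average_models(param_lists, weights=None):
    # Hypothetical helper (not DyNet API): combine several snapshots of a
    # model's parameters into one weighted average, parameter by parameter.
    n = len(param_lists)
    if weights is None:
        weights = [1.0 / n] * n  # uniform averaging by default
    averaged = []
    for params in zip(*param_lists):
        acc = np.zeros_like(params[0], dtype=np.float64)
        for w, p in zip(weights, params):
            acc += w * np.asarray(p, dtype=np.float64)
        averaged.append(acc)
    return averaged

# Two snapshots of a toy one-matrix model:
snap_a = [np.array([[1.0, 2.0]])]
snap_b = [np.array([[3.0, 4.0]])]
avg = average_models([snap_a, snap_b])
print(avg[0])  # [[2. 3.]]
```

The weighted form also covers ensembling-style interpolation between checkpoints, which is presumably what the weighting factor would be used for.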

honnibal commented 7 years ago

I've found averaging really useful. I've not run nearly as many neural network experiments as you guys, but what I've seen so far is that averaging removes the need for the really expensive part of the training — squeezing out that final 0.5% of accuracy.

It can be implemented very easily by modifying the optimizer. You just add another shadow parameter. I'll send a pull request if I have time, but in case I don't here are some quick notes if the next person needs this and is looking at this issue.


cdef void update_averages(float* ema,
        const float* weights, int nr_weight, float nr_update) nogil:
    # Warm-up schedule: decay ramps from 0.1 toward 1.0, capped at 0.9999,
    # so early updates track the live weights closely.
    cdef float decay = (1.0 + nr_update) / (10.0 + nr_update)
    cdef int i
    if decay > 0.9999:
        decay = 0.9999
    for i in range(nr_weight):
        # Move each shadow parameter a fraction (1 - decay) toward the weight.
        ema[i] -= (1 - decay) * (ema[i] - weights[i])

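For readers not using Cython, the same exponential-moving-average update can be sketched in plain NumPy. Here `nr_update` is the number of optimizer steps taken so far; at decode time you would copy `ema` over the live weights (keeping a backup if training continues). This is a sketch of the technique, not DyNet code.

```python
import numpy as np

def update_averages(ema, weights, nr_update):
    # EMA update mirroring the Cython snippet: decay grows from 0.1
    # toward a cap of 0.9999 as nr_update increases.
    decay = min((1.0 + nr_update) / (10.0 + nr_update), 0.9999)
    ema -= (1.0 - decay) * (ema - weights)
    return ema

# Tiny demo: at the very first update, decay = 0.1, so the shadow
# parameters move 90% of the way toward the current weights.
ema = np.zeros(3)
w = np.ones(3)
update_averages(ema, w, nr_update=0)
print(ema)  # [0.9 0.9 0.9]
```

Calling this once per optimizer step keeps `ema` a smoothed trajectory of the weights at negligible extra cost: one extra array per parameter and one fused multiply-subtract per element.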
neubig commented 7 years ago

We would certainly appreciate a pull request for this, with averaging exposed as an option passed to each optimizer.