dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.56k stars 538 forks source link

fix https://github.com/dmlc/gluon-nlp/issues/1315 #1319

Closed ZiyueHuang closed 4 years ago

ZiyueHuang commented 4 years ago

Description

For each worker, we should normalize the loss by the number of elements to improve numerical stability and can avoid setting step_size in trainer.update(), and this is consistent with the official TF implementation, see https://github.com/google-research/electra/blob/master/run_pretraining.py#L181 and https://github.com/google-research/electra/blob/master/run_pretraining.py#L206.

To check the correctness of the Horovod support, let k denote the number of workers,

Ground truth:

g = \frac{1}{k} \sum_{i=1}{k} g_i

update_gt = clip_grad_global_norm(g, max_norm)

In our code

G = \sum_{i=1}{k} g_i     (by trainer.allreduce_grads())

update = \frac{1}{k} * clip_grad_global_norm(G, max_norm * k)

         (note that \frac{1}{k} is the scaling factor inside hvd.trainer)

       = \frac{1}{k} * clip_grad_global_norm(g * k, max_norm * k)

       = \frac{1}{k} * k * clip_grad_global_norm(g, max_norm)

       = update_gt

And this part of implementation for Horovod support is consistent with BERT script, see https://github.com/dmlc/gluon-nlp/blob/v0.10.x/scripts/bert/run_pretraining.py.

@sxjscience Could you also check the correctness?

Also, we need re-train the model and adjust the hyper-parameters accordingly.

Checklist

Essentials

Changes

Comments

cc @dmlc/gluon-nlp-team

sxjscience commented 4 years ago

LGTM. Need to revise the usage of nd.reshape --> npx.reshape.

codecov[bot] commented 4 years ago

Codecov Report

Merging #1319 into master will decrease coverage by 0.09%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1319      +/-   ##
==========================================
- Coverage   84.45%   84.36%   -0.10%     
==========================================
  Files          42       42              
  Lines        6426     6426              
==========================================
- Hits         5427     5421       -6     
- Misses        999     1005       +6     
Impacted Files Coverage Δ
src/gluonnlp/data/loading.py 81.13% <0.00%> (-2.27%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 970318d...f2c8df5. Read the comment docs.

ZiyueHuang commented 4 years ago

I've made some change. Could you please review again?