Closed ZiyueHuang closed 4 years ago
LGTM. Need to revise the usage of `nd.reshape` --> `npx.reshape`.
Merging #1319 into master will decrease coverage by 0.09%. The diff coverage is n/a.
```diff
@@            Coverage Diff             @@
##           master    #1319      +/-   ##
==========================================
- Coverage   84.45%   84.36%   -0.10%
==========================================
  Files          42       42
  Lines        6426     6426
==========================================
- Hits         5427     5421       -6
- Misses        999     1005       +6
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/gluonnlp/data/loading.py | 81.13% <0.00%> (-2.27%) | :arrow_down: |
Continue to review full report at Codecov.

> Legend - Click here to learn more
> Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 970318d...f2c8df5. Read the comment docs.
I've made some changes. Could you please review again?
Description
For each worker, we should normalize the loss by the number of elements. This improves numerical stability and avoids having to set `step_size` in `trainer.update()`, and it is consistent with the official TF implementation; see https://github.com/google-research/electra/blob/master/run_pretraining.py#L181 and https://github.com/google-research/electra/blob/master/run_pretraining.py#L206.

To check the correctness of the Horovod support, let k denote the number of workers.
Ground truth:
In our code:
This part of the implementation of Horovod support is consistent with the BERT script; see https://github.com/dmlc/gluon-nlp/blob/v0.10.x/scripts/bert/run_pretraining.py.
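The normalization argument above can be sketched in plain Python (a minimal sketch: `worker_loss` and `allreduce_average` are hypothetical stand-ins for the actual Gluon loss and Horovod allreduce calls, not this PR's code). If every worker normalizes its loss by its own element count and the shards are equal-sized, then averaging the per-worker losses across k workers — which is what an averaging allreduce effectively does to the gradients — recovers the globally normalized loss, so no extra `step_size` rescaling is needed in `trainer.update()`:

```python
# Hypothetical sketch of the per-worker normalization argument.
def worker_loss(shard):
    # Each worker normalizes its loss by its own number of elements.
    return sum(shard) / len(shard)

def allreduce_average(values):
    # Horovod's averaging allreduce is a mean across workers.
    return sum(values) / len(values)

k, n = 4, 8  # k workers, n elements per worker (equal-sized shards)
shards = [[float(i * n + j) for j in range(n)] for i in range(k)]

per_worker = [worker_loss(s) for s in shards]
averaged = allreduce_average(per_worker)

# Global loss normalized by the total number of elements.
flat = [x for s in shards for x in s]
global_mean = sum(flat) / len(flat)

assert abs(averaged - global_mean) < 1e-12  # the two normalizations agree
```

With unequal shard sizes the two quantities would differ, which is why the per-worker element counts matter when checking the Horovod gradients against the ground truth.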
@sxjscience Could you also check the correctness?
Also, we need to re-train the model and adjust the hyper-parameters accordingly.
Checklist
Essentials
Changes
Comments
cc @dmlc/gluon-nlp-team