dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.56k stars 538 forks

fix BERT phase2 training #1393

Closed ZiyueHuang closed 3 years ago

ZiyueHuang commented 3 years ago

Description

Remove LRScheduler. The trainer dumps and loads the optimizer state, including the LRScheduler, so if we use an LRScheduler, the learning rate will always be 0 in phase 2 training: the phase 1 schedule has already decayed to zero by the time it is restored.
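The failure mode can be sketched in pure Python (illustrative only; `LinearDecay` is a hypothetical stand-in, not the gluon-nlp scheduler): once phase 1 consumes the full schedule, a restored scheduler returns 0 for every subsequent update.

```python
# Illustrative sketch: why restoring a fully-consumed LR schedule
# zeroes the learning rate in phase 2.
class LinearDecay:
    """Decay linearly from base_lr to 0 over total_steps updates."""
    def __init__(self, base_lr, total_steps):
        self.base_lr = base_lr
        self.total_steps = total_steps

    def __call__(self, num_update):
        remaining = max(self.total_steps - num_update, 0)
        return self.base_lr * remaining / self.total_steps

# Phase 1 runs the schedule to completion.
sched = LinearDecay(base_lr=1e-4, total_steps=1000)
assert sched(1000) == 0.0

# Phase 2: loading the trainer states restores the same scheduler and
# the accumulated update count, so every phase 2 step sees lr == 0.
phase2_lrs = [sched(1000 + step) for step in range(1, 4)]
print(phase2_lrs)  # [0.0, 0.0, 0.0]
```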

Remove cfg.MODEL.max_length = max_seq_length, since cfg.MODEL.max_length determines the size of the positional embedding table and should stay at 512, the maximum seq_length across the two phases.
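A minimal NumPy sketch of the sizing constraint (variable names are illustrative, not the gluon-nlp model code): phase 1 and phase 2 index the same positional table, so it must be allocated for the longer phase 2 sequences up front.

```python
import numpy as np

hidden = 8
max_length = 512  # must cover the phase 2 seq_length, not just phase 1's
pos_embed = np.random.rand(max_length, hidden)

# Phase 1 (seq_length = 128) and phase 2 (seq_length = 512) both look up
# positions in the same table.
phase1 = pos_embed[np.arange(128)]
phase2 = pos_embed[np.arange(512)]
assert phase1.shape == (128, hidden) and phase2.shape == (512, hidden)

# Had max_length been set to the phase 1 seq_length (128), the phase 2
# lookup would index past the end of the table.
small = np.random.rand(128, hidden)
try:
    small[np.arange(512)]
except IndexError:
    print("phase 2 positions out of range")
```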

I have trained the BERT base model and fine-tuned it on SQuAD 2.0, obtaining 77.89/74.72, which is better than the standard result of 76.43/73.59 (https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering). The improvement likely comes from using the SOP objective instead of NSP (by setting random_next_sentence to False), as suggested by @sxjscience.
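The difference between the two objectives can be sketched as follows (hypothetical helper, not the gluon-nlp data pipeline): NSP draws its negative example from a random segment elsewhere in the corpus, while SOP builds it by swapping the order of the two consecutive segments.

```python
import random

def make_pair(seg_a, seg_b, corpus, random_next_sentence, rng):
    """Build one sentence-pair example; label 1 marks a negative pair.

    seg_a/seg_b are consecutive segments; corpus supplies random
    negatives for NSP. random_next_sentence=True gives NSP,
    False gives SOP (ALBERT-style sentence-order prediction).
    """
    if rng.random() < 0.5:
        return seg_a, seg_b, 0            # positive: consecutive, in order
    if random_next_sentence:
        return seg_a, rng.choice(corpus), 1  # NSP negative: random segment
    return seg_b, seg_a, 1                # SOP negative: swapped order

rng = random.Random(0)
first, second, label = make_pair("A", "B", ["C", "D"], False, rng)
print(first, second, label)
```

With random_next_sentence=False, a negative pair is always the same two segments in reversed order, which forces the model to learn inter-sentence coherence rather than topic similarity.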

The attached file, bert_base_logs.zip, contains the pretraining logs and the SQuAD fine-tuning logs.

Checklist

Essentials

Changes

Comments

cc @dmlc/gluon-nlp-team

codecov[bot] commented 3 years ago

Codecov Report

Merging #1393 into master will increase coverage by 0.11%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1393      +/-   ##
==========================================
+ Coverage   85.12%   85.24%   +0.11%     
==========================================
  Files          53       53              
  Lines        6959     6959              
==========================================
+ Hits         5924     5932       +8     
+ Misses       1035     1027       -8     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/gluonnlp/data/filtering.py | 78.26% <0.00%> (-4.35%) | :arrow_down: |
| src/gluonnlp/data/tokenizers/subword_nmt.py | 78.50% <0.00%> (-0.94%) | :arrow_down: |
| src/gluonnlp/data/loading.py | 83.39% <0.00%> (+5.28%) | :arrow_up: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 8ef4b26...d453ae4.

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1393/fix_bert_phase2/index.html