dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.56k stars 538 forks

fix BERT phase2 training #1393

Closed ZiyueHuang closed 3 years ago

ZiyueHuang commented 3 years ago

Description

Remove LRScheduler. The trainer dumps and loads the optimizer state, including the LRScheduler, so if we use an LRScheduler, the learning rate will always be 0 in phase 2 training: the phase 1 schedule has already decayed to zero by the time it is restored.
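The failure mode can be sketched in pure Python (illustrative only; `LinearDecay` is a hypothetical stand-in, not the gluon-nlp scheduler): once phase 1 consumes the full schedule, a restored scheduler returns 0 for every subsequent update.

```python
# Illustrative sketch: why restoring a fully-consumed LR schedule
# zeroes the learning rate in phase 2.
class LinearDecay:
    """Decay linearly from base_lr to 0 over total_steps updates."""
    def __init__(self, base_lr, total_steps):
        self.base_lr = base_lr
        self.total_steps = total_steps

    def __call__(self, num_update):
        remaining = max(self.total_steps - num_update, 0)
        return self.base_lr * remaining / self.total_steps

# Phase 1 runs the schedule to completion.
sched = LinearDecay(base_lr=1e-4, total_steps=1000)
assert sched(1000) == 0.0

# Phase 2: loading the trainer states restores the same scheduler and
# the accumulated update count, so every phase 2 step sees lr == 0.
phase2_lrs = [sched(1000 + step) for step in range(1, 4)]
print(phase2_lrs)  # [0.0, 0.0, 0.0]
```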

Remove cfg.MODEL.max_length = max_seq_length, since cfg.MODEL.max_length determines the size of the positional embedding table and should stay at 512, the maximum seq_length across the two phases.
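A minimal NumPy sketch of the sizing constraint (variable names are illustrative, not the gluon-nlp model code): phase 1 and phase 2 index the same positional table, so it must be allocated for the longer phase 2 sequences up front.

```python
import numpy as np

hidden = 8
max_length = 512  # must cover the phase 2 seq_length, not just phase 1's
pos_embed = np.random.rand(max_length, hidden)

# Phase 1 (seq_length = 128) and phase 2 (seq_length = 512) both look up
# positions in the same table.
phase1 = pos_embed[np.arange(128)]
phase2 = pos_embed[np.arange(512)]
assert phase1.shape == (128, hidden) and phase2.shape == (512, hidden)

# Had max_length been set to the phase 1 seq_length (128), the phase 2
# lookup would index past the end of the table.
small = np.random.rand(128, hidden)
try:
    small[np.arange(512)]
except IndexError:
    print("phase 2 positions out of range")
```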

I have trained the BERT base model and fine-tuned it on SQuAD 2.0, obtaining 77.89/74.72, which is better than the standard result of 76.43/73.59 (https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering). The improvement likely comes from using the SOP objective instead of NSP (by setting random_next_sentence to False), as suggested by @sxjscience.
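The difference between the two objectives can be sketched as follows (hypothetical helper, not the gluon-nlp data pipeline): NSP draws its negative example from a random segment elsewhere in the corpus, while SOP builds it by swapping the order of the two consecutive segments.

```python
import random

def make_pair(seg_a, seg_b, corpus, random_next_sentence, rng):
    """Build one sentence-pair example; label 1 marks a negative pair.

    seg_a/seg_b are consecutive segments; corpus supplies random
    negatives for NSP. random_next_sentence=True gives NSP,
    False gives SOP (ALBERT-style sentence-order prediction).
    """
    if rng.random() < 0.5:
        return seg_a, seg_b, 0            # positive: consecutive, in order
    if random_next_sentence:
        return seg_a, rng.choice(corpus), 1  # NSP negative: random segment
    return seg_b, seg_a, 1                # SOP negative: swapped order

rng = random.Random(0)
first, second, label = make_pair("A", "B", ["C", "D"], False, rng)
print(first, second, label)
```

With random_next_sentence=False, a negative pair is always the same two segments in reversed order, which forces the model to learn inter-sentence coherence rather than topic similarity.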

The attached file, bert_base_logs.zip, contains the pretraining logs and the SQuAD fine-tuning logs.

Checklist

Essentials

Changes

Comments

cc @dmlc/gluon-nlp-team

codecov[bot] commented 3 years ago

Codecov Report

Merging #1393 into master will increase coverage by 0.11%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1393      +/-   ##
==========================================
+ Coverage   85.12%   85.24%   +0.11%     
==========================================
  Files          53       53              
  Lines        6959     6959              
==========================================
+ Hits         5924     5932       +8     
+ Misses       1035     1027       -8     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/gluonnlp/data/filtering.py | 78.26% <0.00%> (-4.35%) | :arrow_down: |
| src/gluonnlp/data/tokenizers/subword_nmt.py | 78.50% <0.00%> (-0.94%) | :arrow_down: |
| src/gluonnlp/data/loading.py | 83.39% <0.00%> (+5.28%) | :arrow_up: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 8ef4b26...d453ae4.

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1393/fix_bert_phase2/index.html