dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer #1488

Open sxjscience opened 3 years ago

sxjscience commented 3 years ago

Description

Fix the addition of the residual connection in the pre-layernorm Transformer. The previous implementation was incorrect. I'm rerunning the Transformer-Big pre-LN experiment.
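For context, here is a minimal NumPy sketch of the standard pre-LN residual pattern this fix concerns: the sublayer sees a normalized input, but the residual branch adds back the un-normalized input. The `layer_norm`, `sublayer`, and shapes below are illustrative assumptions, not the actual gluonnlp.models.transformer code.

```python
# Illustrative sketch only; not the gluonnlp implementation.
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_ln_block(x, sublayer):
    """Pre-LN residual: normalize the sublayer input,
    then add the *un-normalized* input back as the residual."""
    return x + sublayer(layer_norm(x))

def post_ln_block(x, sublayer):
    """Post-LN residual for comparison: add first, then normalize."""
    return layer_norm(x + sublayer(x))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 4, 8))           # (batch, seq_len, hidden)
    w = rng.normal(size=(8, 8)) * 0.1
    sublayer = lambda h: h @ w               # stand-in for attention / FFN
    print(pre_ln_block(x, sublayer).shape)   # (2, 4, 8)
    print(post_ln_block(x, sublayer).shape)  # (2, 4, 8)
```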

@yongyi-wu

Checklist

Essentials

Changes

Comments

cc @dmlc/gluon-nlp-team

codecov[bot] commented 3 years ago

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%     
==========================================
  Files          52       52              
  Lines        6909     6822      -87     
==========================================
+ Hits         5664     5812     +148     
+ Misses       1245     1010     -235     
Impacted Files                                Coverage Δ
src/gluonnlp/data/batchify.py                 88.72% <ø> (ø)
src/gluonnlp/layers.py                        87.15% <100.00%> (+0.03%)
src/gluonnlp/models/transformer.py            98.93% <100.00%> (-0.01%)
conftest.py                                   76.31% <0.00%> (-8.69%)
src/gluonnlp/data/loading.py                  75.75% <0.00%> (-7.64%)
src/gluonnlp/utils/lazy_imports.py            58.42% <0.00%> (-2.25%)
src/gluonnlp/utils/misc.py                    52.51% <0.00%> (-1.06%)
src/gluonnlp/data/tokenizers/yttm.py          81.73% <0.00%> (-1.02%)
src/gluonnlp/data/tokenizers/spacy.py         65.33% <0.00%> (-0.91%)
src/gluonnlp/data/tokenizers/huggingface.py   71.06% <0.00%> (-0.78%)
... and 22 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update c582b64...3c7c4c1.

yongyi-wu commented 3 years ago

Looks good. It seems all issues related to pre-norm and the skip connection have been fixed.

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1488/fix_pre_ln/index.html

sxjscience commented 3 years ago

I noticed that performance became worse after I changed the implementation. Still investigating the issue.