dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer #1488

Open sxjscience opened 3 years ago

sxjscience commented 3 years ago

Description

Fix the addition of the residual connection in the pre-layernorm Transformer. The previous implementation was incorrect. I'm rerunning the Transformer-Big pre-LN experiment.
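For context, here is a minimal NumPy sketch of the standard pre-LN residual pattern this fix concerns: the sublayer sees a normalized input, but the residual branch adds back the un-normalized input. The `layer_norm`, `sublayer`, and shapes below are illustrative assumptions, not the actual gluonnlp.models.transformer code.

```python
# Illustrative sketch only; not the gluonnlp implementation.
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_ln_block(x, sublayer):
    """Pre-LN residual: normalize the sublayer input,
    then add the *un-normalized* input back as the residual."""
    return x + sublayer(layer_norm(x))

def post_ln_block(x, sublayer):
    """Post-LN residual for comparison: add first, then normalize."""
    return layer_norm(x + sublayer(x))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 4, 8))           # (batch, seq_len, hidden)
    w = rng.normal(size=(8, 8)) * 0.1
    sublayer = lambda h: h @ w               # stand-in for attention / FFN
    print(pre_ln_block(x, sublayer).shape)   # (2, 4, 8)
    print(post_ln_block(x, sublayer).shape)  # (2, 4, 8)
```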

@yongyi-wu

Checklist

Essentials

Changes

Comments

cc @dmlc/gluon-nlp-team

codecov[bot] commented 3 years ago

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%     
==========================================
  Files          52       52              
  Lines        6909     6822      -87     
==========================================
+ Hits         5664     5812     +148     
+ Misses       1245     1010     -235     
Impacted Files                                Coverage Δ
src/gluonnlp/data/batchify.py                 88.72% <ø> (ø)
src/gluonnlp/layers.py                        87.15% <100.00%> (+0.03%)
src/gluonnlp/models/transformer.py            98.93% <100.00%> (-0.01%)
conftest.py                                   76.31% <0.00%> (-8.69%)
src/gluonnlp/data/loading.py                  75.75% <0.00%> (-7.64%)
src/gluonnlp/utils/lazy_imports.py            58.42% <0.00%> (-2.25%)
src/gluonnlp/utils/misc.py                    52.51% <0.00%> (-1.06%)
src/gluonnlp/data/tokenizers/yttm.py          81.73% <0.00%> (-1.02%)
src/gluonnlp/data/tokenizers/spacy.py         65.33% <0.00%> (-0.91%)
src/gluonnlp/data/tokenizers/huggingface.py   71.06% <0.00%> (-0.78%)
... and 22 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update c582b64...3c7c4c1.

yongyi-wu commented 3 years ago

Looks good. It seems all issues related to pre-norm and the skip connection have been fixed.

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1488/fix_pre_ln/index.html

sxjscience commented 3 years ago

I noticed that performance became worse after I changed the implementation. Still investigating the issue.