dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[FEATURE] Add T5Model and T5 conversion scripts; Adjust T5Tokenizer instantiation #1482

Closed. yongyi-wu closed this pull request 3 years ago.

yongyi-wu commented 3 years ago

Description

This PR contributes the T5 model to the GluonNLP codebase, along with scripts that convert Hugging Face T5 model weights. It also extends the functionality of some modules in layers.py. Finally, it enables users to create a T5 tokenizer with control over the number of <extra_id> sentinel tokens.
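
For context, here is a rough usage sketch, not the exact API added in this PR: the forward-call convention and the tokenizer keyword below are assumptions, and 'google_t5_small' is simply the checkpoint name mentioned later in this thread.

```python
# Hedged sketch of intended usage; signatures below are assumptions, not the PR's exact API.
import mxnet as mx
from gluonnlp.models import get_backbone

mx.npx.set_np()

# get_backbone is GluonNLP's generic backbone registry.
model_cls, cfg, tokenizer, params_path, _ = get_backbone('google_t5_small')
model = model_cls.from_cfg(cfg)
model.load_parameters(params_path)

# Encode a sentence and run a basic encoder-decoder forward pass. The
# (src_ids, src_valid_length, tgt_ids, tgt_valid_length) calling convention
# mirrors other GluonNLP seq2seq backbones and is assumed here.
src_ids = mx.np.array([tokenizer.encode('translate English to German: Hello', int)])
src_valid = mx.np.array([src_ids.shape[1]])
tgt_ids = mx.np.array([[0]])   # illustrative decoder start token
tgt_valid = mx.np.array([1])
out = model(src_ids, src_valid, tgt_ids, tgt_valid)

# The PR also lets users control the number of <extra_id_*> sentinel tokens when
# constructing a T5 tokenizer directly; the keyword name below is hypothetical.
# tokenizer = T5Tokenizer(vocab_path, extra_ids=100)
```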

Checklist

Essentials

Changes

cc @dmlc/gluon-nlp-team

sxjscience commented 3 years ago

@ZheyuYe Could you help review this if you have time? Also pinging @szhengac to review if you have time.

sxjscience commented 3 years ago

I've asked @yongyi-wu to implement incremental_decode in a follow-up PR; in this PR we will focus on the correctness of the forward pass.

yongyi-wu commented 3 years ago

I am working on convert_t5.py and uploading model weights to S3, as @sxjscience noted previously. I will update here when I'm done.
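
For readers who have not seen this kind of conversion script: its core is usually (i) a name mapping from Hugging Face parameters to the GluonNLP parameter tree and (ii) a numerical check that the converted model reproduces the reference forward pass. Below is a minimal sketch of that general shape, assuming the Hugging Face transformers package; map_name is a hypothetical placeholder, and the actual convert_t5.py defines its own mapping and writes Gluon-format parameter files.

```python
# Hedged sketch of the general shape of a weight-conversion script; this is
# NOT the actual convert_t5.py from this PR.
import numpy as np
from transformers import T5Model as HFT5Model


def map_name(hf_name: str) -> str:
    """Hypothetical placeholder: translate a Hugging Face parameter name
    (e.g. 'encoder.block.0.layer.0.SelfAttention.q.weight') into the
    corresponding GluonNLP parameter name. Identity here for illustration."""
    return hf_name


hf_model = HFT5Model.from_pretrained('t5-small')

# Pull every weight out of the Hugging Face checkpoint under its mapped name.
converted = {map_name(name): param.detach().cpu().numpy()
             for name, param in hf_model.state_dict().items()}

# The real script would load these arrays into the GluonNLP T5Model (set_data
# on the matching parameters), save Gluon-format .params, and then verify the
# conversion by comparing a forward pass against the Hugging Face reference,
# e.g. with np.testing.assert_allclose(..., atol=1e-4).
np.savez('t5_small_converted.npz', **converted)
```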

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1482/t5/index.html

yongyi-wu commented 3 years ago

Hi all, thank you for your patience. I've updated the code based on your feedback. At this point, google_t5_small is already up on S3, and the other files will be uploaded shortly. By the way, there are some non-trivial changes in t5.py compared with the previous commits, so it would help a lot if you could review it again :D

codecov[bot] commented 3 years ago

Codecov Report

Merging #1482 (d3346d6) into master (52da8ab) will increase coverage by 0.01%. The diff coverage is 85.51%.

@@            Coverage Diff             @@
##           master    #1482      +/-   ##
==========================================
+ Coverage   85.87%   85.89%   +0.01%     
==========================================
  Files          52       53       +1     
  Lines        6909     7265     +356     
==========================================
+ Hits         5933     6240     +307     
- Misses        976     1025      +49     
Impacted Files                      Coverage Δ
setup.py                            0.00% <ø> (ø)
src/gluonnlp/attention_cell.py      87.85% <ø> (+0.93%) ↑
src/gluonnlp/op.py                  95.16% <ø> (ø)
src/gluonnlp/layers.py              85.87% <74.19%> (-1.25%) ↓
src/gluonnlp/models/t5.py           86.54% <86.54%> (ø)
src/gluonnlp/models/__init__.py     100.00% <100.00%> (ø)
src/gluonnlp/data/vocab.py          91.83% <0.00%> (+0.68%) ↑

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 52da8ab...d3346d6.

sxjscience commented 3 years ago

Some other follow-up items:

yongyi-wu commented 3 years ago

> Looks good to me. A very nice addition! Let's try to convert 3B + 11B while we are fixing the CI. T5-3B and T5-11B are quite large, so they can help test the large-tensor support in MXNet. If we run into any issues, they might be related to bugs in MXNet.

T5-3B and T5-11B are good to go. The previous failure was probably due to OOM; MXNet itself worked well.

sxjscience commented 3 years ago

Let's also add the test here: https://github.com/dmlc/gluon-nlp/blob/4b061305285f3c4e491b77b046a92c32e306cbc6/tests/test_models.py#L26-L39. We can skip t5-large, t5-3B, and t5-11B and only test the small and base variants.

sxjscience commented 3 years ago

@yongyi-wu On second thought, I think we can skip T5 in test_models.py and just test it in test_models_t5.py. We can try loading t5-small and t5-base and doing a basic forward pass; this checks that the model loads correctly.

yongyi-wu commented 3 years ago

> @yongyi-wu On second thought, I think we can skip T5 in test_models.py and just test it in test_models_t5.py. We can try loading t5-small and t5-base and doing a basic forward pass; this checks that the model loads correctly.

After discussing with @sxjscience, we decided to still test T5 backbones in test_models.py, but skip T5-3B and T5-11B.
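
For illustration, a minimal sketch of what such a parametrized smoke test could look like, assuming pytest, the get_backbone registry, and the same assumed forward-call convention as earlier in this thread; the actual test added to test_models.py may be organized differently.

```python
# Hedged sketch of a parametrized T5 smoke test; the model names and the
# forward-call convention are assumptions.
import mxnet as mx
import pytest
from gluonnlp.models import get_backbone

mx.npx.set_np()


# Large checkpoints (t5-large, T5-3B, T5-11B) are skipped to keep CI memory
# and runtime bounded, per the discussion above.
@pytest.mark.parametrize('model_name', ['google_t5_small', 'google_t5_base'])
def test_t5_backbone_forward(model_name):
    model_cls, cfg, tokenizer, params_path, _ = get_backbone(model_name)
    model = model_cls.from_cfg(cfg)
    model.load_parameters(params_path)

    # A basic forward pass mainly checks that the converted weights load and
    # that the backbone runs end to end.
    src_ids = mx.np.array([[8, 25, 5, 1]])
    src_valid = mx.np.array([4])
    tgt_ids = mx.np.array([[0, 8]])
    tgt_valid = mx.np.array([2])
    out = model(src_ids, src_valid, tgt_ids, tgt_valid)
    assert out is not None
```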
