Closed: yongyi-wu closed this pull request 3 years ago.
@ZheyuYe Would you help review this as well if you have time? Also pinging @szhengac for review.
I asked @yongyi-wu to implement incremental_decode in a follow-up PR; in this PR we will focus on the correctness of the forward pass.
I am working on convert_t5.py and uploading model weights to S3, as @sxjscience noted previously. I will update here when I'm done.
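For reference, a hypothetical invocation of the conversion script; the flag names here are assumptions for illustration, not the script's confirmed interface:

```
# Hypothetical invocation; actual flag names in convert_t5.py may differ:
python convert_t5.py --model_name t5-small --save_dir t5_small_converted
```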
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1482/t5/index.html
Hi all, thank you for your patience. I've updated the code based on your feedback. At this point, google_t5_small is already up on S3, and the other files should be uploaded shortly. By the way, there are some non-trivial changes in t5.py compared to the previous commits, so it would help a lot if you could review it again :D
Merging #1482 (d3346d6) into master (52da8ab) will increase coverage by 0.01%. The diff coverage is 85.51%.
@@ Coverage Diff @@
## master #1482 +/- ##
==========================================
+ Coverage 85.87% 85.89% +0.01%
==========================================
Files 52 53 +1
Lines 6909 7265 +356
==========================================
+ Hits 5933 6240 +307
- Misses 976 1025 +49
Impacted Files | Coverage Δ |
---|---|
setup.py | 0.00% <ø> (ø) |
src/gluonnlp/attention_cell.py | 87.85% <ø> (+0.93%) :arrow_up: |
src/gluonnlp/op.py | 95.16% <ø> (ø) |
src/gluonnlp/layers.py | 85.87% <74.19%> (-1.25%) :arrow_down: |
src/gluonnlp/models/t5.py | 86.54% <86.54%> (ø) |
src/gluonnlp/models/__init__.py | 100.00% <100.00%> (ø) |
src/gluonnlp/data/vocab.py | 91.83% <0.00%> (+0.68%) :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52da8ab...d3346d6. Read the comment docs.
Some other follow-up items:
Looks good to me. A very nice addition! Let's try to convert 3B + 11B while we are fixing the CI. Basically, T5-3B and T5-11B are quite large, so they can help test the large-tensor support in MXNet. If we run into any issues, they might be related to bugs in MXNet.
T5-3B and T5-11B are good to go. The previous failure was probably due to OOM; MXNet itself worked fine.
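For context, a minimal way to probe MXNet's large-tensor support directly; this is a generic sketch, not part of the PR, and it allocates roughly 8 GB of float32, so it needs a large-memory machine:

```python
import mxnet as mx

# Arrays with more than 2**31 - 1 elements require MXNet built with
# large-tensor (INT64) support; without it this either raises or misbehaves.
x = mx.np.zeros((2**31 + 1,), dtype='float32')
print(x.size)  # expect 2147483649 on a large-tensor-enabled build
```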
Let's also add the test here: https://github.com/dmlc/gluon-nlp/blob/4b061305285f3c4e491b77b046a92c32e306cbc6/tests/test_models.py#L26-L39 We may skip t5-large, t5-3B, and t5-11B and only test the small and base variants.
@yongyi-wu On second thought, I think we can skip T5 in test_models.py and just test it in test_models_t5.py. We may try to load t5-small and t5-base and do a basic forward pass; this helps us check that we can load the model correctly. A sketch of such a smoke test follows.
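A minimal sketch of the proposed smoke test, assuming the T5 module follows the get_pretrained_*/from_cfg convention used by other backbones in this repo; the return values of get_pretrained_t5 and the forward signature of T5Model are assumptions, not the confirmed API:

```python
import mxnet as mx
import pytest
from gluonnlp.models.t5 import T5Model, get_pretrained_t5

mx.npx.set_np()

@pytest.mark.parametrize('model_name', ['google_t5_small', 'google_t5_base'])
def test_t5_small_base_forward(model_name):
    # Unpacking of get_pretrained_t5 is assumed to match other backbones.
    cfg, tokenizer, params_path, _ = get_pretrained_t5(model_name)
    model = T5Model.from_cfg(cfg)
    model.load_parameters(params_path)
    # Tiny random batch: encoder-decoder forward with valid lengths,
    # mirroring the seq2seq backbones (e.g. BART) already tested in this repo.
    src = mx.np.random.randint(0, 100, (2, 8))
    tgt = mx.np.random.randint(0, 100, (2, 6))
    out = model(src, mx.np.array([8, 8]), tgt, mx.np.array([6, 6]))
    # The decoder output should keep the (batch, tgt_len, ...) layout.
    assert out.shape[:2] == (2, 6)
```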
After discussing with @sxjscience, we decided to still test T5 backbones in test_models.py, but skip T5-3B and T5-11B.
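For reference, a sketch of how the skip could look inside the shared test in test_models.py; pytest.skip is standard, but the registered names of the large variants are assumptions based on the google_t5_small naming used above:

```python
import pytest

# Model names assumed to follow the google_t5_* registry naming.
TOO_LARGE_FOR_CI = {'google_t5_3B', 'google_t5_11B'}

def maybe_skip_large_t5(model_name):
    # Called at the top of the shared backbone test to keep CI memory bounded.
    if model_name in TOO_LARGE_FOR_CI:
        pytest.skip(f'{model_name} exceeds CI memory limits')
```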
Description
This PR contributes the T5 model to the GluonNLP codebase, along with scripts that can convert a Huggingface T5 model. It also extends the functionality of some modules in layers.py. Finally, this PR enables users to create a T5 tokenizer with control over the number of <extra_id> tokens (see the sketches at the end of this description).

Checklist
Essentials
Changes
- T5Model implementation (forward pass tested)
- Conversion script for T5Model, with conversion results uploaded to S3
- SentencepieceTokenizer instantiation (with configurable <extra_id> tokens)
- RMSNorm and gated PositionwiseFFN in layers.py

cc @dmlc/gluon-nlp-team
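For illustration, a hypothetical sketch of the tokenizer control mentioned above; the entry point and the extra_ids parameter are assumptions (mirroring Huggingface's convention of appending <extra_id_0> ... <extra_id_99> sentinel tokens), not this PR's confirmed API:

```python
# Hypothetical usage; the actual constructor and arguments may differ.
from gluonnlp.models.t5 import build_t5_tokenizer  # name assumed

# Request 100 sentinel tokens <extra_id_0> ... <extra_id_99>, matching the
# original T5 setup; a smaller count shrinks the sentinel vocabulary.
tokenizer = build_t5_tokenizer(extra_ids=100)  # parameter name assumed
```

And a framework-agnostic sketch of the RMSNorm computation added to layers.py, as commonly defined (Zhang & Sennrich, 2019) and used by T5; this is a generic reference implementation, not the PR's exact code:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize by the root mean square over the last axis; unlike LayerNorm,
    # there is no mean-centering and no bias term.
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return x / rms * weight
```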