Closed yongyi-wu closed 3 years ago
Merging #1504 (5d3e194) into master (4624e6b) will decrease coverage by 0.22%. The diff coverage is 63.33%.
@@            Coverage Diff             @@
##           master    #1504      +/-   ##
==========================================
- Coverage   86.59%   86.36%   -0.23%
==========================================
  Files          54       54
  Lines        7349     7388      +39
==========================================
+ Hits         6364     6381      +17
- Misses        985     1007      +22
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/gluonnlp/sequence_sampler.py | 86.77% <ø> (ø) | |
| src/gluonnlp/models/transformer.py | 98.52% <50.00%> (-0.42%) | :arrow_down: |
| src/gluonnlp/models/t5.py | 93.65% <64.81%> (-4.48%) | :arrow_down: |
| src/gluonnlp/data/tokenizers/yttm.py | 81.89% <0.00%> (-0.87%) | :arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4624e6b...5d3e194. Read the comment docs.
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1504/88e4c66cebce57950861258690b76f4899344de3/index.html
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1504/5d3e1943e57a50d72b92dc355f0ccc5db3f15531/index.html
Description
This PR introduces a tutorial on using the pretrained T5Inference model for Masked Language Modeling (MLM) tasks. To smoothly handle "out-of-range" tokens, in this case <extra_id> sentinels, we now subclass SentencepieceTokenizer to create a T5Tokenizer and adjust its decoding process. Finally, this PR renames the NMTInference models (including T5 and Transformer) simply to Inference models, per the request in #1501.
Checklist
Essentials
Changes
- mask_to_sentinel(), useful for MLM tasks
- T5Tokenizer class

cc @dmlc/gluon-nlp-team
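For context, T5-style MLM replaces each contiguous masked span with a numbered sentinel token (`<extra_id_0>`, `<extra_id_1>`, ...). The sketch below illustrates that convention in plain Python; it is a hypothetical, standalone illustration, not the actual `mask_to_sentinel()` implementation added by this PR (whose signature may differ):

```python
# Hedged sketch of T5 span masking; illustrative only, not the PR's code.
def mask_spans_with_sentinels(tokens, masked_positions):
    """Replace each contiguous run of masked token positions with a single
    T5 sentinel token, numbered in order of appearance.

    tokens: list of string tokens
    masked_positions: set of integer indices to mask
    Returns a new token list in T5 MLM input format.
    """
    out = []
    sentinel_id = 0
    i = 0
    while i < len(tokens):
        if i in masked_positions:
            # One sentinel covers the whole contiguous masked span.
            out.append(f"<extra_id_{sentinel_id}>")
            sentinel_id += 1
            while i in masked_positions:
                i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out

print(mask_spans_with_sentinels(
    ["The", "cute", "dog", "walks", "in", "the", "park"], {1, 2, 5}))
# -> ['The', '<extra_id_0>', 'walks', 'in', '<extra_id_1>', 'park']
```

Note that the sentinel tokens typically sit outside the base SentencePiece vocabulary, which is why decoding them needs special handling, as in the T5Tokenizer introduced here.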