dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.55k stars, 538 forks

[Tokenizers] Upgrade tokenizers to the latest #1444

Closed barry-jin closed 3 years ago

barry-jin commented 3 years ago

Description

Attempts to resolve #1431.

cc @dmlc/gluon-nlp-team

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1444/update_tokenizer/index.html

sxjscience commented 3 years ago

I think the major tests have passed, and we may want to wait for https://github.com/dmlc/gluon-nlp/pull/1441

codecov[bot] commented 3 years ago

Codecov Report

Merging #1444 (7bc31c4) into master (bda661b) will decrease coverage by 0.02%. The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master    #1444      +/-   ##
==========================================
- Coverage   85.55%   85.53%   -0.03%     
==========================================
  Files          53       53              
  Lines        6987     6987              
==========================================
- Hits         5978     5976       -2     
- Misses       1009     1011       +2     
| Impacted Files | Coverage Δ |
|---|---|
| setup.py | 0.00% <ø> (ø) |
| src/gluonnlp/data/tokenizers/huggingface.py | 72.06% <50.00%> (-0.84%) :arrow_down: |
| src/gluonnlp/data/tokenizers/subword_nmt.py | 79.43% <0.00%> (+0.93%) :arrow_up: |
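
The project-level percentages in the coverage diff can be reproduced from the hit and line counts. The following is a minimal sketch, assuming coverage is simply hits divided by lines and that the displayed value is truncated to two decimals; the helper name `coverage_percent` is hypothetical, and the truncation rule is an assumption that happens to match the figures shown, not confirmed Codecov behavior.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return hits/lines as a percentage, truncated to two decimals (assumed rule)."""
    # Integer arithmetic avoids floating-point surprises when truncating.
    return (hits * 10000 // lines) / 100


# Counts taken from the coverage diff in this report.
before = coverage_percent(5978, 6987)  # master (bda661b)
after = coverage_percent(5976, 6987)   # #1444 (7bc31c4)

print(f"master: {before:.2f}%")
print(f"#1444:  {after:.2f}%")
```

Because the number of lines is unchanged (6987), the drop comes entirely from the two lines that moved from hits to misses.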

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update bda661b...7bc31c4.
