dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.55k stars 538 forks

[Preprocess][Tokenizer] Improve learn subword #1460

Closed sxjscience closed 3 years ago

sxjscience commented 3 years ago

Description

Solves issue https://github.com/dmlc/gluon-nlp/issues/1451

cc @dmlc/gluon-nlp-team

sxjscience commented 3 years ago

@araitats Should be able to solve your issue after this gets merged.

codecov[bot] commented 3 years ago

Codecov Report

Merging #1460 (982dea1) into master (8cd1300) will increase coverage by 0.01%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1460      +/-   ##
==========================================
+ Coverage   85.79%   85.80%   +0.01%     
==========================================
  Files          52       52              
  Lines        6855     6855              
==========================================
+ Hits         5881     5882       +1     
+ Misses        974      973       -1     
Impacted Files                                 Coverage Δ
src/gluonnlp/data/tokenizers/subword_nmt.py    79.43% <0.00%> (+0.93%) ↑
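
The headline percentages in the coverage diff can be reproduced from the hit and line counts in the table. The following is a minimal sketch, assuming coverage is hits divided by lines and that the percentage is truncated (not rounded) to two decimals. That truncation is an inference from the numbers reported here, not documented Codecov behavior, and `coverage_percent` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits: int, lines: int) -> Decimal:
    """Return hits/lines as a percentage, truncated to two decimals.

    Truncation (ROUND_DOWN) is an assumption made so the result matches
    the figures shown in the report above.
    """
    ratio = Decimal(hits) * 100 / Decimal(lines)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


# Counts taken from the coverage diff table.
base = coverage_percent(5881, 6855)  # master (8cd1300)
head = coverage_percent(5882, 6855)  # #1460 (982dea1)
delta = head - base

print(base, head, delta)  # 85.79 85.80 0.01
```

One additional covered line out of 6855 is enough to move the reported figure by 0.01%, which is consistent with the "+1 hit, -1 miss" rows.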

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8cd1300...982dea1.

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1460/improve_learn_subword/index.html