dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[Tokenizer] Upgrade tokenizers to the latest #1431

Closed sxjscience closed 3 years ago

sxjscience commented 3 years ago

Description

Currently we pin the tokenizers package to 0.8.1: https://github.com/dmlc/gluon-nlp/blob/84391ef90b5926ff2186c1931cfb460b7fc3785e/setup.py#L47

However, huggingface/tokenizers was recently upgraded to 0.9.4, so we should consider supporting the latest version.
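
To see which release is actually installed before and after changing the pin, a quick check along these lines may help. This is only a minimal sketch: `parse_release` and `installed_tokenizers_release` are hypothetical helper names, not part of gluon-nlp or tokenizers, and the parsing only looks at the leading numeric part of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Versions mentioned in this issue: the current pin and the latest release.
PINNED_RELEASE = (0, 8, 1)
LATEST_RELEASE = (0, 9, 4)


def parse_release(version_string):
    """Return the leading numeric part of a version string as a tuple of ints.

    Suffixes such as 'rc1' or '.dev0' are ignored, so this is only a rough
    comparison key and not a full PEP 440 parser.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"cannot parse version: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_tokenizers_release():
    """Return the installed tokenizers release as a tuple, or None if absent."""
    try:
        return parse_release(version("tokenizers"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    release = installed_tokenizers_release()
    if release is None:
        print("tokenizers is not installed")
    elif release == PINNED_RELEASE:
        print("tokenizers is at the currently pinned release")
    elif release >= LATEST_RELEASE:
        print("tokenizers is at or beyond the release this issue targets")
    else:
        print("tokenizers release:", release)
```

Comparing tuples of ints avoids the string-comparison trap where "0.10.0" would sort before "0.9.4".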


sxjscience commented 3 years ago

The related code is here: https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/data/tokenizers/huggingface.py

barry-jin commented 3 years ago

Closed via #1444