asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
https://asyml.io
Apache License 2.0

Bugfix in GPT2Tokenizer #315

Closed gpengzhi closed 4 years ago

gpengzhi commented 4 years ago

Resolves #313.

gpengzhi commented 4 years ago
```python
from texar.torch.data.tokenizers import GPT2Tokenizer

tokenizer = GPT2Tokenizer(pretrained_model_name='gpt2-small')

example = 'BART is a seq2seq model.'

# Round-trip the example: text -> token ids -> text.
ids = tokenizer.map_text_to_id(text=example)

print('original text:\n', example)
print('text -> ids -> text:\n', tokenizer.map_id_to_text(ids))
```

```
original text:
 BART is a seq2seq model.
text -> ids -> text:
 BART is a seq2seq model.
```
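The lossless round-trip above depends on GPT-2's byte-level encoding: every byte of the UTF-8 input is mapped to a visible unicode symbol before BPE merges are applied, and the mapping is exactly invertible, so decoding can always recover the original text. Below is a minimal, self-contained sketch of that reversible byte-to-unicode table (modeled on the idea behind GPT-2's `bytes_to_unicode`; the `encode_text`/`decode_text` helper names are illustrative and not part of texar's API):

```python
def bytes_to_unicode():
    """Build a bijective map from the 256 byte values to unicode characters.

    Printable bytes map to themselves; the remaining (control/whitespace)
    bytes are shifted into unused code points so every byte gets a visible,
    distinct symbol -- the trick that makes byte-level BPE lossless.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\u00a1"), ord("\u00ac") + 1))
          + list(range(ord("\u00ae"), ord("\u00ff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)  # relocate into unused code points
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}

def encode_text(text: str) -> str:
    # Text -> UTF-8 bytes -> one unicode symbol per byte.
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))

def decode_text(symbols: str) -> str:
    # Exact inverse of encode_text, so the round-trip is lossless.
    return bytes(CHAR_TO_BYTE[c] for c in symbols).decode("utf-8")

example = "BART is a seq2seq model."
assert decode_text(encode_text(example)) == example
```

Because the table covers all 256 byte values, any UTF-8 string (including whitespace and non-ASCII) survives the round-trip; the tokenizer's BPE merges then operate on these symbols without ever dropping information.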
codecov[bot] commented 4 years ago

Codecov Report

Merging #315 into master will not change coverage. The diff coverage is 100.00%.


```diff
@@           Coverage Diff           @@
##           master     #315   +/-   ##
=======================================
  Coverage   79.91%   79.91%
=======================================
  Files         133      133
  Lines       11135    11135
=======================================
  Hits         8899     8899
  Misses       2236     2236
```
| Impacted Files | Coverage Δ |
|---|---|
| texar/torch/data/tokenizers/gpt2_tokenizer.py | 89.36% <100.00%> (ø) |
