asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
https://asyml.io
Apache License 2.0
745 stars 117 forks

Add BPETokenizer #204

Open gpengzhi opened 5 years ago

gpengzhi commented 5 years ago

There are some subtle differences between the BPE implementation in sentencepiece and the one in subword-nmt. Once this is implemented, we could probably delete everything in texar-pytorch/bin/utils except multi-bleu.perl, and the Transformer example could be simplified as well. Related issue: #180
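For reference, here is a minimal sketch of subword-nmt-style BPE, not texar-pytorch's API: every name below (`learn_bpe`, `apply_bpe`, `get_pair_counts`, `merge_pair`) is hypothetical. It assumes pre-tokenized words, appends an end-of-word marker `</w>` while learning merges, and marks non-final pieces with `@@` when applying them. Ties between equally frequent pairs are broken by comparing the pair strings, which is an arbitrary choice for this sketch.

```python
import collections
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def get_pair_counts(vocab: Dict[Tuple[str, ...], int]) -> collections.Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: collections.Counter = collections.Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Pair, vocab: Dict[Tuple[str, ...], int]) -> Dict[Tuple[str, ...], int]:
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged: Dict[Tuple[str, ...], int] = {}
    for word, freq in vocab.items():
        new_word: List[str] = []
        i = 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                new_word.append(word[i] + word[i + 1])
                i += 2
            else:
                new_word.append(word[i])
                i += 1
        key = tuple(new_word)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(words: List[str], num_merges: int) -> List[Pair]:
    """Learn an ordered list of merge operations from a list of words."""
    # Split each word into characters plus the end-of-word marker.
    vocab = dict(collections.Counter(tuple(w) + ("</w>",) for w in words if w))
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair strings (arbitrary but deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def apply_bpe(word: str, merges: List[Pair]) -> List[str]:
    """Segment one word, marking every non-final piece with '@@'."""
    if not word:
        return []
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word) + ["</w>"]
    while len(symbols) > 1:
        # Pick the earliest-learned merge present; leftmost occurrence first.
        candidates = [(ranks[p], i) for i, p in enumerate(zip(symbols, symbols[1:])) if p in ranks]
        if not candidates:
            break
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    # Drop the end-of-word marker, whether standalone or merged into the last piece.
    if symbols[-1] == "</w>":
        symbols = symbols[:-1]
    elif symbols[-1].endswith("</w>"):
        symbols[-1] = symbols[-1][: -len("</w>")]
    return [s + "@@" for s in symbols[:-1]] + [symbols[-1]]


if __name__ == "__main__":
    merges = learn_bpe(["low"] * 5, num_merges=3)
    print(merges)
    print(apply_bpe("low", merges), apply_bpe("lowest", merges))
```

One visible difference from sentencepiece is the boundary convention: sentencepiece works on raw text and prefixes word-initial pieces with `▁`, while this subword-nmt-style output marks continuations with `@@`. A `BPETokenizer` would have to pick one convention, or support both.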