THUNLP-MT / THUMT

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group
BSD 3-Clause "New" or "Revised" License

Remove unnecessary decoding #64

EFanZh closed 5 years ago

EFanZh commented 5 years ago

The decode call seems unnecessary.

Glaceon31 commented 5 years ago

That may be useful for certain languages or certain input encodings.

EFanZh commented 5 years ago

The original source code decodes both sides of the equality comparison as UTF-8, so the decoded strings are equal exactly when the original bytes are equal. Comparing the bytes directly therefore gives the same result.
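To illustrate the point above, here is a minimal sketch that compares a few byte strings both ways. The sample values and helper names are hypothetical and are not taken from THUMT; the sketch only assumes that both operands are valid UTF-8 byte strings.

```python
# Hypothetical sample values (not from THUMT): all are valid UTF-8 byte strings.
samples = [
    b"</s>",
    b"hello",
    "你好".encode("utf-8"),
    "\u00e9".encode("utf-8"),    # precomposed "e with acute accent"
    "e\u0301".encode("utf-8"),   # "e" followed by a combining acute accent
]


def equal_after_decode(a, b):
    """Compare two byte strings after decoding both as UTF-8."""
    return a.decode("utf-8") == b.decode("utf-8")


def equal_raw(a, b):
    """Compare two byte strings directly, without decoding."""
    return a == b


# For every pair of samples, both comparisons give the same answer.
agree = all(
    equal_after_decode(a, b) == equal_raw(a, b)
    for a in samples
    for b in samples
)
print(agree)
```

Decoding does not normalize the text, so the two visually similar accented forms stay unequal under both comparisons.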

Another reason for this change is that symbol and token might be of type str in Python 3, which does not have a decode method.

For example, it seems possible that the symbol and token values come from here:

https://github.com/THUNLP-MT/THUMT/blob/aea0d89e45df64e7532a424e5158d7cfa1692de9/thumt/models/rnnsearch.py#L400-L403

In Python 3, a double-quoted string literal is of type str, which does not have a decode method.
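The Python 3 behaviour described here can be checked with a short sketch. The values below are hypothetical stand-ins and are not copied from the linked lines; only the types matter.

```python
# Hypothetical values (not copied from the linked source).
symbol = "</s>"                 # a string literal: type str in Python 3
token = "</s>".encode("utf-8")  # the same text as bytes

# In Python 3, str has no decode method, while bytes does.
str_has_decode = hasattr(symbol, "decode")
bytes_has_decode = hasattr(token, "decode")

# Calling decode on a str therefore raises AttributeError in Python 3.
try:
    symbol.decode("utf-8")
    error = None
except AttributeError as exc:
    error = exc

print(str_has_decode, bytes_has_decode, type(error).__name__)
```

Under these assumptions, a comparison that calls decode on both operands would fail as soon as one of them is a str, whereas a plain equality comparison would not raise.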

Glaceon31 commented 5 years ago

We found that this one is useful. Thanks for your contribution!