Open jnory opened 4 years ago
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
:memo: Please visit https://cla.developers.google.com/ to sign.
Once you've signed (or fixed any issues), please reply here with `@googlebot I signed it!` and we'll verify it.
@googlebot I signed it!
Hi,

I noticed that `create_pretraining_data.py` aborts with an error. The error occurs because the variable `piece` may be of type `bytes` in Python 3.

I'm using the sentencepiece tokenizer, and the minimal input text that triggers the error comes from Wikipedia.

This small PR fixes the problem by ensuring that `piece` is of type `str`. Please let me know if you notice anything.

Sincerely,
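
P.S. For readers who want to see the idea in isolation, here is a minimal sketch of this kind of type normalization. It is not the actual diff in this PR: the helper name `ensure_str` is hypothetical, and it assumes the pieces are UTF-8 encoded.

```python
def ensure_str(piece, encoding="utf-8"):
    """Return `piece` as `str`, decoding it first if it is `bytes`.

    Hypothetical helper for illustration; UTF-8 is an assumption.
    """
    if isinstance(piece, bytes):
        return piece.decode(encoding)
    return piece


# A tokenizer may hand back a mix of bytes and str pieces under Python 3.
pieces = [b"\xe2\x96\x81Hello", "\u2581world"]
normalized = [ensure_str(p) for p in pieces]
```

After this step every element of `normalized` is a `str`, so later string operations on a piece no longer depend on which type the tokenizer returned.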