google-research / t5x

Apache License 2.0

customize tokenizer #1038

Open yuanxiaoyu1 opened 1 year ago

yuanxiaoyu1 commented 1 year ago

Is it possible to customize the tokenizer? I would really appreciate it if anybody could give me an example. I tried to debug the code, but it is wrapped in layer after layer of abstraction, and I can't find any code that actually calls the SentencePiece tokenizer.
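As far as we understand the t5x/seqio split, t5x does not call SentencePiece directly. Tokenization is delegated to seqio vocabulary objects (for example `seqio.SentencePieceVocabulary`) attached to a task's output features, which is why the call is hard to find when stepping through t5x. The usual route to a custom tokenizer is therefore to subclass `seqio.Vocabulary` and implement its encode/decode hooks. To our knowledge these include `_encode`, `_decode`, `_encode_tf`, `_decode_tf`, and the special-id properties, but check the seqio source for the exact abstract members. The sketch below does not import seqio so that it runs standalone. `WordVocabulary` and its methods are hypothetical stand-ins that only mirror the shape of such a class, and the pad=0 / eos=1 / unk=2 layout is an assumption borrowed from the T5 SentencePiece convention.

```python
from typing import Dict, List, Sequence


class WordVocabulary:
    """Hypothetical word-level vocabulary (not part of t5x or seqio).

    Mirrors the kind of interface a seqio-style Vocabulary exposes:
    special ids plus encode/decode between text and integer ids.
    """

    # Assumed special-id layout, following the T5 SentencePiece convention.
    PAD_ID = 0
    EOS_ID = 1
    UNK_ID = 2

    def __init__(self, words: Sequence[str]):
        # Regular words get ids starting at 3; duplicates are ignored.
        self._word_to_id: Dict[str, int] = {}
        for w in words:
            if w not in self._word_to_id:
                self._word_to_id[w] = len(self._word_to_id) + 3
        self._id_to_word = {i: w for w, i in self._word_to_id.items()}

    @property
    def vocab_size(self) -> int:
        # Regular words plus the three reserved special ids.
        return len(self._word_to_id) + 3

    def encode(self, text: str) -> List[int]:
        # Whitespace split; unknown words map to UNK_ID.
        return [self._word_to_id.get(w, self.UNK_ID) for w in text.split()]

    def decode(self, ids: Sequence[int]) -> str:
        words = []
        for i in ids:
            if i == self.EOS_ID:
                break  # stop at end-of-sequence
            if i == self.PAD_ID:
                continue  # skip padding
            words.append(self._id_to_word.get(i, "<unk>"))
        return " ".join(words)
```

For example, `WordVocabulary(["hello", "world"]).encode("hello brave world")` gives `[3, 2, 4]`, with the unknown word mapped to the unk id. In a real seqio subclass, the same logic would also need a TensorFlow-graph version for `_encode_tf`, because seqio preprocessing runs inside `tf.data` pipelines.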

yuanxiaoyu1 commented 1 year ago

The tokenizer I need splits text using regular expressions. TensorFlow Text seems to have a `regex_split`, but I can't see how to plug it into t5x.
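`tensorflow_text.regex_split(input, delim_regex_pattern, keep_delim_regex_pattern)` splits strings on delimiter matches and can optionally keep delimiters as tokens. Inside a custom seqio vocabulary it would most plausibly be called from the TF-side encode hook, followed by a lookup from token strings to ids (this wiring is our assumption, not something t5x provides). The plain-Python function below is a hypothetical approximation of `regex_split`'s documented behavior on a single string. It is useful for prototyping the regex and for the non-TF `_encode` path. Edge cases such as zero-width matches may not behave exactly like the TF Text op.

```python
import re
from typing import List


def regex_split_py(text: str, delim_pattern: str, keep_delim_pattern: str = "") -> List[str]:
    """Hypothetical pure-Python approximation of tensorflow_text.regex_split.

    Splits `text` on matches of `delim_pattern`. A delimiter that also fully
    matches `keep_delim_pattern` is kept as its own token; other delimiters
    are dropped. Empty pieces between adjacent delimiters are dropped.
    """
    keep = re.compile(keep_delim_pattern) if keep_delim_pattern else None
    tokens: List[str] = []
    pos = 0
    for m in re.finditer(delim_pattern, text):
        # Emit the non-empty text between the previous delimiter and this one.
        if m.start() > pos:
            tokens.append(text[pos:m.start()])
        delim = m.group(0)
        # Keep the delimiter itself only if it matches the keep pattern.
        if keep is not None and delim and keep.fullmatch(delim):
            tokens.append(delim)
        pos = m.end()
    # Emit any trailing text after the last delimiter.
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens
```

With `regex_split_py("Hi, you!", r"\s|[,!]", r"[,!]")`, spaces are dropped while punctuation is kept, giving `["Hi", ",", "you", "!"]`. Mapping those tokens to ids with a dictionary, as in a word-level vocabulary, completes a simple regex-based tokenizer.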