coccoc / coccoc-tokenizer

High-performance tokenizer for the Vietnamese language
GNU Lesser General Public License v3.0

How to config options when using Python binding #10

KienPM closed this issue 4 years ago

KienPM commented 4 years ago

This is a great project! Could you please provide documentation on how to use options such as -t, -u, ... with the Python binding? Thank you so much!

bachan commented 4 years ago

The Python binding code doesn't define the needed constants at the moment, so a quick solution for -u would be:

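from CocCocTokenizer import PyTokenizer
t = PyTokenizer()
text = "xin chào, tôi là người Việt Nam"  # example input to tokenize
res = t.word_tokenize(text, 2)  # 2 is the option value corresponding to -u
print(res)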
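
Until the binding exports named constants, a small Python-side enum can stand in for the bare `2`. This is a minimal sketch: the class name `TokenizeOption` and its member names are hypothetical. Only the value 2 for -u comes from this thread; reading -u as URL mode, -t as host mode with value 1, and 0 as the default are assumptions about the underlying option ordering.

```python
from enum import IntEnum


class TokenizeOption(IntEnum):
    """Hypothetical names for word_tokenize's second (integer) argument."""

    NORMAL = 0  # assumed: default tokenization, no mode flag
    HOST = 1    # assumed: host-name mode, the counterpart of -t
    URL = 2     # from this thread: the value used for -u
```

With the extension installed, the quick fix would then read `res = t.word_tokenize(text, TokenizeOption.URL)`. Because an `IntEnum` member is an `int` subclass, it should be accepted wherever the plain `2` is; wrapping it in `int(...)` is a harmless defensive alternative.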

And for -n the quick solution would be:

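from CocCocTokenizer import PyTokenizer
t = PyTokenizer(False)  # passing False corresponds to -n
text = "xin chào, tôi là người Việt Nam"  # example input to tokenize
res = t.word_tokenize(text)
print(res)
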
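
The two quick fixes can be combined into one helper that translates CLI-style flags into the arguments these snippets pass. A sketch only: `tokenizer_args` is a hypothetical name, it handles just -u and -n as described in this thread, and it assumes that the constructor's single positional argument defaults to `True` (which `PyTokenizer(False)` for -n suggests) and that the default tokenize option is 0. It returns plain values, so it can be tested without the extension installed.

```python
from typing import Iterable, Tuple

URL_OPTION = 2      # from this thread: option value used for -u
DEFAULT_OPTION = 0  # assumed default option when no mode flag is given


def tokenizer_args(flags: Iterable[str]) -> Tuple[bool, int]:
    """Map CLI-style flags to (PyTokenizer argument, word_tokenize option).

    Hypothetical helper: only -u and -n are handled, following the quick
    fixes above; any other flag raises ValueError.
    """
    ctor_arg = True  # assumed default; -n switches it to False
    option = DEFAULT_OPTION
    for flag in flags:
        if flag == "-u":
            option = URL_OPTION
        elif flag == "-n":
            ctor_arg = False
        else:
            raise ValueError(f"unsupported flag: {flag}")
    return ctor_arg, option


if __name__ == "__main__":
    print(tokenizer_args(["-u", "-n"]))  # (False, 2)
```

With the extension installed, the pair would feed into `PyTokenizer(ctor_arg)` and `t.word_tokenize(text, option)`.
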
bachan commented 4 years ago

For a more proper solution, feel free to fix the Python extension code (the .pyx file) and send us a pull request; we will happily review and include it.

KienPM commented 4 years ago

Thanks for your answer!