GlobalMaksimum / sadedegel

A General Purpose NLP library for Turkish
http://sadedegel.ai
MIT License

Test Cases for Tokenizer Handlers [resolves #273] #274

Closed ertugrul-dmr closed 2 years ago

ertugrul-dmr commented 3 years ago

Files changed:

tests/test_tokenizer_flags.py:

To-do:

I'm going to add test cases for Text2Doc with @onatyap and enrich the test cases for the different options.

husnusensoy commented 3 years ago

You are calling a private function (_tokenize), which you shouldn't do as a user of the WordTokenizer class (or a subclass), because it is there for internal use (you can of course use it if you understand the consequences). What you need is:

it = toker(mention=True)
tokens_pred = it(text) # calling __call__ function on WordTokenizer class
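
To illustrate why the public call matters, here is a minimal sketch using a toy stand-in class. ToyWordTokenizer and its regex rules are hypothetical and are not sadedegel's actual implementation; the sketch only assumes that flag handling such as mention=True lives in __call__, while _tokenize does the raw splitting.

```python
import re


class ToyWordTokenizer:
    """Hypothetical stand-in for a WordTokenizer subclass (not sadedegel code)."""

    def __init__(self, mention=False):
        self.mention = mention

    def _tokenize(self, text):
        # Internal step: raw split into words and punctuation, no flag handling.
        return re.findall(r"\w+|[^\w\s]", text)

    def __call__(self, text):
        # Public entry point: applies the mention rule around the raw split.
        if not self.mention:
            return self._tokenize(text)
        tokens = []
        for piece in re.split(r"(@\w+)", text):
            if re.fullmatch(r"@\w+", piece):
                tokens.append(piece)  # keep "@user" as a single token
            else:
                tokens.extend(self._tokenize(piece))
        return tokens


it = ToyWordTokenizer(mention=True)
public_tokens = it("merhaba @onatyap")             # flag is respected
private_tokens = it._tokenize("merhaba @onatyap")  # flag is bypassed
```

In this toy, the public call keeps "@onatyap" together while the private call splits it into "@" and "onatyap", so a test written against _tokenize would not exercise the flag at all.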

husnusensoy commented 3 years ago

Please check my commit :) I have already done it.

ertugrul-dmr commented 3 years ago

Yeah, thank you for the update. What I did was just a few small fixes for things I missed in the first place :)

husnusensoy commented 3 years ago

Can you ensure that Text2Doc also works properly with the exception rules, and close the pull request if it is not required?

ertugrul-dmr commented 3 years ago

Note to self: I'm going to fix the issue with Text2Doc where it doesn't update hashtag, mention, etc. after the first call, then add test cases. Both steps will go under this PR.
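
A rough sketch of the kind of regression test this would need. toy_text2doc is a hypothetical stand-in and not the real Text2Doc API; it only assumes the desired behaviour, namely that the flags are read on every call, so options passed later are not shadowed by the first call.

```python
import re


def toy_text2doc(text, mention=False, hashtag=False):
    """Hypothetical stand-in for a Text2Doc-like call (not sadedegel code).

    Flags are read on each call, so later calls see their own options.
    """
    prefixes = ("@" if mention else "") + ("#" if hashtag else "")
    if prefixes:
        # Keep "@user" / "#tag" whole only for the enabled prefixes.
        pattern = rf"[{re.escape(prefixes)}]\w+|\w+|[^\w\s]"
    else:
        pattern = r"\w+|[^\w\s]"
    return re.findall(pattern, text)


def test_flags_update_between_calls():
    text = "#nlp @onatyap"
    # First call enables mentions only.
    assert toy_text2doc(text, mention=True) == ["#", "nlp", "@onatyap"]
    # Second call switches options; the result must follow the new flags.
    assert toy_text2doc(text, hashtag=True) == ["#nlp", "@", "onatyap"]
    # Third call with no flags splits both prefixes off.
    assert toy_text2doc(text) == ["#", "nlp", "@", "onatyap"]


test_flags_update_between_calls()
```

The point of the test is the sequence: the same input is passed with different flags in consecutive calls, which is where the reported problem shows up.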