Open Olamyy opened 4 years ago
I'm thinking about including a tokenizer class in the project. The API could look like:
from iranlowo.tokenizer import Tokenizer

text = "some text"
word_tokens = Tokenizer(text).word_tokenize()
sentence_tokens = Tokenizer(text).sentence_tokenize()
I think the API looks good. It would also simplify requirements.txt, which currently depends on NLTK etc. for tokenization in ADR.
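To make the discussion concrete, here is a rough sketch of what such a class might look like using only the standard library, so that NLTK would not be needed for basic tokenization. This is not the project's implementation: the regular expressions, the handling of combining diacritics, and the sentence-splitting rule are all assumptions made for illustration, and only the class and method names come from the proposal above.

```python
import re


class Tokenizer:
    """Hypothetical sketch of the proposed API (not the actual iranlowo code)."""

    # Assumption: a word is a run of word characters plus combining
    # diacritical marks (U+0300-U+036F), so that decomposed tone marks and
    # under-dots stay attached to their base letter. Apostrophes and
    # hyphens are allowed inside a word.
    _WORD_CHARS = r"[\w\u0300-\u036f]+"
    _WORD_RE = re.compile(_WORD_CHARS + r"(?:['\-]" + _WORD_CHARS + r")*")

    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # This naive rule will mis-split abbreviations such as "Dr. Ade".
    _SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")

    def __init__(self, text):
        self.text = text

    def word_tokenize(self):
        """Return the words in the text, dropping punctuation."""
        return self._WORD_RE.findall(self.text)

    def sentence_tokenize(self):
        """Return the sentences in the text, keeping end punctuation."""
        parts = self._SENTENCE_RE.split(self.text.strip())
        return [part for part in parts if part]


if __name__ == "__main__":
    text = "Hello world. How are you?"
    print(Tokenizer(text).word_tokenize())
    print(Tokenizer(text).sentence_tokenize())
```

A regex-based approach like this keeps the dependency list short, at the cost of cruder sentence splitting than a trained tokenizer would give.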