TimurNurlygayanov closed this issue 4 years ago
Note: I've used the code from the README example.
Have you tried pip install --upgrade pip and pip install MeCab?
Wouldn't it be better to use SimpleTokenizer instead of MeCabTokenizer?
MeCab is a library for morphological analysis of Japanese text. If you want to analyze Japanese, you should install MeCab and then use this library with it, but your target seems to be English text.
MeCabTokenizer is a TokenizableDoc that tokenizes Japanese words, while SimpleTokenizer is a TokenizableDoc that tokenizes mainly English words. (I've only tested it with English and Japanese.)
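As a rough illustration of why the tokenizer choice matters, the sketch below splits text on whitespace. That is only an assumption about how an English-oriented tokenizer might behave, not a statement of what SimpleTokenizer does internally, and whitespace_tokens is a hypothetical helper.

```python
def whitespace_tokens(sentence: str) -> list:
    """Split a sentence on runs of whitespace (hypothetical helper)."""
    return sentence.split()


english = "Natural language generation is a subfield of NLP"
japanese = "自然言語処理は楽しい"

# English words are separated by spaces, so splitting yields one token per word.
print(whitespace_tokens(english))

# Japanese is written without spaces between words, so the whole sentence
# comes back as a single token. This is the case where a morphological
# analyzer such as MeCab is needed instead.
print(whitespace_tokens(japanese))
```

For English input, a space-based split is usually good enough, which is why the Japanese-specific dependency can be avoided.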
Let's change
from pysummarization.tokenizabledoc.mecab_tokenizer import MeCabTokenizer
to
from pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer
and
nlp_base.tokenizable_doc = MeCabTokenizer()
to
nlp_base.tokenizable_doc = SimpleTokenizer()
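Put together, the change could look something like the sketch below. The SimpleTokenizer import path and the tokenizable_doc assignment are taken from this thread. The NlpBase import path, and the helpers mecab_installed, tokenizer_class_name, and build_nlp_base_for_english, are assumptions added for illustration, so check them against the README you copied from.

```python
import importlib.util


def mecab_installed() -> bool:
    """Return True if a module named MeCab can be found on the import path."""
    return importlib.util.find_spec("MeCab") is not None


def tokenizer_class_name(japanese: bool) -> str:
    """Pick a tokenizer class name for the target language (hypothetical helper)."""
    if not japanese:
        # English text does not need MeCab at all.
        return "SimpleTokenizer"
    if not mecab_installed():
        raise RuntimeError("MeCabTokenizer needs MeCab to be installed first")
    return "MeCabTokenizer"


def build_nlp_base_for_english():
    """Build an NlpBase that uses SimpleTokenizer (sketch, not verified here)."""
    # Imported lazily so this file still loads when pysummarization is absent.
    # Assumption: NlpBase lives in pysummarization.nlp_base.
    from pysummarization.nlp_base import NlpBase
    from pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer

    nlp_base = NlpBase()
    nlp_base.tokenizable_doc = SimpleTokenizer()
    return nlp_base


print(tokenizer_class_name(japanese=False))
```

With this swap, nothing in the English path imports MeCab, so the missing dependency should no longer be hit.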
It doesn't seem to be a critical issue. I'll close it for now.
Hi, thank you for the great library!
It looks like I've found an issue. Here is my example code:
And the result of running it with Python 3.7.4: