Lynten / stanford-corenlp

Python wrapper for Stanford CoreNLP.
MIT License

Unable to print word_tokenize Chinese word #106

Open PolarisRisingWar opened 2 years ago

PolarisRisingWar commented 2 years ago

I used the example code:

from stanfordcorenlp import StanfordCoreNLP

# Support for other human languages, e.g. Chinese
sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'install_packages/stanford-corenlp-full-2016-10-31', lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))

And my output is:

['', '', '', '', '']
[('', 'NR'), ('', 'NN'), ('', 'VV'), ('', 'NR'), ('', 'PU')]
[('', 'ORGANIZATION'), ('', 'ORGANIZATION'), ('', 'O'), ('', 'GPE'), ('', 'O')]
(ROOT
  (IP
    (NP (NR 清华) (NN 大学))
    (VP (VV 位于)
      (NP (NR 北京)))
    (PU 。)))
[('ROOT', 0, 3), ('compound:nn', 2, 1), ('nsubj', 3, 2), ('dobj', 3, 4), ('punct', 3, 5)]

It seems that the parse call prints the correct result, but the other calls don't. I don't know why this happens.
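
One way to narrow this down (a sketch, not a confirmed fix): the wrapper's word_tokenize/pos_tag/ner helpers build their results from fields of the server's JSON response, so fetching the raw JSON yourself shows whether the empty strings come from the CoreNLP server or from the wrapper's post-processing. The snippet below uses the wrapper's annotate() method, which returns the server's response body as a string; the property names mirror the wrapper's own defaults, and the 'word' vs. 'originalText' comparison is an assumption to verify (older CoreNLP releases such as the 2016-10-31 build may leave 'originalText' empty for Chinese while 'word' still carries the text).

import json

from stanfordcorenlp import StanfordCoreNLP

sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'install_packages/stanford-corenlp-full-2016-10-31', lang='zh') as nlp:
    # Request raw JSON from the server so we can inspect the token
    # fields directly instead of going through the wrapper's helpers.
    raw = nlp.annotate(sentence, properties={
        'annotators': 'tokenize,ssplit',
        'pipelineLanguage': 'zh',
        'outputFormat': 'json',
    })
    doc = json.loads(raw)
    for s in doc['sentences']:
        # If 'word' is populated but 'originalText' is empty, the empty
        # output comes from which field the wrapper reads, not the server.
        print([tok.get('word') for tok in s['tokens']])
        print([tok.get('originalText') for tok in s['tokens']])

If 'word' prints the Chinese tokens correctly, upgrading to a newer CoreNLP release (whose JSON fills in 'originalText') would be worth trying.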

yuran986 commented 1 month ago

May I ask how long it took you to generate the results? I ran the same code as yours, but it ran for 20 minutes and never printed anything.