When running the Wrapper, I got the following error:
[Server] Started socket server on port 12340
INFO:StanfordSocketWrap:Successful ping. The server has started.
INFO:StanfordSocketWrap:Subprocess is ready.
Adding Segmentation annotation ... INFO: TagAffixDetector: useChPos=false | useCTBChar2=true | usePKChar2=false
INFO: TagAffixDetector: building TagAffixDetector from edu/stanford/nlp/models/segmenter/chinese/dict/character_list and edu/stanford/nlp/models/segmenter/chinese/dict/in.ctb
Loading character dictionary file from edu/stanford/nlp/models/segmenter/chinese/dict/character_list
Loading affix dictionary from edu/stanford/nlp/models/segmenter/chinese/dict/in.ctb
你爱我吗?
--->
[你, 爱, 我, 吗, ?]
java.lang.RuntimeException: don't know how to handle annotator segment
at corenlp.JsonPipeline.addAnnoToSentenceObject(JsonPipeline.java:282)
at corenlp.JsonPipeline.processTextDocument(JsonPipeline.java:312)
at corenlp.SocketServer.runCommand(SocketServer.java:140)
at corenlp.SocketServer.socketServerLoop(SocketServer.java:194)
at corenlp.SocketServer.main(SocketServer.java:107)
Any idea why this is happening? Many thanks in advance!
The wrapper doesn't support it -- you'd have to modify the Java code where the error is happening (corenlp.JsonPipeline.addAnnoToSentenceObject, per the stack trace) to add the segmentation information to the JSON output.
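To illustrate the shape of that change, here is a minimal sketch in Python (the actual fix has to be made in the wrapper's Java source, not in Python): the pipeline dispatches on the annotator name to decide which field to write into each sentence's JSON object, and "segment" has no branch, hence the RuntimeException. The key names below ("word", "tag", "tokens", "pos") are illustrative assumptions, not the wrapper's verified schema.

```python
# Python sketch of the dispatch that JsonPipeline.addAnnoToSentenceObject
# performs in Java. Key names are illustrative assumptions.

def add_anno_to_sentence(sentence_obj, annotator, tokens):
    if annotator == "pos":
        # An annotator the pipeline already knows: write POS tags.
        sentence_obj["pos"] = [t["tag"] for t in tokens]
    elif annotator == "segment":
        # The missing branch: write the segmented words into the JSON
        # instead of raising "don't know how to handle annotator segment".
        sentence_obj["tokens"] = [t["word"] for t in tokens]
    else:
        # Mirrors the behavior seen in the stack trace above.
        raise RuntimeError("don't know how to handle annotator " + annotator)
    return sentence_obj

tokens = [{"word": w} for w in ["你", "爱", "我", "吗", "?"]]
print(add_anno_to_sentence({}, "segment", tokens))
# → {'tokens': ['你', '爱', '我', '吗', '?']}
```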
Hi! I wonder if anyone has used the Wrapper to parse Chinese texts before? I have the following code:
from stanford_corenlp_pywrapper import sockwrap
parser_path = "/Users/hbyan2/Downloads/stanford-corenlp-full-2015-04-20/*"
cn_model_path = "/Users/hbyan2/Downloads/stanford-corenlp-full-2015-04-20/stanford-chinese-corenlp-2015-04-20-models.jar"
p = sockwrap.SockWrap(
    configdict={
        'annotators': "segment, ssplit, pos, parse",
        'customAnnotatorClass.segment': 'edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator',
        'segment.model': 'edu/stanford/nlp/models/segmenter/chinese/ctb.gz',
        'segment.sighanCorporaDict': 'edu/stanford/nlp/models/segmenter/chinese',
        'segment.serDictionary': 'edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz',
        'segment.sighanPostProcessing': True,
        'ssplit.boundaryTokenRegex': '[.]|[!?]+|[。]|[!?]+',
        "parse.model": "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz",
        "pos.model": "edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger",
    },
    corenlp_jars=[parser_path, cn_model_path],
)
p.parse_doc(u"你爱我吗?")
The configs are taken from the default CoreNLP properties for parsing Chinese: https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties
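For reference, once the segment error above is fixed server-side, parse_doc should return a plain dict of sentences. Here is a minimal sketch of consuming such a result; the sample dict and its key names ("sentences", "tokens", "pos") and tag values are assumptions about the wrapper's JSON output, not captured from a real run.

```python
# Hypothetical parse_doc result; keys and CTB-style tag values are
# illustrative assumptions, not verified wrapper output.
sample = {
    "sentences": [
        {"tokens": ["你", "爱", "我", "吗", "?"],
         "pos":    ["PN", "VV", "PN", "SP", "PU"]},
    ]
}

def words_with_tags(doc):
    # Pair each segmented word with its POS tag, sentence by sentence.
    return [list(zip(s["tokens"], s["pos"])) for s in doc["sentences"]]

for sent in words_with_tags(sample):
    print(sent)
```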