brendano / stanford_corenlp_pywrapper


TypeError: a bytes-like object is required, not 'str' in sockwrap.py #43

Open prakritidev opened 6 years ago

prakritidev commented 6 years ago
Initiating CoreNLP service connection...
Downloading data from http://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip
389701632/390211140 [============================>.] - ETA: 0s
INFO:CoreNLP_PyWrapper:Starting java subprocess, and waiting for signal it's ready, with command: exec java -Xmx4g -XX:ParallelGCThreads=1 -cp '/Users/prakritidev/anaconda/lib/python3.6/site-packages/stanford_corenlp_pywrapper/lib/*:/Users/prakritidev/Desktop/R-NET-in-Keras-99994adf4dc23ebcc82352403abc4af8d0403c70/lib/stanford-corenlp-full-2017-06-09/*'      corenlp.SocketServer --outpipe /tmp/corenlp_pywrap_pipe_pypid=6027_time=1510646661.6672251  --configdict '{"annotators": "tokenize,ssplit"}'
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
INFO:CoreNLP_JavaServer: CoreNLP pipeline initialized.
INFO:CoreNLP_JavaServer: Waiting for commands on stdin
Traceback (most recent call last):
  File "preprocessing.py", line 82, in <module>
    tokenize = CoreNLP_tokenizer()
  File "preprocessing.py", line 24, in CoreNLP_tokenizer
    corenlp_jars=[path.join(CoreNLP_path(), '*')])
  File "/Users/prakritidev/anaconda/lib/python3.6/site-packages/stanford_corenlp_pywrapper/sockwrap.py", line 151, in __init__
    self.start_server()
  File "/Users/prakritidev/anaconda/lib/python3.6/site-packages/stanford_corenlp_pywrapper/sockwrap.py", line 188, in start_server
    ret = self.send_command_and_parse_result('PING\t""', 2)
  File "/Users/prakritidev/anaconda/lib/python3.6/site-packages/stanford_corenlp_pywrapper/sockwrap.py", line 242, in send_command_and_parse_result
    data = self.send_command_and_get_string_result(cmd, timeout)
  File "/Users/prakritidev/anaconda/lib/python3.6/site-packages/stanford_corenlp_pywrapper/sockwrap.py", line 269, in send_command_and_get_string_result
    self.proc.stdin.write(cmd + "\n")
TypeError: a bytes-like object is required, not 'str'
WARNING:CoreNLP_PyWrapper:Killing subprocess 6071
jackalhan commented 6 years ago

+1

jackalhan commented 6 years ago

@prakritidev In Python 3, subprocess pipes and sockets carry bytes, not str, so you need to convert the command to bytes before writing it. To do that, open sockwrap.py and make the following changes:

at 266 -> sock.sendall(bytes(cmd + "\n", 'utf-8'))
at 269 -> self.proc.stdin.write(bytes(cmd + "\n", 'utf-8'))
at 275 -> size_info = struct.unpack('>Q', bytes(size_info_str, encoding='utf-8'))[0]
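
For anyone who wants to see the behavior outside the wrapper, here is a minimal standalone sketch. It does not use stanford_corenlp_pywrapper at all: the child process is a hypothetical stand-in for the Java server, and `send_line`, `parse_size`, and `demo` are names made up for illustration. It only shows that a default (binary-mode) `subprocess` pipe rejects `str`, accepts encoded bytes, and that `struct.unpack('>Q', ...)` expects an 8-byte bytes object.

```python
import struct
import subprocess
import sys


def send_line(proc, cmd):
    """Write one newline-terminated command to a binary-mode subprocess pipe."""
    # Encoding to UTF-8 bytes is what avoids the TypeError from the traceback.
    proc.stdin.write((cmd + "\n").encode("utf-8"))
    proc.stdin.flush()


def parse_size(header):
    """Decode an 8-byte big-endian unsigned integer (format '>Q')."""
    return struct.unpack(">Q", header)[0]


def demo():
    # Hypothetical stand-in for the server: a child Python process
    # that reads one line from stdin and echoes it back on stdout.
    child_code = (
        "import sys; sys.stdout.write(sys.stdin.readline()); sys.stdout.flush()"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        # Writing a str to the binary pipe reproduces the reported error.
        try:
            proc.stdin.write('PING\t""' + "\n")
            str_error = None
        except TypeError as exc:
            str_error = str(exc)

        # Writing bytes succeeds.
        send_line(proc, 'PING\t""')
        reply = proc.stdout.readline().decode("utf-8")
    finally:
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
    return str_error, reply


if __name__ == "__main__":
    error, reply = demo()
    print("str write failed with:", error)
    print("bytes write echoed:", repr(reply))
    print("size header decodes to:", parse_size(struct.pack(">Q", 1234)))
```

Note that `bytes(x, 'utf-8')` and `x.encode('utf-8')` are equivalent when `x` is a `str`; if the value being converted is already bytes, no conversion is needed and `bytes(x, encoding=...)` would itself raise a TypeError.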

I hope this helps you and anyone else who needs this solution. Thanks.