Open PiJoules opened 10 years ago
One workaround I found for this was to add a line like:
line = line.decode('ascii', 'ignore')
This converts your line to ASCII by silently dropping any non-ASCII characters (in your case, PiJoules, replace line with text2).
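To make the effect of that workaround concrete, here is a minimal sketch. The helper name strip_non_ascii is hypothetical and not part of the wrapper; it only shows that the 'ignore' error handler removes characters such as "£" or "—" rather than converting them, so some of the input is lost.

```python
def strip_non_ascii(line):
    """Drop every non-ASCII character from `line` (hypothetical helper)."""
    if isinstance(line, bytes):
        # Byte string input, as in the workaround above:
        # bytes that are not valid ASCII are skipped.
        return line.decode('ascii', 'ignore')
    # Already-decoded text: round-trip through ASCII, dropping what does not fit.
    return line.encode('ascii', 'ignore').decode('ascii')


# The pound sign and the em dash disappear from the result.
print(strip_non_ascii(u'\u00a3400 \u2014 rice'.encode('utf-8')))
```

Because the characters are deleted, this only helps when the offending symbol is in the input text itself and is not needed for the parse.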
The parser raises an RPC internal error -32603 whenever I try to parse this sentence:
"They are now examining whether Ahmed drove one of the vehicles used in the Dar es Salaam bombing and whether the 400 pounds of explosives used in both blasts came into Tanzania in a shipment of rice imported by one of his companies."
When I split the sentence into parts and tried to parse each part, I found that the bigram "400 pounds" causes the error. The parser is unable to parse it.
The problem is not solved when I encode my sentence as ASCII. Does anyone know how to fix this issue?
It also happens when I try to parse "Vivendi shares closed 1.9 percent at 15.80 euros in Paris after falling 3.6 percent on Monday."; here "euros" causes the error.
@rgtjf I fixed the problem by tracing the data exchanged between the functions until I found where the error came from. The parser in corenlp.py returns a parsing result in a JSON format. In my sentence, it replaced the word "pounds" with its symbol "£". Later, in the same script, the function parse_parser_results(text) (line 67) is unable to read that symbol. In your case, I guess "euros" is converted to its symbol "€", which causes the problem. You can print the "text" data just after line 67 to see the parser output.
To solve the problem, I converted the text data in parse_parser_results to unicode, decoding it as UTF-8. In short, add text=unicode(text,"utf-8") before the loop (line 75) in parse_parser_results(text) (line 67) of corenlp.py and it will work.
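For reference, a minimal sketch of what that one-line fix does. In Python 2, unicode(text, "utf-8") decodes a byte string, and it raises a TypeError if text is already unicode. The helper to_unicode below is hypothetical (it is not in corenlp.py) and guards against that case; the loop is only a stand-in for the one in parse_parser_results, whose body is not shown in this thread.

```python
def to_unicode(text, encoding="utf-8"):
    """Return `text` as a unicode string (hypothetical helper).

    Byte strings are decoded; strings that are already unicode are
    returned unchanged, so calling this twice is harmless.
    """
    if isinstance(text, bytes):
        return text.decode(encoding)
    return text


# Simulated raw parser output: UTF-8 bytes containing the pound symbol.
raw = u"the 400 \u00a3 of explosives\nused in both blasts".encode("utf-8")

text = to_unicode(raw)
for line in text.split("\n"):  # stand-in for the loop in parse_parser_results
    print(line)
```

Decoding once, before the loop, means every later string operation sees whole characters instead of the individual bytes of "£" or "€".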
@maali-mnasri Great! Thank you.
Thank you @maali-mnasri
Thank you @maali-mnasri, I ran into the same problem.
Thank you @maali-mnasri
I am trying to parse the sentence
WASHINGTON — Republicans on Thursday vowed a swift and forceful response to the executive action on immigration that President Obama is to announce in a prime-time address, accusing the president of exceeding the power of his office and promising a legislative fight when they take full control of Congress next year.
but I keep getting the error
The error doesn't appear, though, when I remove the em dash (—) in the first sentence. The same goes for curly single and double quotes like “”. Is there any way I can still parse these characters in this wrapper? Thanks
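If the UTF-8 decoding change described earlier in this thread does not help with these characters, one possible fallback is to replace typographic punctuation with ASCII equivalents before sending the sentence to the parser. This is only a sketch: PUNCT_MAP and normalize_punctuation are hypothetical names, the mapping is an assumption about which replacements are acceptable, and it changes the input text rather than fixing the wrapper.

```python
# Map code points of typographic punctuation to ASCII stand-ins (assumed choices).
PUNCT_MAP = {
    0x2014: u"-",   # em dash
    0x2013: u"-",   # en dash
    0x2018: u"'",   # left single curly quote
    0x2019: u"'",   # right single curly quote
    0x201C: u'"',   # left double curly quote
    0x201D: u'"',   # right double curly quote
}


def normalize_punctuation(text):
    """Replace curly quotes and dashes in a unicode string (hypothetical helper)."""
    return text.translate(PUNCT_MAP)


print(normalize_punctuation(u"WASHINGTON \u2014 \u201cRepublicans\u201d on Thursday"))
```

Unlike dropping non-ASCII characters, this keeps a visible token where the dash or quote was, so sentence structure is preserved for the parser.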