Closed PeterBourgonje closed 8 years ago
I think the issue is most likely due to the local machine's encoding settings, and not to writing the output into the file with "1>${WDIR}/output".
When I try it, I get the right encoding:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,18>
    a nif:RFC5147String , nif:Context , nif:String ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "18"^^xsd:nonNegativeInteger ;
    nif:isString "Präzisionsmeßgerät" ;
    itsrdf:target "Präzisionsmeßgerät\n"@en .
However, when the system eventually (in the next update) moves to mosesserver and XML-RPC, perhaps this will be resolved as well.
When the output contains diacritics or other usual suspects for character-encoding difficulties, it doesn't display properly. For example, I'm getting the following (both through Postman and with cURL from the command line):
While encoding issues are always a pain to debug (this could just as well be due to my Postman or terminal encoding settings), I suspect that the cause may be the shell script that calls Moses and passes its stdout along directly:
$EXE -config $DATADIR/moses.ini -input-file $WDIR/input -xml-input exclusive 1>${WDIR}/output 2>${WDIR}/moses.err
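One way to separate the two possibilities (broken bytes in the file versus a display problem in Postman or the terminal) is to look at the raw bytes that ended up in the output file. The following is only a minimal sketch: the helper names `check_utf8` and `check_file` are made up for illustration, and the path in the usage comment is a placeholder, not the real value of `$WDIR`.

```python
from pathlib import Path


def check_utf8(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'not-utf-8'."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8 (e.g. Latin-1 or a truncated sequence).
        return "not-utf-8"
    # Pure ASCII decodes fine under almost any encoding, so it tells us little.
    return "ascii" if text.isascii() else "utf-8"


def check_file(path: str) -> str:
    """Apply check_utf8 to the raw contents of a file."""
    return check_utf8(Path(path).read_bytes())


# Hypothetical usage; replace the placeholder with the actual output file:
# print(check_file("/path/to/wdir/output"))
```

If the file comes back as "utf-8", the bytes written via stdout are fine and the problem is more likely in a later step or in how the client displays the response. Note that this check cannot detect text that was decoded with the wrong encoding and then re-encoded as UTF-8, since such text is still valid UTF-8.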
You could give it a try with mosesserver in combination with XML-RPC to get around having to deal with stdout directly. This may have the additional bonus of making the whole procedure faster: currently, sending even a single line or word takes a while every time, whereas with a running mosesserver instance the initialization only has to be done once.
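As a rough sketch of what the XML-RPC route could look like from a client, here is a small Python example using the standard library's `xmlrpc.client`. It assumes a mosesserver instance is already running locally on port 8080 with the `/RPC2` endpoint and that it exposes a `translate` method taking and returning a struct with a `text` field; adjust these to the actual deployment. The function names are made up for illustration.

```python
import xmlrpc.client

# Assumption: mosesserver runs locally on port 8080 and serves /RPC2.
MOSES_URL = "http://localhost:8080/RPC2"


def build_translate_request(text: str) -> bytes:
    """Serialize a `translate` call as UTF-8 encoded XML-RPC.

    Handy for checking what goes over the wire without a running server.
    """
    body = xmlrpc.client.dumps(
        ({"text": text},), methodname="translate", encoding="utf-8"
    )
    return body.encode("utf-8")


def translate(text: str, url: str = MOSES_URL) -> str:
    """Send one segment to mosesserver and return the translated text."""
    proxy = xmlrpc.client.ServerProxy(url, encoding="utf-8")
    # The request and response are XML documents, so no shell redirection
    # or locale-dependent stdout handling sits between client and decoder.
    result = proxy.translate({"text": text})
    return result["text"]


# Hypothetical usage, requires a running server:
# print(translate("Präzisionsmeßgerät"))
```

Because the server stays up between calls, the model is loaded once rather than on every request, which is where the speed-up mentioned above would come from.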