dkt-projekt / e-SMT

Web service for the Moses Statistical Machine Translation

Character encoding in output #2

Closed PeterBourgonje closed 8 years ago

PeterBourgonje commented 8 years ago

When the output contains diacritics or other characters that commonly cause encoding problems, it doesn't display properly. For example, I get the following, both through Postman and through cURL on the command line:

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,18>
        a               nif:RFC5147String , nif:Context , nif:String ;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "18"^^xsd:nonNegativeInteger ;
        nif:isString    "Präzisionsmeßgerät" ;
        itsrdf:target   "Pr?zisionsme?ger?t \n"@en .

While encoding issues are always a pain to debug (this could just as well be caused by my Postman or terminal encoding settings), I suspect the cause may be the shell script that calls Moses and passes its stdout along directly:

$EXE -config $DATADIR/moses.ini -input-file $WDIR/input -xml-input exclusive 1>${WDIR}/output 2>${WDIR}/moses.err

You could try mosesserver in combination with XML-RPC to avoid dealing with stdout directly. This may have the added bonus of making the whole procedure faster: currently, sending single lines or words takes a while every time, whereas with a running mosesserver instance the initialization only has to be done once.
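The mosesserver suggestion above could look roughly like the following Python client. This is a minimal sketch, not the project's code: it assumes the server exposes a `translate` method that takes a struct with a `text` member and returns a struct with a `text` member, and the URL in `connect` is a hypothetical default that depends on how the server is started.

```python
import xmlrpc.client


def connect(url="http://localhost:8080/RPC2"):
    # Hypothetical URL: host, port, and path depend on the server setup.
    # Creating the proxy does not open a connection yet.
    return xmlrpc.client.ServerProxy(url, encoding="utf-8")


def translate(proxy, text):
    """Send one segment to a mosesserver-style XML-RPC endpoint."""
    # Assumed request shape: a struct with a "text" member.
    response = proxy.translate({"text": text})
    # Assumed response shape: a struct whose "text" member is the translation.
    # The payload travels as XML with a declared encoding, so no shell
    # redirection or terminal locale sits between Moses and the web service.
    return response["text"].strip()


# Usage against a running server (not executed here):
#   proxy = connect()
#   print(translate(proxy, "Präzisionsmeßgerät"))
```

Because the server stays up between requests, the model would be loaded once rather than on every call, which is the speed benefit mentioned above.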

ankitks commented 8 years ago

I think the issue is most likely due to the local machine's encoding settings, and not to writing the output into the file ("1>${WDIR}/output").

When I try it, I get the right encoding:

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<http://freme-project.eu/#char=0,18>
        a               nif:RFC5147String , nif:Context , nif:String ;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "18"^^xsd:nonNegativeInteger ;
        nif:isString    "Präzisionsmeßgerät" ;
        itsrdf:target   "Präzisionsmeßgerät\n"@en .

However, when the system moves to mosesserver and XML-RPC in the next update, perhaps this will be resolved as well.
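One way to tell the two explanations in this thread apart is to inspect the raw bytes of the output file rather than what a terminal or HTTP client displays. The sketch below is only an illustration: `classify_output` is a hypothetical helper, and the sample string is taken from the output above. It also shows that a lossy conversion to ASCII reproduces exactly the `?` pattern reported in the first comment.

```python
def classify_output(raw: bytes) -> str:
    """Classify the bytes read from the output file (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    if text.isascii():
        return "ASCII only"
    return "valid UTF-8 with non-ASCII characters"


expected = "Präzisionsmeßgerät"

# A lossy ASCII conversion replaces each non-ASCII character with "?".
lossy = expected.encode("ascii", errors="replace")
print(lossy.decode("ascii"))  # Pr?zisionsme?ger?t

print(classify_output(lossy))                       # ASCII only
print(classify_output(expected.encode("utf-8")))    # valid UTF-8 with non-ASCII characters
print(classify_output(expected.encode("latin-1")))  # not valid UTF-8
```

If the file on disk already contains only ASCII with `?` in place of the umlauts, the characters were lost before or while the file was written. If it is valid UTF-8, the damage happens in a later stage or in the client that displays the response.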