cfmrp / mtool

Software to Manipulate Different Flavors of Semantic Graphs
http://mrp.nlpl.eu
GNU Lesser General Public License v3.0

Fix ‘’ to '' to avoid encoding problems in reporting #43

Closed by danielhers 5 years ago

danielhers commented 5 years ago

They show as ‘’ in HTML.

oepen commented 5 years ago

truth be told, i would be a bit sad to back out of a perfectly healthy unicode solution 'just' because some of the pages served by CodaLab end up being displayed (in at least some browsers) with the wrong encoding. it almost seems that the public CodaLab instance (hosted in france, as far as i can tell) serves its pages with an ISO-8859-1 charset header, while the mtool outputs are encoded as UTF-8.
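For illustration, the suspected mismatch can be reproduced in a few lines of Python. This is a minimal sketch, not mtool code: the sample string is made up, and `cp1252` stands in for the browser side because browsers treat an `ISO-8859-1` label as windows-1252.

```python
# Curly quotes around a placeholder word (illustrative, not real mtool output).
text = "\u2018x\u2019"

# The bytes a UTF-8 writer would put into the report.
utf8_bytes = text.encode("utf-8")

# What a reader sees when those bytes are interpreted under an
# ISO-8859-1 label (handled as windows-1252 by browsers).
garbled = utf8_bytes.decode("cp1252")

# Interpreting the same bytes as UTF-8 recovers the original text.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(repr(restored))
```

Each curly quote is three bytes in UTF-8, so it turns into three unrelated characters when the bytes are read one at a time.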

before we give up on unicode in mtool, could we try to force the right header on CodaLab:

https://www.w3.org/International/questions/qa-htaccess-charset

if that fails, we could still try to generate the files in ISO-8859-1 instead, presumably by setting LANG to something like 'en_US.iso88591' in the CodaLab environment that executes the validator?
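One caveat with the ISO-8859-1 fallback: ‘ and ’ (U+2018, U+2019) are not part of ISO-8859-1, so switching the output encoding alone would not preserve them. The sketch below checks this with Python's standard codecs; the variable names are illustrative, and how mtool would actually behave under such a locale is not verified here.

```python
quotes = "\u2018\u2019"

# Strict encoding fails because ISO-8859-1 has no curly quotes.
try:
    quotes.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# With errors="replace", each unencodable character becomes "?".
fallback = quotes.encode("iso-8859-1", errors="replace")

print(latin1_ok)
print(fallback)
```

So an ISO-8859-1 environment would likely end in either an encoding error or lossy substitution, unless the quotes are replaced beforehand.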

danielhers commented 5 years ago

I don't think we have access to .htaccess on CodaLab. The only interface I am aware of is through the HTML menus, not terminal access to the server. We could try to replace the docker image used for the validator ("scoring program"), but I think that might be overkill.

danielhers commented 5 years ago

OK, I added <meta charset="UTF-8"> to the "detailed results" page in the evaluation output (https://github.com/cfmrp/codalab/commit/1413954d71f6ce0c4f3dd3ef5bfb89dab5267d23), so now it shows correctly there at least. The stderr report still seems to use ISO-8859-1?
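As a rough sketch of the `<meta charset="UTF-8">` approach (not the code from the linked commit), a report page can declare its own encoding and be written out as UTF-8 bytes. The helper name `render_page` and the sample text are hypothetical. Note that a meta declaration is only consulted when the HTTP response does not already specify a charset.

```python
import html


def render_page(body_text: str) -> str:
    """Hypothetical helper: wrap text in a minimal page that declares UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="UTF-8"></head>\n'
        f"<body><pre>{html.escape(body_text)}</pre></body></html>\n"
    )


# Sample message with curly quotes and a character that needs escaping.
page = render_page("\u2018tops\u2019 & edges")

# The bytes on disk must really be UTF-8 for the declaration to be true.
page_bytes = page.encode("utf-8")
```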

danielhers commented 5 years ago

I'm just replacing the quotes when printing to stderr (https://github.com/cfmrp/codalab/commit/ac37aff006f2a0daccfc010e47ffe9574c1f20b3).
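A replacement along these lines could look like the sketch below. It is an assumption about the general technique, not a copy of the linked commit: the names `ASCII_QUOTES`, `to_ascii_quotes`, and `report` are hypothetical, and only the two single curly quotes from the issue title are mapped.

```python
import sys

# Map the typographic single quotes to the ASCII apostrophe.
ASCII_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'"})


def to_ascii_quotes(message: str) -> str:
    """Return the message with curly single quotes replaced by '."""
    return message.translate(ASCII_QUOTES)


def report(message: str, stream=sys.stderr) -> None:
    """Print a message to the given stream using ASCII quotes only."""
    print(to_ascii_quotes(message), file=stream)


report("missing \u2018tops\u2019 property")
```

Because the result is plain ASCII, it reads the same whether the consumer assumes UTF-8 or ISO-8859-1.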