codewars / codewars.com

Issue tracker for Codewars
https://www.codewars.com
BSD 2-Clause "Simplified" License

UTF-16 characters displayed by console.log is not encoded properly #902

Closed Voileexperiments closed 7 years ago

Voileexperiments commented 7 years ago

I'm writing a JS kata that needs to print Unicode characters with console.log, and in some fraction of runs that I can't pin down (call it magic/100), the printed result is not encoded properly:

Expected (happens in (100-magic)/100 times): image

Actual (happens in magic/100 times): image

EDIT: The kata I was writing is now live, so you can check it for yourself: https://www.codewars.com/kata/xiang-qi-xiangqi-slash-chinese-chess-board-validator

Voileexperiments commented 7 years ago

So I'm now attempting the new Check and Mate? kata, and it seems that this problem affects chess pieces as well:

Expected (happens most of the time): image

Actual (happens sometimes): image

My educated guess (I haven't actually checked the encoding conversion for the garbled text, but I've encountered enough of it in my life) is that what happens here is the complementary behaviour of the above problem. Maybe a small portion of the AWS servers are running in a different locale, and hence use a different encoding from the others (possibly even servers in China)?

In any case, there is probably some implicit encoding conversion happening somewhere in the chain.

(By the way, @myjinxin2015 suggested yesterday that I use codePointAt to work around the problem:

const form = s => [...s].map(c => `&#${c.codePointAt(0)};`).join(""); // encodes each code point as an HTML numeric character reference (&#NNNN;)

usage:

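const s = '車馬象士將士象馬車';
console.log(s);       // raw string: sometimes garbled
console.log(form(s)); // numeric character references: always displays correctly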

which works perfectly. However, it doesn't quite solve the mystery: why is it necessary to re-encode the UTF-16 characters as code points? And how does the text end up in the wrong encoding when it goes wrong?)
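One likely reason the workaround is immune to the problem is that its output contains only ASCII characters, which have the same bytes in UTF-8, Latin-1 and Windows-1252. The sketch below checks that, assuming the output panel renders the logged text as HTML so that the browser turns each `&#NNNN;` back into a character. The names `encoded`, `isAscii` and `astral` are introduced here for illustration only.

```javascript
// Same helper as in the comment above: each code point becomes &#NNNN;
const form = s => [...s].map(c => `&#${c.codePointAt(0)};`).join("");

const s = '車馬象士將士象馬車';
const encoded = form(s);

// Every character in the encoded string is below U+0080, so a wrong guess
// between UTF-8 and a single-byte Western encoding cannot change it.
const isAscii = /^[\x00-\x7F]*$/.test(encoded);
console.log(isAscii); // true

// Spreading a string iterates code points, not UTF-16 code units, so a
// character outside the BMP still yields a single reference.
const astral = '😀';                // U+1F600, stored as a surrogate pair
console.log(astral.length);         // 2 (UTF-16 code units)
console.log([...astral].length);    // 1 (code point)
console.log(form(astral));          // &#128512;
```

This only explains why the workaround is robust; it says nothing about where in the chain the raw text is being mis-decoded.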

Voileexperiments commented 7 years ago

Another thing I noticed: the console output is fine while the tests are still running and streaming console output (streaming is on by default in the kata editor), and only becomes garbled after the tests have completed (or been killed by SIGKILL or whatever) and the results are finalized.

I think this suggests that the problem comes from the final encoding of the results page.

kazk commented 7 years ago

This has been happening for a while. I suggested that method to myjinxin.

I thought this was classic Mojibake (文字化け), but I've seen it randomly display correctly when I submit multiple times, and I don't know why.
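For reference, classic mojibake of this kind can be reproduced in Node by decoding UTF-8 bytes with a single-byte encoding. This is a minimal sketch that assumes the wrong decoder is Latin-1. A browser that guesses wrong would more likely use Windows-1252, which differs for bytes 0x80 to 0x9F, and nothing in this thread confirms which decoder Codewars actually hits.

```javascript
const original = '車馬象士將士象馬車';

// Encode as UTF-8: each of these 9 characters takes 3 bytes.
const utf8Bytes = Buffer.from(original, 'utf8');
console.log(utf8Bytes.length); // 27

// Decode the same bytes as Latin-1: every byte becomes its own character,
// which is what garbled output of this kind typically looks like.
const garbled = utf8Bytes.toString('latin1');
console.log(garbled.length); // 27

// The damage is reversible as long as no byte was dropped or replaced.
const recovered = Buffer.from(garbled, 'latin1').toString('utf8');
console.log(recovered === original); // true
```

If the garbled text in the screenshots round-trips like this, the bytes themselves are intact and only the declared or guessed charset is wrong.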

@jhoffner, is Codewars/output.codewars still used? I don't know how it's used, but public/index.html lacks <meta charset="utf-8" /> which might be related.

kazk commented 7 years ago

<meta charset="utf-8" /> fixes this for Codewars/output.codewars.

git clone https://github.com/Codewars/output.codewars
cd output.codewars

rm Gemfile.lock # failed to install eventmachine-1.0.3 with Ruby 2.4
bundle install
bundle exec rackup -p 3000 config.ru

open http://localhost:3000

Adding

$(function() {
  $('body').css('background-color', '#222').html('車馬象士將士象馬車');
});

to public/js/app.js produces:

image

After adding <meta charset="utf-8" /> to public/index.html:

image


Setting 'Content-Type' => 'text/html; charset=utf-8' in config.ru also fixes this without having <meta charset="utf-8" />. I think it's recommended to have both.
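The same idea, declaring the charset both in the HTTP header and in the markup, can be sketched in plain Node. Note that this swaps in Node's built-in http module for the Rack config.ru used by output.codewars, so it illustrates the technique rather than the project's actual code. `HEADERS`, `PAGE` and `handler` are hypothetical names.

```javascript
// Header-level declaration: tells the browser how to decode the body
// before it has parsed any markup.
const HEADERS = { 'Content-Type': 'text/html; charset=utf-8' };

// Document-level declaration: still applies if the header is stripped,
// for example when the file is opened from disk or served by a proxy.
const PAGE = [
  '<!DOCTYPE html>',
  '<html>',
  '<head><meta charset="utf-8" /></head>',
  '<body>車馬象士將士象馬車</body>',
  '</html>',
].join('\n');

// Request handler with the signature expected by http.createServer.
function handler(req, res) {
  res.writeHead(200, HEADERS);
  res.end(PAGE, 'utf8'); // write the body as UTF-8, matching the declaration
}

// Usage (not run here):
//   require('http').createServer(handler).listen(3000);
```

Having both declarations is redundant when everything works, but it means a single missing piece does not leave the browser guessing the encoding.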

jhoffner commented 7 years ago

Output.codewars is no longer used. However, I'm rolling out a rebuilt output UI (vuejs based) tomorrow, so that may fix things.

jhoffner commented 7 years ago

The UI is released, but this problem is not resolved :( utf-8 is the default charset of the page. I'll try to look into this more this week; the output might be coming back wrong from the CLI.