Characters outside the Unicode Basic Multilingual Plane are not shown correctly in outputs

GoogleCodeExporter commented 9 years ago

We currently escape non-ASCII characters in log, report, and other outputs 
using the format `\uXXXX`. This works fine for characters on the Basic 
Multilingual Plane (BMP), but characters outside it are escaped incorrectly.
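
To illustrate why characters outside the BMP are awkward with this format, here is a minimal Python sketch. The helper name `escape_non_ascii` is hypothetical and is not the escaping code Robot Framework actually uses. It only shows that a four-digit `\uXXXX` escape can hold one 16-bit value, so a code point above U+FFFF has to be written as a UTF-16 surrogate pair (two escapes) to survive.

```python
def escape_non_ascii(text):
    """Escape non-ASCII characters as \\uXXXX (hypothetical helper).

    Code points above U+FFFF are written as a UTF-16 surrogate pair.
    """
    parts = []
    for char in text:
        code = ord(char)
        if code < 128:
            # Plain ASCII is written as-is.
            parts.append(char)
        elif code <= 0xFFFF:
            # BMP characters fit into one four-digit escape.
            parts.append('\\u%04x' % code)
        else:
            # Non-BMP characters need a high and a low surrogate.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            parts.append('\\u%04x\\u%04x' % (high, low))
    return ''.join(parts)


print(escape_non_ascii('\u2603'))       # snowman, U+2603 (on the BMP)
print(escape_non_ascii('\U0001F435'))   # monkey face, U+1F435 (outside the BMP)
```

An escaper that handles only the first two branches has no valid single escape for the monkey face, which matches the kind of symptom described above.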

This isn't a problem with any spoken language, but there are special symbols 
and similar characters that cannot be shown correctly. That is a problem if 
such symbols are needed in testing. We noticed this issue when checking how 
characters written in the test data using the new `\Uxxxxxxxx` escape 
mechanism (issue 1524) look in the browser.

It seems that the easiest solution is to stop trying to escape all non-ASCII 
characters and simply encode all text written to outputs as UTF-8. In addition 
to fixing the problem with characters outside the BMP, this approach seems to 
perform better and to create smaller outputs than the old one. How big these 
other enhancements are needs to be tested separately. We also need to test 
whether the new approach works with all our supported browsers in general.
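
As a rough illustration of the size difference only, the sketch below compares an escaped string with the same string encoded as UTF-8. It uses Python's standard `json` module as a stand-in for the real output writer, and `output_sizes` is a hypothetical helper, so the numbers say nothing about actual log or report files.

```python
import json


def output_sizes(text):
    """Return (escaped size, UTF-8 size) for a JSON string literal.

    Hypothetical helper: json.dumps stands in for the real output writer.
    """
    # ensure_ascii=True escapes every non-ASCII character as \uXXXX.
    escaped = json.dumps(text, ensure_ascii=True)
    # ensure_ascii=False keeps the characters; UTF-8 encoding yields bytes.
    encoded = json.dumps(text, ensure_ascii=False).encode('UTF-8')
    return len(escaped), len(encoded)


for sample in ['\u2603', '\U0001F435']:
    print(output_sizes(sample))
```

Both sizes include the two surrounding quote characters. Any real gain depends on how much non-ASCII text the output contains, which is why it still needs to be measured separately.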

Original issue reported on code.google.com by pekka.klarck on 19 Sep 2013 at 9:17

GoogleCodeExporter commented 9 years ago

This issue was updated by revision 41ff06999c8a.

Implemented and updated affected tests.

Original comment by robotframework@gmail.com on 19 Sep 2013 at 9:20

Changed state: Started

GoogleCodeExporter commented 9 years ago

This issue was updated by revision b54d66290d54.

Original comment by jussi.ao...@gmail.com on 19 Sep 2013 at 1:35

GoogleCodeExporter commented 9 years ago

After fixing this problem with normal strings, we noticed that strings that 
were so long that they were zipped were not decoded correctly. This problem 
was caused by a bug in JXG.Util.utf8Decode [1], and Jussi's commit above fixed 
it with an alternative UTF-8 decode approach. Unfortunately that solution 
turned out to be slower than the old one in general and unacceptably slow with 
IE8.

[1] https://github.com/jsxgraph/jsxgraph/issues/50

Luckily the JXG team was interested in fixing the issue in their library. I 
found a UTF-8 decoding algorithm that also works with characters outside the 
BMP, and it was then translated to JavaScript. Especially good news is that, at 
least with modern browsers, the new algorithm is a bit faster than the old one.
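
For readers unfamiliar with what such a decoder has to handle, here is a generic Python sketch. It is not the algorithm that ended up in JSXCompressor, `utf8_decode` is a hypothetical name, and it assumes well-formed input without any validation. The point is the last branch: characters outside the BMP are encoded as four bytes in UTF-8, and a decoder has to handle that case explicitly.

```python
def utf8_decode(data):
    """Decode well-formed UTF-8 bytes into a string (illustrative sketch)."""
    chars = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # 0xxxxxxx: single-byte ASCII.
            code, extra = byte, 0
        elif byte < 0xE0:
            # 110xxxxx: one continuation byte follows.
            code, extra = byte & 0x1F, 1
        elif byte < 0xF0:
            # 1110xxxx: two continuation bytes follow (rest of the BMP).
            code, extra = byte & 0x0F, 2
        else:
            # 11110xxx: three continuation bytes follow (outside the BMP).
            code, extra = byte & 0x07, 3
        for cont in data[i + 1:i + 1 + extra]:
            # Each continuation byte (10xxxxxx) contributes six bits.
            code = (code << 6) | (cont & 0x3F)
        chars.append(chr(code))
        i += 1 + extra
    return ''.join(chars)


print(utf8_decode('snowman \u2603, monkey \U0001F435'.encode('UTF-8')))
```

In JavaScript the decoded code point cannot be passed to `String.fromCharCode` directly when it is above U+FFFF. It has to be emitted as a surrogate pair, because JavaScript strings are sequences of UTF-16 code units.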

The final step in fixing this issue is updating the jsxcompressor.js library 
we use to a fixed version. Unfortunately I couldn't find such a version 
anywhere yet, but I asked [2] whether it is available or whether I should 
generate it myself.

[2] https://github.com/jsxgraph/jsxgraph/issues/50#issuecomment-26122539

Original comment by pekka.klarck on 11 Oct 2013 at 8:54

GoogleCodeExporter commented 9 years ago

This issue was updated by revision 4606e35f2209.

Added non-BMP chars to report/log manual test data. Used both short strings and 
strings long enough to be zipped.

Original comment by pekka.klarck on 11 Oct 2013 at 8:35

GoogleCodeExporter commented 9 years ago

This issue was updated by revision 9ae6dba3361f.

Updated JSXCompressor and took its UTF8.decode method into use. Test data with 
unzipped and zipped snowmen and monkey faces looks good.

Original comment by pekka.klarck on 11 Oct 2013 at 8:35

Changed state: Done

GoogleCodeExporter commented 9 years ago

Original comment by pekka.klarck on 26 Nov 2013 at 1:45

Added labels: Priority-Low
Removed labels: Priority-Medium

Characters outside the Unicode Basic Multilingual Plane are not shown correctly in outputs #1526