computerline1z / okapi

Automatically exported from code.google.com/p/okapi

ASCII chars not un-escaped with Encoding Conversion Step #318

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Some escaped ASCII characters do not seem to be un-escaped by the Encoding Conversion 
step when converting to UTF-8.

See http://tech.groups.yahoo.com/group/okapitools/message/3579

Original issue reported on code.google.com by yves.sav...@gmail.com on 5 Mar 2013 at 4:42

GoogleCodeExporter commented 9 years ago
It seems the conversion code skips the ASCII characters:

if ( value < 128 ) {
   // Unknown pattern or ASCII values: Keep it as is
   // (so <, &, ", etc. stay escaped)
   tmp.append(m.group());
}

I guess we were playing it safe.
But we can be more specific: preserve only '<', '&', '>', the apostrophe (') and '"', and 
convert all other ASCII characters.
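
For illustration only, the narrower rule described above could look roughly like the sketch below. This is not the code from the actual fix: the class name NcrUnescaper, the method unescape, and the regular expression are all hypothetical, and the sketch assumes only numeric character references (such as &#65; or &#x41;) need handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the actual Okapi implementation.
class NcrUnescaper {

    // Matches decimal (&#65;) and hexadecimal (&#x41;) numeric character references.
    private static final Pattern NCR =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Characters that stay escaped so the markup remains well-formed.
    private static final String RESERVED = "<&>'\"";

    static String unescape(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder tmp = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            tmp.append(text, last, m.start());
            last = m.end();

            int value;
            try {
                value = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            } catch (NumberFormatException e) {
                value = -1; // Too large to be a code point
            }

            if (!Character.isValidCodePoint(value) || RESERVED.indexOf(value) >= 0) {
                // Invalid value or reserved character: keep the reference as is
                tmp.append(m.group());
            } else {
                // Everything else, including ordinary ASCII, is converted
                tmp.appendCodePoint(value);
            }
        }
        // Copy whatever follows the last match.
        tmp.append(text, last, text.length());
        return tmp.toString();
    }
}
```

With this sketch, unescape("&#65;&#x42;") gives "AB", while "&#60;", "&#38;", "&#62;", "&#39;" and "&#34;" are left escaped. It makes no attempt to filter control characters or lone surrogates, which a real implementation would probably want to consider.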

Original comment by yves.sav...@gmail.com on 17 Aug 2013 at 1:29

GoogleCodeExporter commented 9 years ago
This was closed at https://code.google.com/p/okapi/source/detail?r=a13b57d25796

Original comment by yves.sav...@gmail.com on 17 Aug 2013 at 2:21