clugg / sm-json

A pure SourcePawn JSON encoder/decoder.
GNU General Public License v3.0
82 stars 8 forks source link

[Bug] json encoding utf-8 fails #17

Closed FAUSheppy closed 4 years ago

FAUSheppy commented 4 years ago

Description

Encoding the following string produces a bad json/ascii-encoded utf-8 string.

JSON_Object player = new JSON_Object();
player.SetString("name", "Отключился.")
char output[2048];
obj.Encode(output, sizeof(output));
// output: { "name" : "\u041e\u0442\u043a\u043b\u044e\u04383b\u044 } (note the lack of parentheses)
FAUSheppy commented 4 years ago

Seemed to me like the array into which the encoded string is copied was too short, I'm gonna be honest with you I don't quite understand your ascii-length calculations, but if i do this:

diff --git a/scripting/include/json.inc b/scripting/include/json.inc
index ad66406..2315b10 100644
--- a/scripting/include/json.inc
+++ b/scripting/include/json.inc
@@ -120,7 +120,7 @@ stock void json_encode(
         cell_length = 0;
         switch (type) {
             case JSON_Type_String: {
-                str_length = obj.GetKeyLength(key);
+                str_length = obj.GetKeyLength(key) +100;
                 cell_length = json_cell_string_size(str_length);
             }
             case JSON_Type_Int: {

it works.

clugg commented 4 years ago

Hey there! Thanks for this bug report. At a glance I'd hazard a guess that this is an issue with json_cell_string_size. Since UTF-8 characters can be anywhere from 2 to 4 bytes, and are always escaped to 6 (\uXXXX), I believe that any 2 byte UTF-8 characters will result in the buffer being too short to store the escaped version. This was my mistake - when writing the tests I simply grabbed a few UTF-8 symbols and called it a day - none of which were 2 bytes, apparently.

I will fix this in the next few days by making json_cell_string_size iterate strings and properly calculate the size, but as a temporary fix you could change length * 2 to length * 3 on line 243 of addons/sourcemod/scripting/include/json/helpers/string.inc and it should work in every case.

clugg commented 4 years ago

I have just released v3.1.1, which fixes the issue you were facing, as well as a few other minor changes/optimisations. I ended up using the hotfix described in my previous comment, as the alternative solution will require a fair amount of changes to how strings are handled in general. I will keep the alternative solution in mind for a future release, as it will certainly be more memory conscious, but I hope this is suitable for the time being. Thanks again!