cloudwu / lua-bson

A BSON library for lua
MIT License

bson.encode raises an error #13

Open 253980289 opened 2 years ago

253980289 commented 2 years ago

Error: Invalid utf8 string. Are there any requirements on the format of the strings used in Lua?

cloudwu commented 2 years ago

This is required by the BSON specification: a string must be a valid UTF-8 sequence; otherwise you need to use the binary type.

253980289 commented 2 years ago

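I extracted the problematic data and code into a dedicated test and found that the result is random: with the same data and the same algorithm, bson.encode sometimes succeeds and sometimes fails, so I suspect the bug is in the bson library itself. The original data is a table, and calling bson.encode on the table itself always succeeds. But if I process the table with Lua code (no binary manipulation), convert it to a string, put that string into an empty table, and then call bson.encode, I get the random error described above.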

253980289 commented 2 years ago

Here is my test code:

    local t = {
      ["uid"]  =  100040,
      ["msg"]  =  {
        [1]  =  {
          ["uid_from"]  =  100040,
          ["msg"]  =  "#24#",
          ["service_time"]  =  1666687149,
        },
        [2]  =  {
          ["uid_from"]  =  100051,
          ["msg"]  =  "#24##24#",
          ["service_time"]  =  1666687157,
        },
        [3]  =  {
          ["uid_from"]  =  100051,
          ["msg"]  =  "dfd gdfgcvbcvbcsds安撫大使發順豐是否是的發生的房貸首付士大夫第三方sfsdf",
          ["service_time"]  =  1666689945,
        },
        [4]  =  {
          ["uid_from"]  =  100051,
          ["msg"]  =  "55555",
          ["service_time"]  =  1666690436,
        },
        [5]  =  {
          ["uid_from"]  =  100051,
          ["msg"]  =  "還是 阿萨德较好的打算回到家阿克苏好大开始的喀什阿达哈萨克打算的",
          ["service_time"]  =  1666690450,
        },
        [6]  =  {
          ["uid_from"]  =  100051,
          ["msg"]  =  " local name = user and user:GetName() or self.data",
          ["service_time"]  =  1666690461,
        },
        [7]  =  {
          ["uid_from"]  =  100051,
          ["msg"]  =  "嗯嗯",
          ["service_time"]  =  1666691367,
        },
      },
    }
    print("bson.encode(t)", bson.encode(t))
    local t2 = {k = gdata.gcommon.get_db_action_param_varchar(nndebug.tostring(t))}
    print("bson.encode(t2)", bson.encode(t2))

The first encode succeeds every time; the second one fails randomly.

cloudwu commented 2 years ago

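Here the return value of gdata.gcommon.get_db_action_param_varchar(nndebug.tostring(t)) must not be binary data; it must be a string that conforms to UTF-8. You have not shown what this string actually contains.

You can print it with for p, c in utf8.codes(s) do print(p,c) end and take a look.

If you need to handle binary data, convert it with bson.binary(s).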
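A minimal sketch of the advice above, assuming Lua 5.3 or later for the utf8 library. The helper name to_bson_field is hypothetical, and wrap_binary is a stand-in for bson.binary that is passed in so the snippet runs without lua-bson installed:

```lua
-- Hypothetical helper: keep valid UTF-8 as a plain string,
-- and wrap anything else as binary before encoding.
-- wrap_binary is injected by the caller (for example bson.binary).
local function to_bson_field(s, wrap_binary)
    -- utf8.len returns nil (plus a byte position) for invalid UTF-8
    if utf8.len(s) then
        return s
    end
    return wrap_binary(s)
end

-- Intended use with lua-bson (not executed here):
-- local t2 = { k = to_bson_field(some_string, bson.binary) }
```

Whether silently falling back to binary is acceptable depends on who reads the field later; fixing the code that produces the broken string is usually the better option.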

253980289 commented 2 years ago

The converted result looks roughly like this:

{
      ["name"]  =  "gate:to_client.rsp_get_last_private_msg",
      ["param_float"]  =  1789,
      ["param_varchar"]  =  "{
  [\"uid\"]  =  100040,
  [\"msg\"]  =  {
    [1]  =  {
      [\"uid_from\"]  =  100040,
      [\"msg\"]  =  \"#24#\",
      [\"service_time\"]  =  1666687149,
    },
    [2]  =  {
      [\"uid_from\"]  =  100051,
      [\"msg\"]  =  \"#24##24#\",
      [\"service_time\"]  =  1666687157,
    },
    [3]  =  {
      [\"uid_from\"]  =  100051,
      [\"msg\"]  =  \"dfd gdfgcvbcvbcsds安撫大使發順豐是否是的發生的房貸首付士大夫第三方sfsdf\",
      [\"service_time\"]  =  1666689945,
    },
    [4]  =  {
      [\"uid_from\"]  =  100051,
      [\"msg\"]  =  \"55555\",
      [\"service_time\"]  =  1666690436,
    },
    [5]  =  {
      [\"uid_from\"]  =  100051,
      [\"msg\"]  =  \"還是 阿萨德较好的打算�
...
...
...
450,
    },
    [6]  =  {
      [\"uid_from\"]  =  100051,
      [\"msg\"]  =  \" local name = user and user:GetName() or self.data\",
      [\"service_time\"]  =  1666690461,
    },
    [7]  =  {
      [\"uid_from\"]  =  100051,
      [\"msg\"]  =  \"嗯嗯\",
      [\"service_time\"]  =  1666691367,
    },
  },
}",
      ["_id"]  =  "

I recorded the codes of one failing string:

{   [1] = 123,   [2] = 10,   [3] = 32,   [4] = 32,   [5] = 91,   [6] = 34,   [7] = 117,   [8] = 105,   [9] = 100,   [10] = 34,   [11] = 93,   [12] = 32,   [13] = 32,   [14] = 61,   [15] = 32,   [16] = 32,   [17] = 49,   [18] = 48,   [19] = 48,   [20] = 48,   [21] = 52,   [22] = 48,   [23] = 44,   [24] = 10,   [25] = 32,   [26] = 32,   [27] = 91,   [28] = 34,   [29] = 109,   [30] = 115,   [31] = 103,   [32] = 34,   [33] = 93,   [34] = 32,   [35] = 32,   [36] = 61,   [37] = 32,   [38] = 32,   [39] = 123,   [40] = 10,   [41] = 32,   [42] = 32,   [43] = 32,   [44] = 32,   [45] = 91,   [46] = 55,   [47] = 93,   [48] = 32,   [49] = 32,   [50] = 61,   [51] = 32,   [52] = 32,   [53] = 123,   [54] = 10,   [55] = 32,   [56] = 32,   [57] = 32,   [58] = 32,   [59] = 32,   [60] = 32,   [61] = 91,   [62] = 34,   [63] = 117,   [64] = 105,   [65] = 100,   [66] = 95,   [67] = 102,   [68] = 114,   [69] = 111,   [70] = 109,   [71] = 34,   [72] = 93,   [73] = 32,   [74] = 32,   [75] = 61,   [76] = 32,   [77] = 32,   [78] = 49,   [79] = 48,   [80] = 48,   [81] = 48,   [82] = 53,   [83] = 49,   [84] = 44,   [85] = 10,   [86] = 32,   [87] = 32,   [88] = 32,   [89] = 32,   [90] = 32,   [91] = 32,   [92] = 91,   [93] = 34,   [94] = 109,   [95] = 115,   [96] = 103,   [97] = 34,   [98] = 93,   [99] = 32,   [100] = 32,   [101] = 61,   [102] = 32,   [103] = 32,   [104] = 34,   [105] = 21999,   [108] = 21999,   [111] = 34,   [112] = 44,   [113] = 10,   [114] = 32,   [115] = 32,   [116] = 32,   [117] = 32,   [118] = 32,   [119] = 32,   [120] = 91,   [121] = 34,   [122] = 115,   [123] = 101,   [124] = 114,   [125] = 118,   [126] = 105,   [127] = 99,   [128] = 101,   [129] = 95,   [130] = 116,   [131] = 105,   [132] = 109,   [133] = 101,   [134] = 34,   [135] = 93,   [136] = 32,   [137] = 32,   [138] = 61,   [139] = 32,   [140] = 32,   [141] = 49,   [142] = 54,   [143] = 54,   [144] = 54,   [145] = 54,   [146] = 57,   [147] = 49,   [148] = 51,   [149] = 54,   [150] = 55,   [151] = 44,   [152] = 
10,   [153] = 32,   [154] = 32,   [155] = 32,   [156] = 32,   [157] = 125,   [158] = 44,   [159] = 10,   [160] = 32,   [161] = 32,   [162] = 32,   [163] = 32,   [164] = 91,   [165] = 49,   [166] = 93,   [167] = 32,   [168] = 32,   [169] = 61,   [170] = 32,   [171] = 32,   [172] = 123,   [173] = 10,   [174] = 32,   [175] = 32,   [176] = 32,   [177] = 32,   [178] = 32,   [179] = 32,   [180] = 91,   [181] = 34,   [182] = 117,   [183] = 105,   [184] = 100,   [185] = 95,   [186] = 102,   [187] = 114,   [188] = 111,   [189] = 109,   [190] = 34,   [191] = 93,   [192] = 32,   [193] = 32,   [194] = 61,   [195] = 32,   [196] = 32,   [197] = 49,   [198] = 48,   [199] = 48,   [200] = 48,   [201] = 52,   [202] = 48,   [203] = 44,   [204] = 10,   [205] = 32,   [206] = 32,   [207] = 32,   [208] = 32,   [209] = 32,   [210] = 32,   [211] = 91,   [212] = 34,   [213] = 109,   [214] = 115,   [215] = 103,   [216] = 34,   [217] = 93,   [218] = 32,   [219] = 32,   [220] = 61,   [221] = 32,   [222] = 32,   [223] = 34,   [224] = 35,   [225] = 50,   [226] = 52,   [227] = 35,   [228] = 34,   [229] = 44,   [230] = 10,   [231] = 32,   [232] = 32,   [233] = 32,   [234] = 32,   [235] = 32,   [236] = 32,   [237] = 91,   [238] = 34,   [239] = 115,   [240] = 101,   [241] = 114,   [242] = 118,   [243] = 105,   [244] = 99,   [245] = 101,   [246] = 95,   [247] = 116,   [248] = 105,   [249] = 109,   [250] = 101,   [251] = 34,   [252] = 93,   [253] = 32,   [254] = 32,   [255] = 61,   [256] = 32,   [257] = 32,   [258] = 49,   [259] = 54,   [260] = 54,   [261] = 54,   [262] = 54,   [263] = 56,   [264] = 55,   [265] = 49,   [266] = 52,   [267] = 57,   [268] = 44,   [269] = 10,   [270] = 32,   [271] = 32,   [272] = 32,   [273] = 32,   [274] = 125,   [275] = 44,   [276] = 10,   [277] = 32,   [278] = 32,   [279] = 32,   [280] = 32,   [281] = 91,   [282] = 50,   [283] = 93,   [284] = 32,   [285] = 32,   [286] = 61,   [287] = 32,   [288] = 32,   [289] = 123,   [290] = 10,   [291] = 32,   [292] = 32,   [293] = 
32,   [294] = 32,   [295] = 32,   [296] = 32,   [297] = 91,   [298] = 34,   [299] = 117,   [300] = 105,   [301] = 100,   [302] = 95,   [303] = 102,   [304] = 114,   [305] = 111,   [306] = 109,   [307] = 34,   [308] = 93,   [309] = 32,   [310] = 32,   [311] = 61,   [312] = 32,   [313] = 32,   [314] = 49,   [315] = 48,   [316] = 48,   [317] = 48,   [318] = 53,   [319] = 49,   [320] = 44,   [321] = 10,   [322] = 32,   [323] = 32,   [324] = 32,   [325] = 32,   [326] = 32,   [327] = 32,   [328] = 91,   [329] = 34,   [330] = 109,   [331] = 115,   [332] = 103,   [333] = 34,   [334] = 93,   [335] = 32,   [336] = 32,   [337] = 61,   [338] = 32,   [339] = 32,   [340] = 34,   [341] = 35,   [342] = 50,   [343] = 52,   [344] = 35,   [345] = 35,   [346] = 50,   [347] = 52,   [348] = 35,   [349] = 34,   [350] = 44,   [351] = 10,   [352] = 32,   [353] = 32,   [354] = 32,   [355] = 32,   [356] = 32,   [357] = 32,   [358] = 91,   [359] = 34,   [360] = 115,   [361] = 101,   [362] = 114,   [363] = 118,   [364] = 105,   [365] = 99,   [366] = 101,   [367] = 95,   [368] = 116,   [369] = 105,   [370] = 109,   [371] = 101,   [372] = 34,   [373] = 93,   [374] = 32,   [375] = 32,   [376] = 61,   [377] = 32,   [378] = 32,   [379] = 49,   [380] = 54,   [381] = 54,   [382] = 54,   [383] = 54,   [384] = 56,   [385] = 55,   [386] = 49,   [387] = 53,   [388] = 55,   [389] = 44,   [390] = 10,   [391] = 32,   [392] = 32,   [393] = 32,   [394] = 32,   [395] = 125,   [396] = 44,   [397] = 10,   [398] = 32,   [399] = 32,   [400] = 32,   [401] = 32,   [402] = 91,   [403] = 51,   [404] = 93,   [405] = 32,   [406] = 32,   [407] = 61,   [408] = 32,   [409] = 32,   [410] = 123,   [411] = 10,   [412] = 32,   [413] = 32,   [414] = 32,   [415] = 32,   [416] = 32,   [417] = 32,   [418] = 91,   [419] = 34,   [420] = 117,   [421] = 105,   [422] = 100,   [423] = 95,   [424] = 102,   [425] = 114,   [426] = 111,   [427] = 109,   [428] = 34,   [429] = 93,   [430] = 32,   [431] = 32,   [432] = 61,   [433] = 32,   
[434] = 32,   [435] = 49,   [436] = 48,   [437] = 48,   [438] = 48,   [439] = 53,   [440] = 49,   [441] = 44,   [442] = 10,   [443] = 32,   [444] = 32,   [445] = 32,   [446] = 32,   [447] = 32,   [448] = 32,   [449] = 91,   [450] = 34,   [451] = 109,   [452] = 115,   [453] = 103,   [454] = 34,   [455] = 93,   [456] = 32,   [457] = 32,   [458] = 61,   [459] = 32,   [460] = 32,   [461] = 34,   [462] = 100,   [463] = 102,   [464] = 100,   [465] = 32,   [466] = 103,   [467] = 100,   [468] = 102,   [469] = 103,   [470] = 99,   [471] = 118,   [472] = 98,   [473] = 99,   [474] = 118,   [475] = 98,   [476] = 99,   [477] = 115,   [478] = 100,   [479] = 115,   [480] = 23433,   [483] = 25771,   [486] = 22823,   [489] = 20351,   [492] = 30332,   [495] = 38918,   [498] = 35920,   [501] = 26159,   [504] = 21542,   [507] = 26159,   [510] = 30340,   [513] = 30332,   [516] = 29983,   [519] = 30340,   [522] = 25151,   [525] = 36024,   [528] = 39318,   [531] = 20184,   [534] = 22763,   [537] = 22823,   [540] = 22827,   [543] = 31532,   [546] = 19977,   [549] = 26041,   [552] = 115,   [553] = 102,   [554] = 115,   [555] = 100,   [556] = 102,   [557] = 34,   [558] = 44,   [559] = 10,   [560] = 32,   [561] = 32,   [562] = 32,   [563] = 32,   [564] = 32,   [565] = 32,   [566] = 91,   [567] = 34,   [568] = 115,   [569] = 101,   [570] = 114,   [571] = 118,   [572] = 105,   [573] = 99,   [574] = 101,   [575] = 95,   [576] = 116,   [577] = 105,   [578] = 109,   [579] = 101,   [580] = 34,   [581] = 93,   [582] = 32,   [583] = 32,   [584] = 61,   [585] = 32,   [586] = 32,   [587] = 49,   [588] = 54,   [589] = 54,   [590] = 54,   [591] = 54,   [592] = 56,   [593] = 57,   [594] = 57,   [595] = 52,   [596] = 53,   [597] = 44,   [598] = 10,   [599] = 32,   [600] = 32,   [601] = 32,   [602] = 32,   [603] = 125,   [604] = 44,   [605] = 10,   [606] = 32,   [607] = 32,   [608] = 32,   [609] = 32,   [610] = 91,   [611] = 52,   [612] = 93,   [613] = 32,   [614] = 32,   [615] = 61,   [616] = 32,   [617] 
= 32,   [618] = 123,   [619] = 10,   [620] = 32,   [621] = 32,   [622] = 32,   [623] = 32,   [624] = 32,   [625] = 32,   [626] = 91,   [627] = 34,   [628] = 117,   [629] = 105,   [630] = 100,   [631] = 95,   [632] = 102,   [633] = 114,   [634] = 111,   [635] = 109,   [636] = 34,   [637] = 93,   [638] = 32,   [639] = 32,   [640] = 61,   [641] = 32,   [642] = 32,   [643] = 49,   [644] = 48,   [645] = 48,   [646] = 48,   [647] = 53,   [648] = 49,   [649] = 44,   [650] = 10,   [651] = 32,   [652] = 32,   [653] = 32,   [654] = 32,   [655] = 32,   [656] = 32,   [657] = 91,   [658] = 34,   [659] = 109,   [660] = 115,   [661] = 103,   [662] = 34,   [663] = 93,   [664] = 32,   [665] = 32,   [666] = 61,   [667] = 32,   [668] = 32,   [669] = 34,   [670] = 53,   [671] = 10,   [672] = 46,   [673] = 46,   [674] = 46,   [675] = 10,   [676] = 46,   [677] = 46,   [678] = 46,   [679] = 10,   [680] = 46,   [681] = 46,   [682] = 46,   [683] = 10,   [685] = 24503,   [688] = 36739,   [691] = 22909,   [694] = 30340,   [697] = 25171,   [700] = 31639,   [703] = 22238,   [706] = 21040,   [709] = 23478,   [712] = 38463,   [715] = 20811,   [718] = 33487,   [721] = 22909,   [724] = 22823,   [727] = 24320,   [730] = 22987,   [733] = 30340,   [736] = 21888,   [739] = 20160,   [742] = 38463,   [745] = 36798,   [748] = 21704,   [751] = 33832,   [754] = 20811,   [757] = 25171,   [760] = 31639,   [763] = 30340,   [766] = 34,   [767] = 44,   [768] = 10,   [769] = 32,   [770] = 32,   [771] = 32,   [772] = 32,   [773] = 32,   [774] = 32,   [775] = 91,   [776] = 34,   [777] = 115,   [778] = 101,   [779] = 114,   [780] = 118,   [781] = 105,   [782] = 99,   [783] = 101,   [784] = 95,   [785] = 116,   [786] = 105,   [787] = 109,   [788] = 101,   [789] = 34,   [790] = 93,   [791] = 32,   [792] = 32,   [793] = 61,   [794] = 32,   [795] = 32,   [796] = 49,   [797] = 54,   [798] = 54,   [799] = 54,   [800] = 54,   [801] = 57,   [802] = 48,   [803] = 52,   [804] = 53,   [805] = 48,   [806] = 44,   [807] = 10,   
[808] = 32,   [809] = 32,   [810] = 32,   [811] = 32,   [812] = 125,   [813] = 44,   [814] = 10,   [815] = 32,   [816] = 32,   [817] = 32,   [818] = 32,   [819] = 91,   [820] = 54,   [821] = 93,   [822] = 32,   [823] = 32,   [824] = 61,   [825] = 32,   [826] = 32,   [827] = 123,   [828] = 10,   [829] = 32,   [830] = 32,   [831] = 32,   [832] = 32,   [833] = 32,   [834] = 32,   [835] = 91,   [836] = 34,   [837] = 117,   [838] = 105,   [839] = 100,   [840] = 95,   [841] = 102,   [842] = 114,   [843] = 111,   [844] = 109,   [845] = 34,   [846] = 93,   [847] = 32,   [848] = 32,   [849] = 61,   [850] = 32,   [851] = 32,   [852] = 49,   [853] = 48,   [854] = 48,   [855] = 48,   [856] = 53,   [857] = 49,   [858] = 44,   [859] = 10,   [860] = 32,   [861] = 32,   [862] = 32,   [863] = 32,   [864] = 32,   [865] = 32,   [866] = 91,   [867] = 34,   [868] = 109,   [869] = 115,   [870] = 103,   [871] = 34,   [872] = 93,   [873] = 32,   [874] = 32,   [875] = 61,   [876] = 32,   [877] = 32,   [878] = 34,   [879] = 32,   [880] = 108,   [881] = 111,   [882] = 99,   [883] = 97,   [884] = 108,   [885] = 32,   [886] = 110,   [887] = 97,   [888] = 109,   [889] = 101,   [890] = 32,   [891] = 61,   [892] = 32,   [893] = 117,   [894] = 115,   [895] = 101,   [896] = 114,   [897] = 32,   [898] = 97,   [899] = 110,   [900] = 100,   [901] = 32,   [902] = 117,   [903] = 115,   [904] = 101,   [905] = 114,   [906] = 58,   [907] = 71,   [908] = 101,   [909] = 116,   [910] = 78,   [911] = 97,   [912] = 109,   [913] = 101,   [914] = 40,   [915] = 41,   [916] = 32,   [917] = 111,   [918] = 114,   [919] = 32,   [920] = 115,   [921] = 101,   [922] = 108,   [923] = 102,   [924] = 46,   [925] = 100,   [926] = 97,   [927] = 116,   [928] = 97,   [929] = 34,   [930] = 44,   [931] = 10,   [932] = 32,   [933] = 32,   [934] = 32,   [935] = 32,   [936] = 32,   [937] = 32,   [938] = 91,   [939] = 34,   [940] = 115,   [941] = 101,   [942] = 114,   [943] = 118,   [944] = 105,   [945] = 99,   [946] = 101,   [947] = 
95,   [948] = 116,   [949] = 105,   [950] = 109,   [951] = 101,   [952] = 34,   [953] = 93,   [954] = 32,   [955] = 32,   [956] = 61,   [957] = 32,   [958] = 32,   [959] = 49,   [960] = 54,   [961] = 54,   [962] = 54,   [963] = 54,   [964] = 57,   [965] = 48,   [966] = 52,   [967] = 54,   [968] = 49,   [969] = 44,   [970] = 10,   [971] = 32,   [972] = 32,   [973] = 32,   [974] = 32,   [975] = 125,   [976] = 44,   [977] = 10,   [978] = 32,   [979] = 32,   [980] = 125,   [981] = 44,   [982] = 10,   [983] = 125,  }
253980289 commented 2 years ago

I have also printed the raw bytes of another failing string (image: err).

253980289 commented 2 years ago

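My initial conclusion is that the conversion function calls string.sub when building the string, which cuts through complete UTF-8 sequences and breaks them.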
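The effect is easy to reproduce in isolation. A small sketch, assuming Lua 5.3 or later and a source file saved as UTF-8 (the string "中国" is only an illustrative value):

```lua
-- Each of these two CJK characters occupies 3 bytes in UTF-8.
local s = "中国"
assert(#s == 6)

-- string.sub counts bytes: this keeps "中" plus only the first byte of "国".
local broken = string.sub(s, 1, 4)
local n, bad_pos = utf8.len(broken)  -- n is nil, bad_pos is the offending byte

-- Cutting on a character boundary instead: utf8.offset(s, 2) is the byte
-- position where the second character starts.
local first_char = string.sub(s, 1, utf8.offset(s, 2) - 1)
```

Here broken is no longer valid UTF-8, while first_char is the intact first character.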

cloudwu commented 2 years ago

You can use utf8.len to check whether a string is valid UTF-8.
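As a sketch of that check (the helper name check_utf8 is hypothetical): utf8.len returns the character count for a valid string, and nil plus the byte position of the first invalid byte otherwise, which helps locate the damage in a long string.

```lua
-- Returns true for valid UTF-8, or false plus the position of the
-- first invalid byte.
local function check_utf8(s)
    local n, bad_pos = utf8.len(s)
    if n then
        return true
    end
    return false, bad_pos
end

local ok, pos = check_utf8(string.sub("嗯嗯", 1, 5))  -- second character is cut short
if not ok then
    print(("invalid UTF-8 starting at byte %d"):format(pos))
end
```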

253980289 commented 2 years ago

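Many thanks to cloudwu for the help. The problem is now solved. The cause is almost certainly that concatenating the byte ranges returned by string.sub broke the UTF-8 byte sequences.

To fix this I reimplemented two functions, listed here for anyone who runs into the same problem: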

    nnstring = nnstring or {}  -- module table; defined elsewhere in the original project

    -- Return a substring indexed by character (unlike string.sub, which indexes by byte).
    -- i may be negative to count from the end; j must be a positive character index or nil.
    function nnstring.sub_chat(s, i, j)
        i = i or 1
        -- utf8.offset returns nil for an out-of-range index; clamp as string.sub does
        local ii = utf8.offset(s, i) or (i < 0 and 1 or #s + 1)
        local jj = #s
        if j then
            local next_start = utf8.offset(s, j + 1)
            jj = next_start and (next_start - 1) or #s
        end
        return string.sub(s, ii, jj)
    end
    --[[
    print(nnstring.sub_chat("中国afd人", 1, 2))
    -- >>中国
    print(nnstring.sub_chat("中国afd人", 2, 4))
    -- >>国af
    print(nnstring.sub_chat("中国afd人", 2))
    -- >>国afd人
    print(nnstring.sub_chat("中国afd人", -2))
    -- >>d人
    ]]

    -- Truncate a string by character: keep the head and the tail and cut out
    -- the middle. Typically used for log output.
    function nnstring.cut(s, max_len, ellipsis, head_len, tail_len)
        ellipsis = ellipsis or "..." -- string.rep("\r\n" .. "...", 3) .. "\r\n"
        local len = assert(utf8.len(s), "s must be valid UTF-8")
        if len > max_len then
            head_len = head_len or math.floor(max_len * 3 / 4)
            tail_len = tail_len or (max_len - head_len - utf8.len(ellipsis))
            if tail_len <= 0 then
                -- no room left for a tail: take 3 characters from the head instead
                tail_len = 3
                head_len = head_len - tail_len
            end
            s = nnstring.sub_chat(s, 1, head_len)
                .. ellipsis
                .. nnstring.sub_chat(s, -tail_len)
            -- print(s, utf8.len(s), max_len)
            assert(utf8.len(s) <= max_len, utf8.len(s))
        end
        return s
    end
    -- print(nnstring.cut("中华人民共和国中华人民共和国", 10))
    -- -->>中华人民...共和国

253980289 commented 2 years ago

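One more small question: why does iterating over a broken string with for ... in utf8.codes not raise an error? According to the official Lua documentation it should: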

utf8.codes (s [, lax])
Returns values so that the construction

     for p, c in utf8.codes(s) do body end
will iterate over all UTF-8 characters in string s, with p being the position (in bytes) and c the code point of each character. It raises an error if it meets any invalid byte sequence.
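The thread does not answer this. One possible explanation, stated here as an assumption rather than a confirmed fact, is that some Lua versions skip stray continuation bytes inside utf8.codes instead of reporting them. A sketch that avoids relying on that behaviour validates with utf8.len first (the name strict_codes is hypothetical):

```lua
-- Validate the whole string up front, then hand back the normal iterator.
local function strict_codes(s)
    local n, bad_pos = utf8.len(s)
    if not n then
        error(("invalid UTF-8 at byte %d"):format(bad_pos), 2)
    end
    return utf8.codes(s)
end

-- Usage: iterates exactly like utf8.codes on valid input.
for p, c in strict_codes("a中") do
    print(p, c)
end
```

The cost is a second pass over the string, which is usually negligible for log or debug output.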