Open 253980289 opened 2 years ago
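This is a requirement of the BSON specification: a string must be valid UTF-8; otherwise you need to use the binary type.
I extracted the problematic data and code into a dedicated test and found that the result looks random: with the same data and the same algorithm, bson.encode sometimes succeeds and sometimes raises an error, so I suspect a bug in the bson library itself. The source data is a table. Calling bson.encode on the table itself always succeeds, but if I process the table with Lua code (no binary processing), convert it to a string, put that string into an empty table, and then call bson.encode, the random errors described above appear.
Here is my test code: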
local t = {
["uid"] = 100040,
["msg"] = {
[1] = {
["uid_from"] = 100040,
["msg"] = "#24#",
["service_time"] = 1666687149,
},
[2] = {
["uid_from"] = 100051,
["msg"] = "#24##24#",
["service_time"] = 1666687157,
},
[3] = {
["uid_from"] = 100051,
["msg"] = "dfd gdfgcvbcvbcsds安撫大使發順豐是否是的發生的房貸首付士大夫第三方sfsdf",
["service_time"] = 1666689945,
},
[4] = {
["uid_from"] = 100051,
["msg"] = "55555",
["service_time"] = 1666690436,
},
[5] = {
["uid_from"] = 100051,
["msg"] = "還是 阿萨德较好的打算回到家阿克苏好大开始的喀什阿达哈萨克打算的",
["service_time"] = 1666690450,
},
[6] = {
["uid_from"] = 100051,
["msg"] = " local name = user and user:GetName() or self.data",
["service_time"] = 1666690461,
},
[7] = {
["uid_from"] = 100051,
["msg"] = "嗯嗯",
["service_time"] = 1666691367,
},
},
}
print("bson.encode(t)", bson.encode(t))
local t2 = {k = gdata.gcommon.get_db_action_param_varchar(nndebug.tostring(t))}
print("bson.encode(t2)", bson.encode(t2))
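Here the first encode succeeds every time, while the second one fails at random.
The return value of gdata.gcommon.get_db_action_param_varchar(nndebug.tostring(t)) here
must not be binary data; it has to be a string that conforms to UTF-8. You have not shown what this string actually contains.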
You can use for p, c in utf8.codes(s) do print(p,c) end
to print it out and take a look.
If you need to handle binary data, convert it with bson.binary(s).
The converted data looks roughly like this:
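The advice above (valid UTF-8 goes in as a string, anything else goes through the binary type) can be sketched as a small helper. This is only an illustration: `bson_safe_value` and `fake_binary` are hypothetical names, and `wrap_binary` is a parameter standing in for bson.binary so that the sketch runs without the bson module.

```lua
-- Sketch: decide whether a Lua string can be stored as a BSON string.
-- `wrap_binary` is a hypothetical parameter standing in for bson.binary.
local function bson_safe_value(s, wrap_binary)
  if utf8.len(s) then
    return s              -- valid UTF-8: keep it as a plain string
  end
  return wrap_binary(s)   -- otherwise hand the bytes to the binary wrapper
end

-- Stand-in wrapper (an assumption) so the sketch is runnable on its own.
local function fake_binary(s)
  return { binary = s }
end

local ok_value = bson_safe_value("嗯嗯", fake_binary)
local bad_value = bson_safe_value("\xE5\x97", fake_binary) -- truncated 3-byte sequence
print(type(ok_value), type(bad_value))
```

In real code the second argument would presumably be bson.binary itself; whether silently falling back to binary is desirable, rather than fixing the code that produced the broken string, is a separate decision.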
{
["name"] = "gate:to_client.rsp_get_last_private_msg",
["param_float"] = 1789,
["param_varchar"] = "{
[\"uid\"] = 100040,
[\"msg\"] = {
[1] = {
[\"uid_from\"] = 100040,
[\"msg\"] = \"#24#\",
[\"service_time\"] = 1666687149,
},
[2] = {
[\"uid_from\"] = 100051,
[\"msg\"] = \"#24##24#\",
[\"service_time\"] = 1666687157,
},
[3] = {
[\"uid_from\"] = 100051,
[\"msg\"] = \"dfd gdfgcvbcvbcsds安撫大使發順豐是否是的發生的房貸首付士大夫第三方sfsdf\",
[\"service_time\"] = 1666689945,
},
[4] = {
[\"uid_from\"] = 100051,
[\"msg\"] = \"55555\",
[\"service_time\"] = 1666690436,
},
[5] = {
[\"uid_from\"] = 100051,
[\"msg\"] = \"還是 阿萨德较好的打算�
...
...
...
450,
},
[6] = {
[\"uid_from\"] = 100051,
[\"msg\"] = \" local name = user and user:GetName() or self.data\",
[\"service_time\"] = 1666690461,
},
[7] = {
[\"uid_from\"] = 100051,
[\"msg\"] = \"嗯嗯\",
[\"service_time\"] = 1666691367,
},
},
}",
["_id"] = "
I recorded the codes of one failing string, as follows:
{ [1] = 123, [2] = 10, [3] = 32, [4] = 32, [5] = 91, [6] = 34, [7] = 117, [8] = 105, [9] = 100, [10] = 34, [11] = 93, [12] = 32, [13] = 32, [14] = 61, [15] = 32, [16] = 32, [17] = 49, [18] = 48, [19] = 48, [20] = 48, [21] = 52, [22] = 48, [23] = 44, [24] = 10, [25] = 32, [26] = 32, [27] = 91, [28] = 34, [29] = 109, [30] = 115, [31] = 103, [32] = 34, [33] = 93, [34] = 32, [35] = 32, [36] = 61, [37] = 32, [38] = 32, [39] = 123, [40] = 10, [41] = 32, [42] = 32, [43] = 32, [44] = 32, [45] = 91, [46] = 55, [47] = 93, [48] = 32, [49] = 32, [50] = 61, [51] = 32, [52] = 32, [53] = 123, [54] = 10, [55] = 32, [56] = 32, [57] = 32, [58] = 32, [59] = 32, [60] = 32, [61] = 91, [62] = 34, [63] = 117, [64] = 105, [65] = 100, [66] = 95, [67] = 102, [68] = 114, [69] = 111, [70] = 109, [71] = 34, [72] = 93, [73] = 32, [74] = 32, [75] = 61, [76] = 32, [77] = 32, [78] = 49, [79] = 48, [80] = 48, [81] = 48, [82] = 53, [83] = 49, [84] = 44, [85] = 10, [86] = 32, [87] = 32, [88] = 32, [89] = 32, [90] = 32, [91] = 32, [92] = 91, [93] = 34, [94] = 109, [95] = 115, [96] = 103, [97] = 34, [98] = 93, [99] = 32, [100] = 32, [101] = 61, [102] = 32, [103] = 32, [104] = 34, [105] = 21999, [108] = 21999, [111] = 34, [112] = 44, [113] = 10, [114] = 32, [115] = 32, [116] = 32, [117] = 32, [118] = 32, [119] = 32, [120] = 91, [121] = 34, [122] = 115, [123] = 101, [124] = 114, [125] = 118, [126] = 105, [127] = 99, [128] = 101, [129] = 95, [130] = 116, [131] = 105, [132] = 109, [133] = 101, [134] = 34, [135] = 93, [136] = 32, [137] = 32, [138] = 61, [139] = 32, [140] = 32, [141] = 49, [142] = 54, [143] = 54, [144] = 54, [145] = 54, [146] = 57, [147] = 49, [148] = 51, [149] = 54, [150] = 55, [151] = 44, [152] = 10, [153] = 32, [154] = 32, [155] = 32, [156] = 32, [157] = 125, [158] = 44, [159] = 10, [160] = 32, [161] = 32, [162] = 32, [163] = 32, [164] = 91, [165] = 49, [166] = 93, [167] = 32, [168] = 32, [169] = 61, [170] = 32, [171] = 32, [172] = 123, [173] = 10, [174] = 32, [175] = 32, [176] = 32, 
[177] = 32, [178] = 32, [179] = 32, [180] = 91, [181] = 34, [182] = 117, [183] = 105, [184] = 100, [185] = 95, [186] = 102, [187] = 114, [188] = 111, [189] = 109, [190] = 34, [191] = 93, [192] = 32, [193] = 32, [194] = 61, [195] = 32, [196] = 32, [197] = 49, [198] = 48, [199] = 48, [200] = 48, [201] = 52, [202] = 48, [203] = 44, [204] = 10, [205] = 32, [206] = 32, [207] = 32, [208] = 32, [209] = 32, [210] = 32, [211] = 91, [212] = 34, [213] = 109, [214] = 115, [215] = 103, [216] = 34, [217] = 93, [218] = 32, [219] = 32, [220] = 61, [221] = 32, [222] = 32, [223] = 34, [224] = 35, [225] = 50, [226] = 52, [227] = 35, [228] = 34, [229] = 44, [230] = 10, [231] = 32, [232] = 32, [233] = 32, [234] = 32, [235] = 32, [236] = 32, [237] = 91, [238] = 34, [239] = 115, [240] = 101, [241] = 114, [242] = 118, [243] = 105, [244] = 99, [245] = 101, [246] = 95, [247] = 116, [248] = 105, [249] = 109, [250] = 101, [251] = 34, [252] = 93, [253] = 32, [254] = 32, [255] = 61, [256] = 32, [257] = 32, [258] = 49, [259] = 54, [260] = 54, [261] = 54, [262] = 54, [263] = 56, [264] = 55, [265] = 49, [266] = 52, [267] = 57, [268] = 44, [269] = 10, [270] = 32, [271] = 32, [272] = 32, [273] = 32, [274] = 125, [275] = 44, [276] = 10, [277] = 32, [278] = 32, [279] = 32, [280] = 32, [281] = 91, [282] = 50, [283] = 93, [284] = 32, [285] = 32, [286] = 61, [287] = 32, [288] = 32, [289] = 123, [290] = 10, [291] = 32, [292] = 32, [293] = 32, [294] = 32, [295] = 32, [296] = 32, [297] = 91, [298] = 34, [299] = 117, [300] = 105, [301] = 100, [302] = 95, [303] = 102, [304] = 114, [305] = 111, [306] = 109, [307] = 34, [308] = 93, [309] = 32, [310] = 32, [311] = 61, [312] = 32, [313] = 32, [314] = 49, [315] = 48, [316] = 48, [317] = 48, [318] = 53, [319] = 49, [320] = 44, [321] = 10, [322] = 32, [323] = 32, [324] = 32, [325] = 32, [326] = 32, [327] = 32, [328] = 91, [329] = 34, [330] = 109, [331] = 115, [332] = 103, [333] = 34, [334] = 93, [335] = 32, [336] = 32, [337] = 61, [338] = 32, [339] = 32, [340] = 34, 
[341] = 35, [342] = 50, [343] = 52, [344] = 35, [345] = 35, [346] = 50, [347] = 52, [348] = 35, [349] = 34, [350] = 44, [351] = 10, [352] = 32, [353] = 32, [354] = 32, [355] = 32, [356] = 32, [357] = 32, [358] = 91, [359] = 34, [360] = 115, [361] = 101, [362] = 114, [363] = 118, [364] = 105, [365] = 99, [366] = 101, [367] = 95, [368] = 116, [369] = 105, [370] = 109, [371] = 101, [372] = 34, [373] = 93, [374] = 32, [375] = 32, [376] = 61, [377] = 32, [378] = 32, [379] = 49, [380] = 54, [381] = 54, [382] = 54, [383] = 54, [384] = 56, [385] = 55, [386] = 49, [387] = 53, [388] = 55, [389] = 44, [390] = 10, [391] = 32, [392] = 32, [393] = 32, [394] = 32, [395] = 125, [396] = 44, [397] = 10, [398] = 32, [399] = 32, [400] = 32, [401] = 32, [402] = 91, [403] = 51, [404] = 93, [405] = 32, [406] = 32, [407] = 61, [408] = 32, [409] = 32, [410] = 123, [411] = 10, [412] = 32, [413] = 32, [414] = 32, [415] = 32, [416] = 32, [417] = 32, [418] = 91, [419] = 34, [420] = 117, [421] = 105, [422] = 100, [423] = 95, [424] = 102, [425] = 114, [426] = 111, [427] = 109, [428] = 34, [429] = 93, [430] = 32, [431] = 32, [432] = 61, [433] = 32, [434] = 32, [435] = 49, [436] = 48, [437] = 48, [438] = 48, [439] = 53, [440] = 49, [441] = 44, [442] = 10, [443] = 32, [444] = 32, [445] = 32, [446] = 32, [447] = 32, [448] = 32, [449] = 91, [450] = 34, [451] = 109, [452] = 115, [453] = 103, [454] = 34, [455] = 93, [456] = 32, [457] = 32, [458] = 61, [459] = 32, [460] = 32, [461] = 34, [462] = 100, [463] = 102, [464] = 100, [465] = 32, [466] = 103, [467] = 100, [468] = 102, [469] = 103, [470] = 99, [471] = 118, [472] = 98, [473] = 99, [474] = 118, [475] = 98, [476] = 99, [477] = 115, [478] = 100, [479] = 115, [480] = 23433, [483] = 25771, [486] = 22823, [489] = 20351, [492] = 30332, [495] = 38918, [498] = 35920, [501] = 26159, [504] = 21542, [507] = 26159, [510] = 30340, [513] = 30332, [516] = 29983, [519] = 30340, [522] = 25151, [525] = 36024, [528] = 39318, [531] = 20184, [534] = 22763, [537] = 
22823, [540] = 22827, [543] = 31532, [546] = 19977, [549] = 26041, [552] = 115, [553] = 102, [554] = 115, [555] = 100, [556] = 102, [557] = 34, [558] = 44, [559] = 10, [560] = 32, [561] = 32, [562] = 32, [563] = 32, [564] = 32, [565] = 32, [566] = 91, [567] = 34, [568] = 115, [569] = 101, [570] = 114, [571] = 118, [572] = 105, [573] = 99, [574] = 101, [575] = 95, [576] = 116, [577] = 105, [578] = 109, [579] = 101, [580] = 34, [581] = 93, [582] = 32, [583] = 32, [584] = 61, [585] = 32, [586] = 32, [587] = 49, [588] = 54, [589] = 54, [590] = 54, [591] = 54, [592] = 56, [593] = 57, [594] = 57, [595] = 52, [596] = 53, [597] = 44, [598] = 10, [599] = 32, [600] = 32, [601] = 32, [602] = 32, [603] = 125, [604] = 44, [605] = 10, [606] = 32, [607] = 32, [608] = 32, [609] = 32, [610] = 91, [611] = 52, [612] = 93, [613] = 32, [614] = 32, [615] = 61, [616] = 32, [617] = 32, [618] = 123, [619] = 10, [620] = 32, [621] = 32, [622] = 32, [623] = 32, [624] = 32, [625] = 32, [626] = 91, [627] = 34, [628] = 117, [629] = 105, [630] = 100, [631] = 95, [632] = 102, [633] = 114, [634] = 111, [635] = 109, [636] = 34, [637] = 93, [638] = 32, [639] = 32, [640] = 61, [641] = 32, [642] = 32, [643] = 49, [644] = 48, [645] = 48, [646] = 48, [647] = 53, [648] = 49, [649] = 44, [650] = 10, [651] = 32, [652] = 32, [653] = 32, [654] = 32, [655] = 32, [656] = 32, [657] = 91, [658] = 34, [659] = 109, [660] = 115, [661] = 103, [662] = 34, [663] = 93, [664] = 32, [665] = 32, [666] = 61, [667] = 32, [668] = 32, [669] = 34, [670] = 53, [671] = 10, [672] = 46, [673] = 46, [674] = 46, [675] = 10, [676] = 46, [677] = 46, [678] = 46, [679] = 10, [680] = 46, [681] = 46, [682] = 46, [683] = 10, [685] = 24503, [688] = 36739, [691] = 22909, [694] = 30340, [697] = 25171, [700] = 31639, [703] = 22238, [706] = 21040, [709] = 23478, [712] = 38463, [715] = 20811, [718] = 33487, [721] = 22909, [724] = 22823, [727] = 24320, [730] = 22987, [733] = 30340, [736] = 21888, [739] = 20160, [742] = 38463, [745] = 36798, [748] 
= 21704, [751] = 33832, [754] = 20811, [757] = 25171, [760] = 31639, [763] = 30340, [766] = 34, [767] = 44, [768] = 10, [769] = 32, [770] = 32, [771] = 32, [772] = 32, [773] = 32, [774] = 32, [775] = 91, [776] = 34, [777] = 115, [778] = 101, [779] = 114, [780] = 118, [781] = 105, [782] = 99, [783] = 101, [784] = 95, [785] = 116, [786] = 105, [787] = 109, [788] = 101, [789] = 34, [790] = 93, [791] = 32, [792] = 32, [793] = 61, [794] = 32, [795] = 32, [796] = 49, [797] = 54, [798] = 54, [799] = 54, [800] = 54, [801] = 57, [802] = 48, [803] = 52, [804] = 53, [805] = 48, [806] = 44, [807] = 10, [808] = 32, [809] = 32, [810] = 32, [811] = 32, [812] = 125, [813] = 44, [814] = 10, [815] = 32, [816] = 32, [817] = 32, [818] = 32, [819] = 91, [820] = 54, [821] = 93, [822] = 32, [823] = 32, [824] = 61, [825] = 32, [826] = 32, [827] = 123, [828] = 10, [829] = 32, [830] = 32, [831] = 32, [832] = 32, [833] = 32, [834] = 32, [835] = 91, [836] = 34, [837] = 117, [838] = 105, [839] = 100, [840] = 95, [841] = 102, [842] = 114, [843] = 111, [844] = 109, [845] = 34, [846] = 93, [847] = 32, [848] = 32, [849] = 61, [850] = 32, [851] = 32, [852] = 49, [853] = 48, [854] = 48, [855] = 48, [856] = 53, [857] = 49, [858] = 44, [859] = 10, [860] = 32, [861] = 32, [862] = 32, [863] = 32, [864] = 32, [865] = 32, [866] = 91, [867] = 34, [868] = 109, [869] = 115, [870] = 103, [871] = 34, [872] = 93, [873] = 32, [874] = 32, [875] = 61, [876] = 32, [877] = 32, [878] = 34, [879] = 32, [880] = 108, [881] = 111, [882] = 99, [883] = 97, [884] = 108, [885] = 32, [886] = 110, [887] = 97, [888] = 109, [889] = 101, [890] = 32, [891] = 61, [892] = 32, [893] = 117, [894] = 115, [895] = 101, [896] = 114, [897] = 32, [898] = 97, [899] = 110, [900] = 100, [901] = 32, [902] = 117, [903] = 115, [904] = 101, [905] = 114, [906] = 58, [907] = 71, [908] = 101, [909] = 116, [910] = 78, [911] = 97, [912] = 109, [913] = 101, [914] = 40, [915] = 41, [916] = 32, [917] = 111, [918] = 114, [919] = 32, [920] = 115, [921] = 
101, [922] = 108, [923] = 102, [924] = 46, [925] = 100, [926] = 97, [927] = 116, [928] = 97, [929] = 34, [930] = 44, [931] = 10, [932] = 32, [933] = 32, [934] = 32, [935] = 32, [936] = 32, [937] = 32, [938] = 91, [939] = 34, [940] = 115, [941] = 101, [942] = 114, [943] = 118, [944] = 105, [945] = 99, [946] = 101, [947] = 95, [948] = 116, [949] = 105, [950] = 109, [951] = 101, [952] = 34, [953] = 93, [954] = 32, [955] = 32, [956] = 61, [957] = 32, [958] = 32, [959] = 49, [960] = 54, [961] = 54, [962] = 54, [963] = 54, [964] = 57, [965] = 48, [966] = 52, [967] = 54, [968] = 49, [969] = 44, [970] = 10, [971] = 32, [972] = 32, [973] = 32, [974] = 32, [975] = 125, [976] = 44, [977] = 10, [978] = 32, [979] = 32, [980] = 125, [981] = 44, [982] = 10, [983] = 125, }
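I also printed out the binary form of another failing string.
My initial guess is that the conversion function calls string.sub when concatenating, which breaks otherwise complete UTF-8 sequences.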
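A minimal sketch of how this can happen, assuming Lua 5.3 or later (for the utf8 library). The sample string and the cut position are made up for illustration; they are not taken from the failing data.

```lua
local s = "中国"                      -- 2 characters, 6 bytes in UTF-8
local head = string.sub(s, 1, 4)     -- byte-based cut: ends inside the second character
local joined = head .. "..."         -- concatenation keeps the dangling lead byte

print(#s, utf8.len(s))               -- 6   2
-- utf8.len returns a fail value plus the byte position of the broken sequence (4 here)
print(utf8.len(joined))
```

The head on its own still looks fine when printed to many terminals, which is one reason this kind of damage can go unnoticed until an encoder validates the string.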
You can use utf8.len
to check whether it is a valid UTF-8 string.
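As a sketch of that check, the helper below uses the second return value of utf8.len (the byte position of the first invalid sequence) to print a few bytes of context in hex. `first_invalid_utf8` is a hypothetical name, and the sample input is invented.

```lua
-- Returns nil for valid UTF-8; otherwise the byte position of the first
-- invalid sequence and a hex dump of up to 4 bytes on either side of it.
local function first_invalid_utf8(s)
  local n, pos = utf8.len(s)
  if n then
    return nil
  end
  local from = math.max(1, pos - 4)
  local to = math.min(#s, pos + 4)
  local hex = {}
  for i = from, to do
    hex[#hex + 1] = string.format("%02X", string.byte(s, i))
  end
  return pos, table.concat(hex, " ")
end

-- "\xE8\x90" is a 3-byte sequence that lost its last byte.
local pos, context = first_invalid_utf8("ab\xE8\x90...")
print(pos, context)  -- 3   61 62 E8 90 2E 2E 2E
```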
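Many thanks to 云大 for all the help. The problem is now solved, and the cause is essentially confirmed: bytes cut out by string.sub were concatenated back together, which broke the normal UTF-8 byte sequences.
To fix this I reimplemented two functions, listed here for anyone who runs into the same kind of problem:
-- Get a substring by character (unlike string.sub, which works by byte)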
function nnstring.sub_chat(s, i, j)
i = i or 1
local ii = utf8.offset(s, i)
local jj = j and (utf8.offset(s, j + 1) - 1) or #s
local ret = string.sub(s, ii, jj)
return ret
end
--[[
print(nnstring.sub_chat("中国afd人", 1, 2))
-- >>中国
print(nnstring.sub_chat("中国afd人", 2, 4))
-- >>国af
print(nnstring.sub_chat("中国afd人", 2))
-- >>国afd人
print(nnstring.sub_chat("中国afd人", -2))
-- >>d人
]]
-- Cut a string by character: keep the head and the tail and cut out the middle; typically used for log output
function nnstring.cut(s, max_len, ellipsis, head_len, tail_len)
ellipsis = ellipsis or "..." -- string.rep("\r\n" .. "...", 3) .. "\r\n"
local len = utf8.len(s)
if (len > max_len) then
head_len = head_len or math.floor(max_len * 3 / 4)
tail_len = tail_len or (max_len - head_len - utf8.len(ellipsis))
if tail_len <= 0 then
tail_len = 3
head_len = head_len - tail_len
end
s = nnstring.sub_chat(s, 1, head_len)
.. ellipsis
.. nnstring.sub_chat(s, -tail_len)
-- print(s, utf8.len(s), max_len)
assert(utf8.len(s) <= max_len, utf8.len(s))
end
return s
end
-- print(nnstring.cut("中华人民共和国中华人民共和国", 10)) --, ellipsis, head_len, tail_len))
-- -->>中华人民...共和国
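One caveat with a character-based substring built on utf8.offset is that utf8.offset returns a fail value when the requested character does not exist, so an end index past the last character would lead to arithmetic on nil. The variant below is a sketch, not the poster's code: `utf8_sub` is a hypothetical name, and it clamps indices in the spirit of string.sub.

```lua
-- Character-based substring that clamps out-of-range indices.
local function utf8_sub(s, i, j)
  local len = assert(utf8.len(s))   -- raises if s is not valid UTF-8
  i = i or 1
  j = j or -1
  if i < 0 then i = len + i + 1 end -- negative indices count from the end
  if j < 0 then j = len + j + 1 end
  if i < 1 then i = 1 end
  if j > len then j = len end
  if i > j then return "" end
  local first = utf8.offset(s, i)
  local last = utf8.offset(s, j + 1) - 1 -- j + 1 <= len + 1, so this never fails
  return string.sub(s, first, last)
end

print(utf8_sub("中国afd人", 2, 4))    -- 国af
print(utf8_sub("中国afd人", -2))      -- d人
print(utf8_sub("中国afd人", 1, 100))  -- 中国afd人
```

Calling utf8.len up front costs an extra pass over the string, but it also means a string that is already broken is rejected instead of being cut further.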
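One more small question: why does iterating the broken string with for ... in utf8.codes not raise an error? According to the official Lua documentation it should: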
utf8.codes (s [, lax])
Returns values so that the construction
for p, c in utf8.codes(s) do body end
will iterate over all UTF-8 characters in string s, with p being the position (in bytes) and c the code point of each character. It raises an error if it meets any invalid byte sequence.
Error: Invalid utf8 string. What requirements are there on the string format used in Lua?
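On the utf8.codes question: in the recorded codes above the positions jump from [683] to [685], which suggests the iterator stepped over a stray continuation byte instead of raising. Whether utf8.codes raises on such input may depend on the Lua version, so the sketch below only reports what it does rather than assuming an outcome; utf8.len, by contrast, flags the byte. The sample string is invented.

```lua
local s = "a\x90b"  -- 0x90 is a continuation byte with no lead byte before it

-- utf8.len decodes every position and reports the first invalid one.
local n, bad_pos = utf8.len(s)
print(n, bad_pos)   -- fail value, then 2

-- Run the iteration under pcall so that either behaviour is observable.
local ok, err = pcall(function()
  for p, c in utf8.codes(s) do
    print(p, c)
  end
end)
print("utf8.codes finished without error:", ok, err)
```

For validating data before bson.encode, utf8.len looks like the safer check of the two.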