JoshGlazebrook / smart-buffer

smart-buffer is a Buffer wrapper that adds automatic read & write offset tracking, string operations, data insertions, and more.
https://www.npmjs.com/package/smart-buffer
MIT License
102 stars 18 forks source link

Incorrect use of value.length in insertStringNT and writeStringNT when value is non-ascii strings or encoding is UTF-16 #50

Open anonghuser opened 11 months ago

anonghuser commented 11 months ago

The writeStringNT code when not given an offset argument is correct as it uses the writeOffset for placing the null terminator, but with an offset it is incorrect. The insertStringNT is always incorrect. You will have to switch the use of value.length for Buffer.byteLength() in those two places.

console.log(new SB().insertString('я\0', 0).toString('hex')); // d18f00

console.log(new SB().writeStringNT('я').toString('hex'));     // d18f00
console.log(new SB().writeStringNT('я', 0).toString('hex'));  // d100
console.log(new SB().insertStringNT('я', 0).toString('hex')); // d1008f

Your implementation could be simplified by just appending '\0' to the string and passing it to writeString/insertString. There will be no functional difference in all encodings except UTF-16, where the null terminator will become two-byte long:

console.log(new SB().writeString('abcd\0', 'utf16le').toString('hex'));      // 61006200630064000000

console.log(new SB().writeStringNT('abcd', 'utf16le').toString('hex'));      // 610062006300640000
console.log(new SB().writeStringNT('abcd', 0, 'utf16le').toString('hex'));   // 6100620000006400
console.log(new SB().insertStringNT('abcd', 0, 'utf16le').toString('hex'));  // 610062000063006400

If you go with this approach, you will also have to special-case the readStringNT code for UTF-16 (and all its aliases, to avoid hardcoding them you might check the length of an encoded '\0') to look for the even-aligned two-byte null, but that is incorrect currently anyway: https://github.com/JoshGlazebrook/smart-buffer/issues/23

I guess there may be an argument for the Buffer.byteLength() approach (while perhaps still changing the terminator to two-byte in utf16) to avoid temporary string values with the concatenation approach hurting performance, but I can't imagine how long the strings would need to be for this to become noticeable... Probably a non-issue.