Buffer over-read in TextDecoder()

awslabs / llrt

LLRT (Low Latency Runtime) is an experimental, lightweight JavaScript runtime designed to address the growing demand for fast and efficient Serverless applications.

Apache License 2.0

7.73k stars 341 forks source link

Buffer over-read in TextDecoder() #421

Closed jackkleeman closed 1 week ago

jackkleeman commented 2 weeks ago

TextDecoder decodes a full Buffer when presented with a subarray of a Buffer:

$ llrt -e 'let utf8decoder = new TextDecoder(); console.log(utf8decoder.decode(Buffer.from("abcdefg", "utf8").subarray(0,1)));'
abcdefg
$ llrt -e 'let utf8decoder = new TextDecoder(); console.log(Buffer.from("abcdefg", "utf8").subarray(0,1).toString("utf8"));'
a

jackkleeman commented 2 weeks ago

Probably related, Buffer.from on a Buffer also overruns

$ llrt -e 'console.log(Buffer.from(Buffer.from("abcdefg", "utf8").subarray(0,1)).toString());'
abcdefg

richarddavison commented 2 weeks ago

Thanks @jackkleeman for this very clear bug report. We'll be taking a look! Edit: Found the issue. Bytes should not be copied when we create a buffer from a slice. We'll need some slight refactor of the Buffer implementation to correct this.

jackkleeman commented 1 week ago

llrt -e 'const array = new Uint8Array([97, 98, 97, 98]); console.log(new TextDecoder().decode(array.subarray(0,1)))'

The issue also appears to happen with Uint8Array, not just Buffer (maybe same underlying impl?)

richarddavison commented 1 week ago

llrt -e 'const array = new Uint8Array([97, 98, 97, 98]); console.log(new TextDecoder().decode(array.subarray(0,1)))'
The issue also appears to happen with Uint8Array, not just Buffer (maybe same underlying impl?)

Jup, even with strings. Since subarrays shared the same memory, we didn't account for this. This should be fixed in https://github.com/awslabs/llrt/pull/422 (and slightly better performance) since we're no longer copying the buffers around.

richarddavison commented 1 week ago

Fixed in https://github.com/awslabs/llrt/pull/422