Yes, this is known. No plans to fix it in this server. The caller can use TextEncoder to convert the string to binary UTF-8 and write that instead.
FWIW – the ECMA-419 HTTP client and server only operate on binary data (not strings). That's deliberate so that these text conversion issues remain external.
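A minimal sketch of the workaround described above, assuming a global `TextEncoder` (in the Moddable SDK it is imported from "text/encoder", as in the test later in this thread). `encodeBody` is a hypothetical helper name, and how the resulting bytes are handed to the server is not shown here.

```typescript
// Hypothetical helper: turn a response body into standard UTF-8 bytes
// before writing it, so the server never sees the string itself.
function encodeBody(text: string): Uint8Array {
  // TextEncoder emits standard UTF-8, where NUL is the single byte 0x00.
  return new TextEncoder().encode(text);
}

const bytes = encodeBody("Here is a [\0] byte");
console.log(bytes.length, bytes[11]); // 18 bytes, with 0 at the NUL position
```

Because the conversion happens in the caller, the byte count the server sees is the real UTF-8 length of the body.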
> Yes, this is known. No plans to fix it in this server.
Noting this in the docs would have saved me a bunch of time and aggravation...
> FWIW – the ECMA-419 HTTP client and server
Thanks for pointing this out again, ~I'm not finding the HTTPServer implementation in the Moddable SDK, am I missing something?~ found it
(Added a note to the docs)
FYI: the issue with NUL characters also seems to affect ArrayBuffer.fromString. Some uses, like the one in the ECMA-419 MQTT client, look vulnerable. I assume this is known and won't-fix.
The default XS string encoding is almost UTF-8. The exception is NULLs which use the CESU-8 encoding. Here's a modified version of ArrayBuffer.fromString that handles the NULL encoding.
Here's the corresponding change to String.fromArrayBuffer:
Let me know if this works for you. If it does, I'll merge it.
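The patch itself is not reproduced in this thread. As a rough illustration of the string-to-buffer direction only, the sketch below rewrites each two-byte sequence 0xC0 0x80 (the NUL encoding described above) as a single 0x00. `collapseNulPairs` is a hypothetical name, and this is TypeScript operating on a byte array, not the actual C code in XS.

```typescript
// Hypothetical helper: convert XS-style bytes (NUL stored as 0xC0 0x80)
// into standard UTF-8 (NUL stored as 0x00).
function collapseNulPairs(src: Uint8Array): Uint8Array {
  // First pass: count the two-byte NUL sequences to size the output exactly.
  let pairs = 0;
  for (let i = 0; i + 1 < src.length; i++) {
    if (src[i] === 0xc0 && src[i + 1] === 0x80) {
      pairs++;
      i++; // skip the 0x80 that belongs to this pair
    }
  }
  if (pairs === 0) return src.slice(); // nothing to rewrite, plain copy

  // Second pass: copy, writing one 0x00 for each pair.
  const dst = new Uint8Array(src.length - pairs);
  let j = 0;
  for (let i = 0; i < src.length; i++) {
    if (src[i] === 0xc0 && i + 1 < src.length && src[i + 1] === 0x80) {
      dst[j++] = 0;
      i++;
    } else {
      dst[j++] = src[i];
    }
  }
  return dst;
}
```

The byte 0xC0 never occurs in standard UTF-8, so matching on the pair cannot collide with ordinary text.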
CESU-8: learn something new every day... I'll plug it in and report back, thanks!
> I'll plug it in and report back, thanks!
Great. How'd it go?
My test for ArrayBuffer.fromString looks good, ~the impact on larger buffers (1KB) is ~2x but on smaller strings I would call it "in the noise"~ (using esp32):
```ts
import Time from "time"
import TextEncoder from "text/encoder"

const ten = "0123456789"
const hundred = ten + ten + ten + ten + ten + ten + ten + ten + ten + ten
const thousand =
  hundred + hundred + hundred + hundred + hundred + hundred + hundred + hundred + hundred + hundred

const ten0 = "01234\x006789"
const hundred0 = ten0 + ten0 + ten0 + ten0 + ten0 + ten0 + ten0 + ten0 + ten0 + ten0
const thousand0 =
  hundred0 +
  hundred0 +
  hundred0 +
  hundred0 +
  hundred0 +
  hundred0 +
  hundred0 +
  hundred0 +
  hundred0 +
  hundred0

function timeit(desc: string, f: () => void): void {
  const t0 = Time.ticks
  f()
  const t1 = Time.ticks
  trace(`${desc} took ${t1 - t0}ms\n`)
}

const tests = [
  ["ten", ten],
  ["hundred", hundred],
  ["thousand", thousand],
  ["ten0", ten0],
  ["hundred0", hundred0],
  ["thousand0", thousand0],
]

trace("\n===== String NUL test =====\n")
for (const test of tests) {
  const ab = ArrayBuffer.fromString(test[1])
  const u8 = new Uint8Array(ab)
  const te = new TextEncoder().encode(test[1])
  const same = u8.byteLength == te.byteLength && u8.every((v, i) => v == te[i])
  trace(
    `${test[0]} ${test[1].length} ${ab.byteLength} ` +
      `${u8.slice(0, 8)}...${u8[u8.byteLength - 1]} same=${same}\n`
  )
}

for (const test of tests) {
  timeit(`${test[0]} fromString`, () => {
    const s = test[1]
    for (let i = 0; i < 100; i++) {
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
      ArrayBuffer.fromString(s)
    }
  })
}

// Before fix:
// ===== String NUL test =====
// ten 10 10 48,49,50,51,52,53,54,55...57 same=true
// hundred 100 100 48,49,50,51,52,53,54,55...57 same=true
// thousand 1000 1000 48,49,50,51,52,53,54,55...57 same=true
// ten0 10 11 48,49,50,51,52,192,128,54...57 same=false
// hundred0 100 110 48,49,50,51,52,192,128,54...57 same=false
// thousand0 1000 1100 48,49,50,51,52,192,128,54...57 same=false
// ten fromString took 37ms
// hundred fromString took 40ms
// thousand fromString took 67ms
// ten0 fromString took 37ms
// hundred0 fromString took 41ms
// thousand0 fromString took 69ms
//
// After fix:
// ===== String NUL test =====
// ten 10 10 48,49,50,51,52,53,54,55...57 same=true
// hundred 100 100 48,49,50,51,52,53,54,55...57 same=true
// thousand 1000 1000 48,49,50,51,52,53,54,55...57 same=true
// ten0 10 10 48,49,50,51,52,0,54,55...57 same=true
// hundred0 100 100 48,49,50,51,52,0,54,55...57 same=true
// thousand0 1000 1000 48,49,50,51,52,0,54,55...57 same=true
// ten fromString took 26ms
// hundred fromString took 33ms
// thousand fromString took 107ms
// ten0 fromString took 22ms
// hundred0 fromString took 39ms
// thousand0 fromString took 174ms
```
~similar test for String.fromArrayBuffer will follow~ String.fromArrayBuffer looks good too.
Edit:
Thanks for the tests and benchmarks. Given the need for an extra pass over the data, it is going to take more time. Since the actual work is trivial, memory bandwidth is the limiting factor. Since this isn't generally performance critical, I think it is an OK tradeoff and keeps the code small.
I just noticed that fx_ArrayBuffer_fromString can be optimized. The first thing it does is get the length of the string. That can be combined with the pass that searches for nulls. That should make the performance much closer to the original. Here's the updated version.
How's that look?
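The updated C code is not included in this thread. The idea of folding the length computation into the search for nulls might look roughly like the sketch below, which models a C string as bytes followed by a 0x00 terminator. `scanXsString` and its return shape are hypothetical names chosen for illustration.

```typescript
// Hypothetical model: `mem` holds a C-style string, i.e. the string bytes
// followed by a 0x00 terminator.
function scanXsString(mem: Uint8Array): { length: number; nulPairs: number } {
  let length = 0; // bytes before the terminator (what strlen would return)
  let nulPairs = 0; // number of 0xC0 0x80 sequences seen along the way
  // One loop does the work of both the length scan and the search for nulls.
  while (length < mem.length && mem[length] !== 0) {
    if (mem[length] === 0xc0 && mem[length + 1] === 0x80) {
      nulPairs++;
      length += 2;
    } else {
      length++;
    }
  }
  return { length, nulPairs };
}

// The output buffer would then need (length - nulPairs) bytes.
const scan = scanXsString(Uint8Array.of(52, 0xc0, 0x80, 54, 0));
console.log(scan.length - scan.nulPairs);
```

When `nulPairs` is zero the caller can fall back to a plain bulk copy, so strings without nulls pay for only one scan.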
Separately, I would expect String.fromArrayBuffer() performance to be more-or-less unchanged. There's no additional pass over the data. If there are nulls, it will be a little slower because it can't use memcpy to transfer the bytes, but in the common case of no nulls it should pretty much match.
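For the buffer-to-string direction, the two cases described above (a bulk copy when there are no nulls, a byte-by-byte copy otherwise) can be sketched as follows. `expandNuls` is a hypothetical name, and this sketch counts the zero bytes up front to size the output. How the actual C code avoids an extra pass is not shown in this thread.

```typescript
// Hypothetical helper: convert standard UTF-8 bytes into the XS-style form
// where each NUL is stored as the two bytes 0xC0 0x80.
function expandNuls(src: Uint8Array): Uint8Array {
  // Count zero bytes so the output can be sized exactly (sketch only).
  let nuls = 0;
  for (let i = 0; i < src.length; i++) {
    if (src[i] === 0) nuls++;
  }

  if (nuls === 0) {
    // Common case: no nulls, so one bulk copy suffices (the memcpy analogue).
    const copy = new Uint8Array(src.length);
    copy.set(src);
    return copy;
  }

  // Slow path: copy byte by byte, widening each 0x00 to 0xC0 0x80.
  const dst = new Uint8Array(src.length + nuls);
  let j = 0;
  for (let i = 0; i < src.length; i++) {
    if (src[i] === 0) {
      dst[j++] = 0xc0;
      dst[j++] = 0x80;
    } else {
      dst[j++] = src[i];
    }
  }
  return dst;
}
```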
For ArrayBuffer.fromString
For String.fromArrayBuffer
Thanks for rechecking. The fromString optimization seems to be working nicely. I'll integrate the changes.
Moddable SDK version: 4.3
Target device: esp32

Description
In an HTTP prepareResponse handler I'm returning a string body that happens to contain NUL characters ('\0'). On the client side these come across as 0xc0 0x80 byte pairs and a byte is missing at the end.

Steps to Reproduce

```js
new Server({}).callback = function (message, value) {
  switch (message) {
    case Server.headersComplete: // prepare for request body
      return String
  }
}

trace(`Available on Wi-Fi "${Net.get("SSID")}"\n`)
trace(`curl --data-binary "@/users/[your directory path here]/test.txt" http://${Net.get("IP")}/test.txt -v\n`)
```

```
(main) /h/s/m/j/host> curl http://192.168.0.216/ | od -c
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    18  100    18    0     0     34      0 --:--:-- --:--:-- --:--:--    34
0000000   H   e   r   e       i   s       a       [ 302 251   ]       b
0000020   y   t
0000022
```