WebIDL Binder does not support the full range of values for unsigned integers

isaac-mason commented 1 week ago

Is it possible to opt in to representing an unsigned int using HEAPU32 with the WebIDL Binder?

When using the WebIDL Binder to bind an attribute or function that uses unsigned int, the maximum unsigned int value 0xffffffff is returned as -1 in JavaScript.

It appears this is because unsigned integers are treated as signed integers.
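A minimal sketch of the effect, using plain typed arrays as stand-ins for Emscripten's HEAP32 and HEAPU32 views (the names `buffer`, `heap32` and `heapU32` are illustrative, not the generated binding code). The same four bytes read back as -1 through a signed view and as 4294967295 through an unsigned one:

```javascript
// Stand-ins for Emscripten's heap views: two typed arrays over one buffer.
const buffer = new ArrayBuffer(4);
const heap32 = new Int32Array(buffer);   // signed view, like HEAP32
const heapU32 = new Uint32Array(buffer); // unsigned view, like HEAPU32

// Store the bit pattern 0xffffffff once.
heapU32[0] = 0xffffffff;

const asSigned = heap32[0];    // -1
const asUnsigned = heapU32[0]; // 4294967295

console.log(asSigned, asUnsigned);
```

No bits are lost in either case; only the interpretation of the top bit differs.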

This behaviour can be seen in the WebIDL Binder tests: https://github.com/emscripten-core/emscripten/blob/bb220d85c65bace41918d4ba5b84e264cb88de4a/test/webidl/test_ALL.out#L85

Looking at some past commits, it appears this behaviour is intentional for performance reasons: https://github.com/emscripten-core/emscripten/commit/f1c42f42fdb10ece326ebdae985da017ceaab803

> Looking now, it worked fine except for one HEAPU32 which should be HEAP32 (for performance; there is never a point to using the unsigned 32 bit heap unless you really really must).

I am using the WebIDL Binder to create bindings for a library that uses 0xffffffff as a constant with a special meaning. To work around this issue, I currently check for the value -1 in JavaScript. This works for my case but feels like a hack, and it would stop working if I needed to check for other values in the upper, unsupported range.
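One way to make such a check less ad hoc, sketched under the assumption that the binding hands back the signed interpretation of the value: normalize the result before comparing, so the comparison is written against the constant as the library defines it. `INVALID_ID` and `isInvalidId` are hypothetical names used only for illustration.

```javascript
// Hypothetical sentinel, written the way the C/C++ library would define it.
const INVALID_ID = 0xffffffff;

// Reinterpret the binding's result as unsigned 32-bit before comparing,
// so both -1 and 4294967295 are recognized as the sentinel.
function isInvalidId(valueFromBinding) {
  return (valueFromBinding >>> 0) === INVALID_ID;
}

console.log(isInvalidId(-1));         // true
console.log(isInvalidId(0xffffffff)); // true
console.log(isInvalidId(42));         // false
```

The same normalization applies to any other value in the upper half of the unsigned range, for example `-2 >>> 0` is 4294967294 (0xfffffffe).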

Version of emscripten/emsdk:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.61 (67fa4c16496b157a7fc3377afd69ee0445e8a6e3)
clang version 19.0.0git (https:/github.com/llvm/llvm-project 7cfffe74eeb68fbb3fb9706ac7071f8caeeb6520)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /Users/isaacmason/Development/emsdk/upstream/bin

sbc100 commented 1 week ago

I'm not aware of any performance issues associated with HEAPU32. @kripken perhaps you can elaborate?

kripken commented 5 days ago

Historically, some JS engines have optimized JS Numbers that are "small integers", which are typically 32-bit signed integers. Using HEAP32 ensures we fall into that range, whereas the unsigned heap might cause the optimization to fail and the engine to fall back to modeling that variable as a Number (double). I am actually not sure how significant this is these days - perhaps optimizations have improved?

In any case, @isaac-mason, in general, when you care about the difference between signed and unsigned values, you need to cast at the boundary. Using (X) >>> 0 will turn a value into an unsigned 32-bit integer.
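As a rough illustration of casting at the boundary, the two helpers below (hypothetical names, not part of Emscripten) reinterpret a 32-bit value in either direction using standard JavaScript operators:

```javascript
// Reinterpret a 32-bit value as unsigned (0 .. 2^32 - 1).
const toUint32 = (x) => x >>> 0;
// Reinterpret a 32-bit value as signed (-2^31 .. 2^31 - 1).
const toInt32 = (x) => x | 0;

const samples = [-1, -2, -2147483648, 0, 2147483647];

const unsigned = samples.map(toUint32);
// [4294967295, 4294967294, 2147483648, 0, 2147483647]

// Converting back recovers the original signed values.
const roundTripped = unsigned.map(toInt32);

console.log(unsigned, roundTripped);
```

Both conversions preserve the underlying 32 bits, so applying one after the other returns the starting value.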

sbc100 commented 5 days ago

Will reading values larger than 2^31 from HEAPU32 still require >>> 0 in order to get an unsigned value?

sbc100 commented 5 days ago

I confirmed you don't need the >>> 0 if you read from HEAPU32.
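A small check along these lines, assuming a stand-in heap built from ordinary typed arrays rather than a real wasm memory (the `memory` buffer and the byte offsets are made up for the example), compares a direct HEAPU32 read against a HEAP32 read followed by `>>> 0`:

```javascript
// A tiny stand-in heap; in Emscripten both views share the wasm memory buffer.
const memory = new ArrayBuffer(16);
const HEAP32 = new Int32Array(memory);
const HEAPU32 = new Uint32Array(memory);

// Values on both sides of the signed/unsigned boundary.
const values = [0, 0x7fffffff, 0x80000000, 0xffffffff];
values.forEach((v, i) => { HEAPU32[i] = v; });

// Read each value back at its 4-byte-aligned byte offset ("pointer").
const results = values.map((_, i) => {
  const ptr = i * 4;
  return {
    viaHEAPU32: HEAPU32[ptr >> 2],   // already unsigned
    viaCast: HEAP32[ptr >> 2] >>> 0, // signed read, then cast
  };
});

console.log(results);
```

For every value the two reads agree, which matches the observation that the extra `>>> 0` is unnecessary when reading through the unsigned view.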

isaac-mason commented 5 days ago

Thanks @kripken @sbc100 for the information!

I had misread the -1 return value as a sign that the upper range was being lost. Thanks for clearing this up for me 🙂

I'm happy to contribute a change to the WebIDL Binder docs to explain this behaviour.

> I am actually not sure how significant this is these days - perhaps optimizations have improved?

Are there suitable existing benchmarks in the emscripten repo that we could try answering this with?

kripken commented 4 days ago

Hmm, it's hard to measure this as it depends on a bunch of heuristics JS engines have. I don't think we have any good benchmarks for it. In general "use 31-bit integers" was a JS best practice for performance back in the day, but even then it was not entirely reliable.

Doc improvements would be welcome!

emscripten-core/emscripten, issue #22134