asm-js / validator

A reference validator for asm.js.

[Question] performance implications of bound checks for array indexes #78

Closed ibukanov closed 10 years ago

ibukanov commented 10 years ago

The current specification requires that an out-of-bounds array access return 0/NaN for int/double arrays. This implies that the generated machine code must not only perform bounds checks but must implement them with exactly those semantics. In particular, this excludes replacing the checks with constructions like array[index & mask] that could be implemented with faster machine code. Was this discussed?
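
To make the contrast concrete, here is a rough plain-JS sketch (the heap size and variable names are just for illustration):

```js
// Spec semantics: an out-of-bounds read yields undefined, which the asm.js
// coercions turn into 0 for int heaps and NaN for double heaps.
var buf = new ArrayBuffer(0x10000);        // 64 KiB heap, illustrative size
var HEAP32 = new Int32Array(buf);
var HEAPF64 = new Float64Array(buf);

var i = 0x20000;                           // a byte offset past the end of the heap
var a = HEAP32[i >> 2] | 0;                // spec-required result: 0
var b = +HEAPF64[i >> 3];                  // spec-required result: NaN

// The alternative asked about: wrap the index with a mask instead of checking it.
var MASK = 0xFFFF;                         // heap size - 1 for a power-of-two heap
var c = HEAP32[(i & MASK) >> 2] | 0;       // reads some in-bounds word instead of 0
```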

kripken commented 10 years ago

Yes, you do need to check for out-of-bounds accesses. You can either do bounds checks (optimizing JITs can hoist those, so most can be eliminated), or use other methods, such as raising a signal when something out of bounds is accessed. That's what OdinMonkey does in some cases. The problems with a mask are

  1. Slower on existing JITs, as testing showed
  2. Larger code size
  3. Slower than using a signal handler

On the other hand, one benefit of using a mask is that it can also cover the lower bits, forcing alignment.
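
As a rough sketch of that combined mask (heap size and names made up):

```js
// One mask that both keeps the byte offset inside an assumed 64 KiB heap and
// clears the low two bits, forcing 4-byte alignment for the 32-bit access.
var HEAP32 = new Int32Array(new ArrayBuffer(0x10000));
var MASK = 0xFFFC;                          // 0xFFFF with the two low bits cleared

function load(i) {
  i = i | 0;
  return HEAP32[(i & MASK) >> 2] | 0;       // in bounds and aligned by construction
}
```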

ibukanov commented 10 years ago

With native code on many CPUs masking is faster, as it generates less code and avoids potentially expensive branches. So the question is not about requiring it but rather about allowing it in the spec. For example, the spec may say that the arrays asm.js code uses may implement different out-of-bounds semantics than what is available through the standard Float64Array etc. In particular, they may throw an exception or mask the index.

kripken commented 10 years ago

The mask is more code than using a signal handler, though? The signal handler makes it so you have neither a bounds check nor a mask, just the pure load itself. If it's out of bounds, the signal handler is called and you then handle that.

asm.js is a pure subset of JS, so we can't change how typed arrays behave. However, someone could propose a new type of typed array to the standards bodies, and if accepted asm.js could use it. This might be interesting to do at some point, especially if someone has benchmarks showing the need. Throwing an exception would indeed be nicer, but without numbers it's hard to be sure.

ibukanov commented 10 years ago

> The mask is more code than using a signal handler, though?

That is only possible if the hardware supports it, e.g. via segment registers on x86. ARM does not have that.

> The signal handler makes it so you have neither a bounds check nor a mask, just the pure load itself. If it's out of bounds, the signal handler is called and you then handle that.

That still requires runtime support, since the signal handler has to know precisely where the violation originated. Depending on the architecture, that may inhibit some optimizations.

> asm.js is a pure subset of JS, so we can't change how typed arrays behave.

As a user I can wrap the standard names into something else. So the spec can document that, and say that during linking a conforming AOT compiler may wrap the heap object in a version that either throws on out-of-bounds access or masks the index, so asm.js code must not assume a particular out-of-bounds behavior.
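
As a very loose plain-JS illustration of what I mean by wrapping (not something an engine would literally do, and the names are made up):

```js
// Hypothetical heap-like wrapper that throws on out-of-bounds reads instead of
// returning 0/NaN; an AOT compiler could substitute something with these
// semantics (or index masking) during linking.
function makeCheckedHeap(buffer) {
  var heap = new Int32Array(buffer);
  return new Proxy(heap, {
    get: function (target, prop) {
      if (typeof prop === "string") {
        var index = +prop;
        if (index === (index | 0) && (index < 0 || index >= target.length)) {
          throw new RangeError("out-of-bounds heap access at index " + index);
        }
      }
      return target[prop];
    }
  });
}
```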

kripken commented 10 years ago

Yes, signal handler tricks don't work everywhere. ARM is the main issue. I wonder if ARM64 will allow them?

In theory we could replace the normal typed arrays with other things. But asm.js has always been careful to run well on JITs that have not optimized especially for it. We could have used various patterns that are very fast when specifically optimized for, but that existing JITs would be slow on. That would have been unfair to the other JITs, so we compromised and picked patterns that work well even without special support. When we did feel there was something useful with no good way to express it (at good speed on non-optimizing JITs), then instead of adding a pattern to asm.js we worked through the standards bodies - Math.imul and Math.fround, for example.

So far, this has not limited us greatly. And right now there is work on various important optimizations like float32. When those are in place, perhaps this issue will be one of the main remaining slowdowns compared to native, and we can reconsider it. But again, the preferable thing is not to invent special patterns for asm.js, but to use JS constructs in natural ways, so we would want to go through the standards bodies and so forth.

ghost commented 10 years ago

Signal handler tricks work on all 64-bit archs, and should work on ARM64. Recording the address of loads doesn't inhibit optimizations; we just record the offset of any loads/stores that show up, wherever they appear. The |0/+ coercions on the loads mean that it's not semantically observable exactly when the out-of-bounds access occurs, so, e.g., the loads/stores can be hoisted.

However, we do support masking optimizations; they just don't require any spec modifications. If you write:

HEAPI32[(i&M)>>2]

where M is smaller than the declared size of the heap (asm.js validation emits a link-time requirement for a constant heap access HEAPI32[N] that the heap size be at least N), then the heap access will emit a single (hoistable) 'and' instruction (a mask is already necessary because of the forced alignment of typed array accesses, so this makes the bounds mask sorta "free"). OdinMonkey also allows const declarations (soon to be in the asm.js spec), so 'M' can actually be written as a named constant rather than a long integer literal. Emscripten doesn't yet emit this; but last time Douglas Crosher measured, I think it was a ~10% or so speedup on ARM (I could be misremembering, though).
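
For example, something along these lines (a sketch; the module and constant names are made up, and I haven't run it through the validator):

```js
function MaskedHeapModule(stdlib, foreign, buffer) {
  "use asm";
  var HEAPI32 = new stdlib.Int32Array(buffer);
  const M = 0xFFFF;                  // assumes a 64 KiB heap; must be smaller than the heap size

  function load(i) {
    i = i | 0;
    // The >>2 access already needs an alignment mask, so the bounds mask above
    // folds into a single (hoistable) 'and'.
    return HEAPI32[(i & M) >> 2] | 0;
  }

  return { load: load };
}
```

With a 64 KiB buffer linked in, load() then stays in bounds for any int argument, without any per-access branch.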

Tentatively closing the issue; feel free to reopen if you have a further enhancement to suggest.