zherczeg closed this 11 months ago
IMO, using mov64 in the interpreter is wrong. I will look into it.
(Not actually related to this issue) I have a question about the stack area.
In JIT code, where are the operands and temporary values of the bytecode located?
Are they in the native stack area, as in interpreter mode, or in a separate heap area allocated only for the JIT?
Is it also correct that register %r15 is reserved to point to the stack address?
For simplicity, it uses the area allocated by the interpreter. Basically, the interpret() call is replaced by a native call. I don't think it is worth having a different stack layout for the JIT. As for optimizations, any improvement to the stack area would benefit the JIT and the interpreter equally.
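A minimal C sketch of what "the interpret() call is replaced by a native call" means for the stack layout, assuming both the interpreter and the compiled code receive the base address of the same frame (the role %r15 appears to play in the generated x86 code). The names `frame_fn`, `interpret_add`, `jitted_add`, and `call_with_frame` are hypothetical and are not Walrus functions; the two "engines" compute the same `[16] = [0] + [8]` so the point is only that they share one layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 48 /* assumed frame size, in bytes */

/* Hypothetical entry signature shared by the interpreter and JIT output:
   both receive the base address of the same frame. */
typedef void (*frame_fn)(uint8_t *frame);

/* Read a 32-bit value stored at a byte offset inside the frame. */
static uint32_t load_i32(const uint8_t *frame, size_t off) {
    uint32_t v;
    assert(off + sizeof v <= FRAME_SIZE);
    memcpy(&v, frame + off, sizeof v);
    return v;
}

/* Write a 32-bit value at a byte offset inside the frame. */
static void store_i32(uint8_t *frame, size_t off, uint32_t v) {
    assert(off + sizeof v <= FRAME_SIZE);
    memcpy(frame + off, &v, sizeof v);
}

/* Stand-in for the interpreter: touches operands only via frame offsets. */
static void interpret_add(uint8_t *frame) {
    store_i32(frame, 16, load_i32(frame, 0) + load_i32(frame, 8));
}

/* Stand-in for JIT output: straight-line code with the same semantics,
   reading and writing the very same offsets. */
static void jitted_add(uint8_t *frame) {
    uint32_t a, b, sum;
    memcpy(&a, frame + 0, sizeof a);
    memcpy(&b, frame + 8, sizeof b);
    sum = a + b; /* unsigned, so it wraps like wasm i32.add */
    memcpy(frame + 16, &sum, sizeof sum);
}

/* The caller owns the frame; swapping the interpreter for native code
   does not change where operands and temporaries live. */
static uint32_t call_with_frame(frame_fn fn, uint32_t a, uint32_t b) {
    uint8_t frame[FRAME_SIZE] = {0};
    store_i32(frame, 0, a);
    store_i32(frame, 8, b);
    fn(frame);
    return load_i32(frame, 16);
}
```

Because layout is a property of the frame rather than of the engine, any improvement to how slots are assigned benefits both paths at once, which is the argument made above.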
The recent version of the interpreter produces this result:
```
required stack size: 48 bytes
required stack size due to local: 24 bytes
bytecode size: 242 bytes
0 const32 dstOffset: 32 value: 0
16 move32 srcOffset: 32 dstOffset: 0
32 const32 dstOffset: 32 value: 1
48 move32 srcOffset: 32 dstOffset: 8
64 const32 dstOffset: 32 value: 0
80 move32 srcOffset: 32 dstOffset: 16
96 I32Add src1: 0 src2: 8 dst: 32
112 move32 srcOffset: 8 dstOffset: 0
128 move32 srcOffset: 32 dstOffset: 8
144 const32 dstOffset: 40 value: 1
160 I32Add src1: 16 src2: 40 dst: 16
176 const32 dstOffset: 40 value: 10000000
192 I32LtU src1: 16 src2: 40 dst: 32
208 jump_if_true srcOffset: 32 dst: 96
224 end resultOffsets: 16
```
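To make the frame layout of this dump concrete, here is a minimal C sketch that executes the same instruction sequence over a 48-byte frame. The opcode names mirror the dump, but the `insn` struct, its field order, and `run()` are hypothetical, not Walrus's actual data structures. It assumes every instruction occupies 16 bytes, as the positions in the dump suggest, so the jump target 96 maps to index 6; the loop limit (10000000 in the dump) is made a parameter so small runs are easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 48 /* "required stack size: 48 bytes" */

typedef enum { CONST32, MOVE32, I32_ADD, I32_LT_U, JUMP_IF_TRUE, END } opcode;

/* Hypothetical decoded instruction; the meaning of a/b/c depends on op:
   CONST32 {dst, value}, MOVE32 {src, dst}, I32_ADD/I32_LT_U {src1, src2, dst},
   JUMP_IF_TRUE {src, target byte position}, END {result offset}. */
typedef struct { opcode op; uint32_t a, b, c; } insn;

static uint32_t ld32(const uint8_t *f, uint32_t off) {
    uint32_t v;
    assert(off + sizeof v <= FRAME_SIZE);
    memcpy(&v, f + off, sizeof v);
    return v;
}

static void st32(uint8_t *f, uint32_t off, uint32_t v) {
    assert(off + sizeof v <= FRAME_SIZE);
    memcpy(f + off, &v, sizeof v);
}

/* Offsets 0, 8, 16 appear to be the three locals (24 bytes);
   32 and 40 are temporaries. Returns the value at the result offset. */
static uint32_t run(uint32_t limit, uint8_t *frame) {
    const insn program[] = {
        {CONST32, 32, 0, 0},       /*   0 */
        {MOVE32, 32, 0, 0},        /*  16 */
        {CONST32, 32, 1, 0},       /*  32 */
        {MOVE32, 32, 8, 0},        /*  48 */
        {CONST32, 32, 0, 0},       /*  64 */
        {MOVE32, 32, 16, 0},       /*  80 */
        {I32_ADD, 0, 8, 32},       /*  96: loop head */
        {MOVE32, 8, 0, 0},         /* 112 */
        {MOVE32, 32, 8, 0},        /* 128 */
        {CONST32, 40, 1, 0},       /* 144 */
        {I32_ADD, 16, 40, 16},     /* 160 */
        {CONST32, 40, limit, 0},   /* 176 */
        {I32_LT_U, 16, 40, 32},    /* 192 */
        {JUMP_IF_TRUE, 32, 96, 0}, /* 208 */
        {END, 16, 0, 0},           /* 224 */
    };
    size_t pc = 0;
    memset(frame, 0, FRAME_SIZE);
    for (;;) {
        const insn *in = &program[pc++];
        switch (in->op) {
        case CONST32: st32(frame, in->a, in->b); break;
        case MOVE32: st32(frame, in->b, ld32(frame, in->a)); break;
        case I32_ADD: st32(frame, in->c, ld32(frame, in->a) + ld32(frame, in->b)); break;
        case I32_LT_U: st32(frame, in->c, ld32(frame, in->a) < ld32(frame, in->b)); break;
        case JUMP_IF_TRUE: if (ld32(frame, in->a) != 0) pc = in->b / 16; break;
        case END: return ld32(frame, in->a);
        }
    }
}
```

With a limit of 10 the loop body runs ten times, leaving F(10) = 55 at offset 0 and F(11) = 89 at offset 8, and `run()` returns the counter at offset 16, which is 10.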
This looks better! Thank you! The simple test is twice as fast now.
We have noticed a very interesting slowdown on x86. Consider the following simple WebAssembly code (it computes Fibonacci numbers):
Here is the machine code generated by the old JIT (which does not use the Walrus bytecode) and by the new JIT:
Old JIT: 0.101s
New JIT: 0.158s
The two are essentially the same, except that they store local variables in different locations. The other difference is that the new JIT uses a 64-bit copy, `mov 0x8(%r15),%rdx`, for what is only a 32-bit value. We have measured this on multiple systems, and for reasons we don't yet understand, the old code is 50% (or more) faster.
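As a quick check of the timings above, the ratio of the two runs is:

$$\frac{0.158\,\mathrm{s}}{0.101\,\mathrm{s}} \approx 1.56$$

so in this measurement the new JIT takes about 56% longer than the old one.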
Here is the bytecode dump from the interpreter:
It uses move_64 operations, and the JIT simply translates them into 64-bit movs.
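A minimal C sketch of choosing the load width from the value type instead of always emitting a 64-bit mov, which is what replacing move_64 with a 32-bit move for i32 values would amount to. The `value_type` enum and `format_load()` are hypothetical and only print AT&T-syntax text; they are not sljit or Walrus APIs. On x86-64, a 32-bit load into `%edx` zero-extends into the full `%rdx`, so the narrower form still leaves a well-defined 64-bit register.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical value-type tag carried by a move instruction. */
typedef enum { VT_I32, VT_I64 } value_type;

/* Formats a load of a %r15-relative slot into buf. An i32 value uses the
   32-bit register %edx; only an i64 value uses the full 64-bit %rdx.
   Returns the snprintf result (characters that would have been written). */
static int format_load(char *buf, size_t n, value_type t, unsigned offset) {
    const char *reg = (t == VT_I32) ? "%edx" : "%rdx";
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "mov 0x%x(%%r15),%s", offset, reg);
}
```

The thread does not establish why the 64-bit load is slower; the sketch only illustrates making the access width match the 32-bit value actually stored in the slot.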