Open dylanahsmith opened 4 years ago
I agree we should treat a 24-bit index as a cold code path.
We could have specialized instructions for bigger indexes, as this is an edge case. And that's what most VMs do. A few ideas from other VMs I've seen:
Lua has LOADK & LOADKX (wide version)
Java has wide versions of every instruction accepting an index. Plus it even has specialized instructions for the first few indexes for a few (lload_0
, iconst_0
).
Python uses an instruction (EXTENDED_ARG
) prepended to other instructions to "bump" the index if too big to fit in a single byte: https://github.com/python/cpython/blob/037245c5ac46c3436f617a1f5d965929754be239/Python/ceval.c#L3734
@ggmichaelgo did #155 fully fix this issue? Or can this issue be updated with what is still needed?
The moving constant pointer optimizes for lookup of unique constants, as is the case when delegating to ruby AST objects. However, these will become less common as we add VM compilation for more of the ruby AST nodes.
As mentioned in https://github.com/Shopify/liquid-c/issues/83, the moving constant pointer would require the use of two offsets for both
ip
andconst_ptr
in order to handle jump instructions for control flow tags. Conditional branching will become a hot code path once we compile control flow tags to VM instructions, so we probably don't want the added overhead of managing the additional jump offset for the moving constant pointer.We might want to start by using a struct for the instruction pointer pair (ip, const_ptr) to reduce the amount of code affected by switching away from the moving constant pointer.
We could then introduce a constant table which can be referenced by index from instructions. We could start by using this for instructions that are more likely to re-use the same constant (e.g. OP_FILTER, OP_LOOKUP_CONST_KEY, OP_FIND_STATIC_VAR).
Since we are using byte code instructions, we will have to handle multi-byte index references. If we want to handle 24-bit length source string (i.e. 16 MiB), then I think that means we need to either handle 24-bit indexes to be conservative (assuming it isn't possible to introduce more constants than byte of source code but is possible to introduce a unique constant per 256 bytes of source code). However, we could instrument the code to determine if a 24-bit index is actually necessary in practice, if we want to remove/avoid support for it. We should at least be able to treat a 24-bit index as a cold code path.
Reading a 16-bit index does seem like it could end up being a hot code path. For performance, we might want to design for a little-endian CPU that can efficiently read a 16-bit aligned integer (e.g. x86, AMD64). To achieve the alignment of a 16-bit integer when adding an instruction which would have it unaligned, we can have an unaligned version of the instruction which simply skips over a padding/alignment byte then falls through to the aligned version of the instruction in the VM switch statement. We could try this strategy first in a separate PR for the
OP_PUSH_INT16
.