Open markshannon opened 1 year ago
I am definitely in favor of LOAD_INT
, since the data fits in the oparg
byte.
Let's look at LOAD_FLOAT
next. Here marshal v1 writes a decimal string (basically "%17g" % x
) but newer versions write a binary format -- however it's still a portable format, using PyFloat_Unpack8
. If we could just write the binary data (as if it's an 8 byte binary string), the cost of unmarshalling would be reduced to just a call to PyFloat_FromDouble
, which is presumably the minimal cost. (Note that the data bytes aren't copied but delivered as a char*
pointer, at least when unmarshalling from an in-memory buffer -- as is always the case because that's how importlib
works.) This seems a little easier to accomplish (just bump the marshal version) than adding infrastructure for variable-length instructions (which I've tried, and found doable but complicated -- it's different than caches, which are special-cased in many places).
For short strings and medium-sized integers, we can store up to 4 bytes of data using EXTENDED_ARG
without making the instruction format nominally harder. Not sure if that's enough. :-)
Considering that there is already a table of frequent strings (I recall it containing the empty string and the one character strings of printable characters), there could be LOAD_FIXED_STR that gets an index into this table. This would encourage putting more entries into the table.
We've added LOAD_SMALL_INT
in https://github.com/python/cpython/pull/125972.
When we use a small constant in Python code, we compile it as
LOAD_CONST
. This looks efficient, but ignores the fact that the constant object has to be created by marshal, which is quite slow compared to the main interpreter.This is fine for code in functions as they are expected to be executed many times, but for run once code, like module and class code, this is not so good.
Rather than have marshal create the object, to be put in a constants array to be used only once, we should create the objects in bytecode.
To do that, we want to add the following instructions:
LOAD_INT
: Loads a small int (0-255)MAKE_FLOAT
: Makes a float.MAKE_ASCII
: Makes a string from ASCII dataMAKE_UNICODE
: Makes a string from UTF-8 dataMAKE_LONG
: Make a long object from an array of bytes.These bytecodes are needed for https://github.com/faster-cpython/ideas/issues/566, so this would be a useful halfway-house.
Where to put the data? In https://github.com/faster-cpython/ideas/issues/566 I suggest a separate array. That won't work here because there is no separate array, but there is no reason why it cannot follow the instruction. It would make disassembly harder, but shouldn't matter for performance, as we don't expect to be branching much in the run-once code.
For this to be efficient, we need to avoid copying and traversing the code of the code object too many times. See
#608#462#566 [Guido: I think that it's now #566, since #462 is closed] for how to deal with this.This complements https://github.com/faster-cpython/ideas/issues/583 which reduces code object size, and thus code object loading time.