faster-cpython / ideas


Add instructions for creating small objects. #609

Open markshannon opened 11 months ago

markshannon commented 11 months ago

When we use a small constant in Python code, we compile it as LOAD_CONST. This looks efficient, but ignores the fact that the constant object has to be created by marshal, which is quite slow compared to the main interpreter.
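As an illustration (exact opcode names and output vary across CPython versions), the constant ends up in `co_consts`, which is what marshal has to deserialize when the module is loaded from a `.pyc`:

```python
import dis

# Module-level assignment of a small constant: the 5 becomes an entry in
# co_consts, and that is the object marshal must create at import time when
# the code is loaded from a .pyc file.
code = compile("x = 5", "<module>", "exec")
print(code.co_consts)   # (5, None)
dis.dis(code)           # shows LOAD_CONST ... (5); exact opcodes vary by version
```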

This is fine for code in functions, as it is expected to be executed many times, but for run-once code, like module and class bodies, it is not so good.

Rather than having marshal create the object, only for it to be put in a constants array and used once, we should create such objects directly in the bytecode.

To do that, we want to add the following instructions:

These bytecodes are needed for https://github.com/faster-cpython/ideas/issues/566, so this would be a useful halfway-house.

Where to put the data? In https://github.com/faster-cpython/ideas/issues/566 I suggest a separate array. That won't work here because there is no separate array, but there is no reason why it cannot follow the instruction. It would make disassembly harder, but shouldn't matter for performance, as we don't expect to be branching much in the run-once code.
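For reference (not part of this proposal), CPython 3.11+ already embeds per-instruction data in the instruction stream as inline caches, and `dis` can show them; constant data for the new instructions would sit after their instruction in much the same position:

```python
import dis

def f(o):
    return o.attr

# On 3.11+, specializable instructions such as LOAD_ATTR are followed by CACHE
# entries embedded directly in the bytecode stream.
dis.dis(f, show_caches=True)
```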

For this to be efficient, we need to avoid copying and traversing the code object's bytecode too many times. See #608, #462, and #566 [Guido: I think that it's now #566, since #462 is closed] for how to deal with this.

This complements https://github.com/faster-cpython/ideas/issues/583 which reduces code object size, and thus code object loading time.

gvanrossum commented 10 months ago

I am definitely in favor of LOAD_INT, since the data fits in the oparg byte.
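To make the "fits in the oparg byte" point concrete, here is a small sketch of the 2-byte wordcode layout; a hypothetical LOAD_INT could carry values 0-255 directly in the second byte, with anything larger needing EXTENDED_ARG or falling back to LOAD_CONST:

```python
import dis

code = compile("x = 5", "<module>", "exec")
# Since 3.6, each instruction is an (opcode, oparg) byte pair; LOAD_INT could
# put the integer value itself in the oparg byte instead of an index into
# co_consts.
for opcode, oparg in zip(code.co_code[::2], code.co_code[1::2]):
    print(f"{dis.opname[opcode]:<20} {oparg}")
```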

Let's look at LOAD_FLOAT next. Here marshal v1 writes a decimal string (basically "%17g" % x) but newer versions write a binary format -- however it's still a portable format, using PyFloat_Unpack8. If we could just write the binary data (as if it's an 8-byte binary string), the cost of unmarshalling would be reduced to just a call to PyFloat_FromDouble, which is presumably the minimal cost. (Note that the data bytes aren't copied but delivered as a char* pointer, at least when unmarshalling from an in-memory buffer -- as is always the case because that's how importlib works.) This seems a little easier to accomplish (just bump the marshal version) than adding infrastructure for variable-length instructions (which I've tried, and found doable but complicated -- it's different from caches, which are special-cased in many places).
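A quick way to see the binary payload described above (assuming a marshal version that uses the binary float format, as the default has for a long time):

```python
import marshal
import struct

payload = struct.pack("<d", 2.5)   # little-endian IEEE-754 double
data = marshal.dumps(2.5)          # default marshal version uses binary floats
# The marshalled stream contains those 8 bytes verbatim after the type byte,
# so unmarshalling can be little more than PyFloat_Unpack8 + PyFloat_FromDouble.
print(payload in data)             # True on current CPython
```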

For short strings and medium-sized integers, we can store up to 4 bytes of data using EXTENDED_ARG without making the instruction format nominally harder. Not sure if that's enough. :-)
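For reference, EXTENDED_ARG widens the operand one byte per prefix, and up to three prefixes are allowed, giving a 32-bit oparg at most:

```python
# Each EXTENDED_ARG prefix shifts the accumulated argument left by one byte;
# the real instruction's own oparg byte comes last.
ext1, ext2, ext3, arg = 0x12, 0x34, 0x56, 0x78
oparg = ((((ext1 << 8) | ext2) << 8) | ext3) << 8 | arg
print(hex(oparg))   # 0x12345678
```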

jneb commented 10 months ago

Considering that there is already a table of frequent strings (I recall it containing the empty string and the one-character strings of printable characters), there could be a LOAD_FIXED_STR that takes an index into this table. This would encourage putting more entries into the table.
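A small demonstration (relying on CPython implementation details) that canonical objects already exist for the empty string and one-character strings, which is the kind of table a LOAD_FIXED_STR could index:

```python
# CPython shares a single empty-string object and caches latin-1
# one-character strings; these identity checks hold on CPython.
e1 = str()
e2 = "abc"[3:]       # an empty slice result is the shared empty string
print(e1 is e2)      # True
c1 = "xyz"[0]
c2 = chr(ord("x"))   # chr() returns the cached one-character string
print(c1 is c2)      # True
```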