Avoid marshal for creating code objects from serialized data.

Once the work to allow any object as the "code" of a frame is done, we can take advantage of that to speed up creation of code objects from serialized data.

The idea is that the serialized data will consist of two parts:

A sequence of immutable bytecode
Supporting binary data.

Creation of the top-level (module) code object would be done as follows:

Create a "module initializer" object, consisting of a pointer to the binary data and debug info like the name and filename.
Create a frame, setting the "code" field to the module initializer and setting the instruction pointer to the start of the bytecode.
Start executing in the interpreter.

What are the advantages of this?

Marshal is slow
There is no need for a secondary interpreter (marshal)
It allows partial deep-freezing, meaning that the names and consts arrays can be deep frozen without requiring that the code object is deep frozen. The resulting constant can be loaded with LOAD_COMMON_CONST.
It allows further improvements, e.g. we could skip creating a code object for the module, just creating them for functions.
It decouples the pyc format from marshal, allowing them to be improved separately.
Common objects can be shared very efficiently, by leaving them on the stack and using COPY instead of MAKE_...

Creating the instruction sequence

We can create the instruction sequence in much the same way as marshal serializes: recursively emitting code for sub-objects until the entire object is complete.
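
As a rough illustration of the recursive emission described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: instructions are modelled as (opname, operand) pairs rather than real bytecode plus a binary blob, COMMON_NAMES stands in for the shared string array, and the small-int range is arbitrary.

```python
# Hypothetical model: each instruction is an (opname, operand) pair.
# A real implementation would emit bytecode and append operands such as
# float and string payloads to the supporting binary data instead.

COMMON_NAMES = ("a", "self", "__init__")  # assumed stand-in for the shared string array
SMALL_INT_RANGE = range(0, 256)           # assumed range for LOAD_INT


def emit(obj, out):
    """Append to `out` the instructions that rebuild `obj`, sub-objects first."""
    if obj is None or isinstance(obj, bool):
        # bool must be tested before int, since bool is a subclass of int.
        out.append(("LOAD_COMMON_CONST", obj))
    elif isinstance(obj, int):
        op = "LOAD_INT" if obj in SMALL_INT_RANGE else "MAKE_LONG"
        out.append((op, obj))
    elif isinstance(obj, float):
        out.append(("MAKE_FLOAT", obj))
    elif isinstance(obj, str):
        op = "LOAD_COMMON_NAME" if obj in COMMON_NAMES else "MAKE_STRING"
        out.append((op, obj))
    elif isinstance(obj, bytes):
        out.append(("MAKE_BYTES", obj))
    elif isinstance(obj, tuple):
        # Emit every element, then one instruction that collects them.
        for item in obj:
            emit(item, out)
        out.append(("BUILD_TUPLE", len(obj)))
    else:
        raise TypeError(f"cannot emit {type(obj).__name__}")
    return out


program = emit((1, "a", 37.0, (2, "foo")), [])
```

The nested tuple is emitted before the BUILD_TUPLE of its parent, so the instruction order matches a post-order walk of the object.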

To do this we will need some new instructions and a few new intrinsics.

New general purpose instructions:

LOAD_COMMON_CONST Loads a constant from the global array containing None, True, etc., plus assorted common constants
LOAD_COMMON_NAME Like LOAD_COMMON_CONST but from an array of strings.
LOAD_INT Loads a small int

Instructions to create objects from binary data

These instructions will create an object from the binary data, advancing the pointer.

MAKE_FLOAT
MAKE_STRING
MAKE_LONG (we could build large ints from small ints, but that would be quadratic)
MAKE_BYTES
MAKE_CODE: Creates a code object from values on the stack (name, qualname, names, consts) and binary data
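
To make "create an object from the binary data, advancing the pointer" concrete, here is a small Python sketch. The encoding is purely an assumption for illustration (little-endian doubles, and a 4-byte length prefix for strings and large ints); the proposal does not specify a layout, and BinaryData, make_float, make_string and make_long are hypothetical names.

```python
import struct


class BinaryData:
    """Hypothetical cursor over the supporting binary data."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # the "pointer" that each MAKE_* instruction advances

    def read(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("truncated binary data")
        self.pos += n
        return chunk


def make_float(blob: BinaryData) -> float:
    # Assumed layout: 8-byte little-endian IEEE 754 double.
    return struct.unpack("<d", blob.read(8))[0]


def make_string(blob: BinaryData) -> str:
    # Assumed layout: 4-byte length prefix followed by UTF-8 bytes.
    (length,) = struct.unpack("<I", blob.read(4))
    return blob.read(length).decode("utf-8")


def make_long(blob: BinaryData) -> int:
    # Assumed layout: 4-byte length prefix, then a signed little-endian integer.
    # The int is built in one step from its bytes, not from small ints.
    (length,) = struct.unpack("<I", blob.read(4))
    return int.from_bytes(blob.read(length), "little", signed=True)


big = 2**100
payload = (
    struct.pack("<d", 37.0)
    + struct.pack("<I", 3) + b"foo"
    + struct.pack("<I", 13) + big.to_bytes(13, "little", signed=True)
)
blob = BinaryData(payload)
values = (make_float(blob), make_string(blob), make_long(blob))
```

Because each reader consumes exactly the bytes it needs, the instructions themselves need no offsets into the data; the order of the instructions determines what is read.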

New intrinsic functions

make_complex (2)
make_frozenset (1)

We already have an instruction for making tuples.

The instruction sequence would finish with MAKE_CODE; RETURN_VALUE, returning the completed code object on the stack. Or, we could add another instruction, START_CODE, at the end to execute the code object and return the completed module.

Examples

Creation of the tuple (1, "a", 37.0, (2, "foo"))

LOAD_INT 1
LOAD_COMMON_NAME "a"
MAKE_FLOAT 37.0
LOAD_INT 2
MAKE_STRING "foo"
BUILD_TUPLE 2
BUILD_TUPLE 4
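
The sequence can be checked with a toy stack machine. This is only a sketch: operands are shown already decoded, as in the listing, whereas the MAKE_* instructions would really read them from the binary data, and run is a hypothetical helper. The outer tuple has four items (1, "a", 37.0 and the inner tuple), so the final BUILD_TUPLE takes 4.

```python
def run(program):
    """Execute (opname, operand) pairs on a value stack and return the stack."""
    stack = []
    for op, arg in program:
        if op in ("LOAD_INT", "LOAD_COMMON_NAME", "MAKE_FLOAT", "MAKE_STRING"):
            # In this model the operand is the already-decoded value.
            stack.append(arg)
        elif op == "BUILD_TUPLE":
            # Pop the top `arg` values and push them back as one tuple.
            start = len(stack) - arg
            items = tuple(stack[start:])
            del stack[start:]
            stack.append(items)
        else:
            raise ValueError(f"unknown instruction {op}")
    return stack


program = [
    ("LOAD_INT", 1),
    ("LOAD_COMMON_NAME", "a"),
    ("MAKE_FLOAT", 37.0),
    ("LOAD_INT", 2),
    ("MAKE_STRING", "foo"),
    ("BUILD_TUPLE", 2),   # builds (2, "foo")
    ("BUILD_TUPLE", 4),   # builds (1, "a", 37.0, (2, "foo"))
]
result = run(program)
```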

Creation of a code object would look something like this:

(Code to create names tuple)
(Code to create consts tuple)
MAKE_STRING name 
MAKE_STRING qualname
COPY n (filename will be shared for all code objects in module)
MAKE_CODE
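
The sharing of the filename via COPY can be sketched as follows. This is an illustrative model only: CodeStub is a hypothetical stand-in for a real code object, the stack order (names, consts, name, qualname, filename) is assumed from the listing above, and COPY n is taken to push the n-th item from the top of the stack, as the existing COPY instruction does.

```python
from collections import namedtuple

# Hypothetical stand-in for a code object; a real MAKE_CODE would also
# consume bytecode and other fields from the binary data.
CodeStub = namedtuple("CodeStub", "names consts name qualname filename")


def copy(stack, n):
    """COPY n: push the n-th item from the top without removing it."""
    stack.append(stack[-n])


def make_code(stack):
    """MAKE_CODE: pop five values and push the resulting code stub."""
    filename = stack.pop()
    qualname = stack.pop()
    name = stack.pop()
    consts = stack.pop()
    names = stack.pop()
    stack.append(CodeStub(names, consts, name, qualname, filename))


# The filename is created once and left at the bottom of the stack.
stack = ["".join(["mod", ".py"])]

# First code object: the filename is 5 deep after pushing four values.
stack.extend([("x",), (None,), "f", "f"])
copy(stack, 5)
make_code(stack)

# Second code object: one more value (the first stub) now sits above it.
stack.extend([("y",), (None,), "g", "g"])
copy(stack, 6)
make_code(stack)

filename, code_f, code_g = stack
```

Both stubs end up referring to the same string object, so the filename is materialized once per module rather than once per code object. The operand of COPY depends on how many values lie above the shared object at that point.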
