Reduce the size of `PyCodeObject`

markshannon commented 2 years ago

PyCodeObject is a bit bloated.

It can be reduced by:

Lazily evaluating some attributes
Combining some attributes
Better sharing of data

Lazily evaluating some attributes

The following attributes can be lazily evaluated, and don't need to be stored.

The following can be derived from co_localspluskinds

co_argcount
co_posonlyargcount
co_kwonlyargcount
co_nplaincellvars
co_ncellvars
co_nfreevars
Others:
co_stacksize or co_nlocalsplus. Either can be computed from the other and co_framesize.
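As a rough illustration of the lazy-evaluation idea, the cell and free variable counts can be recomputed by scanning the kinds array. This is a minimal sketch, not CPython's implementation: the flag values are assumed to match the internal `CO_FAST_*` constants, and `localsplus_counts` and `stacksize_from_framesize` are hypothetical helper names. The argument counts (co_argcount and friends) would need kind bits that distinguish argument categories; that encoding is not specified above, so the sketch leaves them out. The second helper assumes co_framesize is the sum of co_nlocalsplus, co_stacksize, and a fixed per-frame header size.

```python
# Kind flags, assumed to match CPython's internal CO_FAST_* values.
CO_FAST_LOCAL = 0x20
CO_FAST_CELL = 0x40
CO_FAST_FREE = 0x80


def localsplus_counts(kinds: bytes) -> dict:
    """Recompute variable counts from a co_localspluskinds-style byte array."""
    nlocals = ncellvars = nplaincellvars = nfreevars = 0
    for kind in kinds:
        if kind & CO_FAST_LOCAL:
            nlocals += 1
            if kind & CO_FAST_CELL:
                ncellvars += 1  # a local (e.g. an argument) that is also a cell
        elif kind & CO_FAST_CELL:
            ncellvars += 1
            nplaincellvars += 1  # a cell that is not also a local
        elif kind & CO_FAST_FREE:
            nfreevars += 1
    return {
        "nlocals": nlocals,
        "ncellvars": ncellvars,
        "nplaincellvars": nplaincellvars,
        "nfreevars": nfreevars,
    }


def stacksize_from_framesize(framesize: int, nlocalsplus: int,
                             frame_specials_size: int) -> int:
    """Derive the stack size, assuming
    framesize == nlocalsplus + stacksize + frame_specials_size."""
    return framesize - nlocalsplus - frame_specials_size


# Synthetic example: one plain local, one local that is a cell,
# one plain cell, and one free variable.
example_kinds = bytes([0x20, 0x20 | 0x40, 0x40, 0x80])
counts = localsplus_counts(example_kinds)
```

The scan is linear in the number of variables, so the trade-off is a small cost per attribute access against several integer fields saved per code object.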

Combining some attributes

The names and consts tuples could be combined into a single tuple. This would save a pointer and object header. It also would save a pointer in the interpreter, possibly speeding evaluation a tad.
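One way to picture the combined tuple is below. It is only a sketch under the assumption that constants come first and names follow; `combine`, `get_const`, and `get_name` are hypothetical helpers, not existing CPython functions.

```python
def combine(consts: tuple, names: tuple):
    """Return (combined, names_offset): constants first, then names."""
    return consts + names, len(consts)


def get_const(combined: tuple, index: int):
    # Constants keep their original indices in the combined tuple.
    return combined[index]


def get_name(combined: tuple, names_offset: int, index: int):
    # Names are shifted by the number of constants.
    return combined[names_offset + index]


combined, names_offset = combine((None, 1, "doc"), ("print", "x"))
```

In a real implementation the compiler could emit name indices that are already offset, so the interpreter would index one tuple directly and would not need to carry `names_offset` at all.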

The various byte arrays could also be combined. co_exceptiontable, co_linetable, and co_localspluskinds can be merged into a single array, with co_linetable replaced by co_linetable_offset and co_localspluskinds replaced by co_localspluskinds_offset.
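The merged layout could look roughly like this. The ordering (exception table, then line table, then kinds) and the `PackedTables` type are assumptions made for illustration; only the two offset names come from the proposal itself.

```python
from typing import NamedTuple


class PackedTables(NamedTuple):
    data: bytes                     # exception table + line table + kinds
    linetable_offset: int           # where the line table starts in data
    localspluskinds_offset: int     # where the kinds array starts in data


def pack_tables(exceptiontable: bytes, linetable: bytes,
                localspluskinds: bytes) -> PackedTables:
    """Concatenate the three byte arrays and record where each part begins."""
    linetable_offset = len(exceptiontable)
    localspluskinds_offset = linetable_offset + len(linetable)
    data = exceptiontable + linetable + localspluskinds
    return PackedTables(data, linetable_offset, localspluskinds_offset)


def get_exceptiontable(p: PackedTables) -> bytes:
    return p.data[:p.linetable_offset]


def get_linetable(p: PackedTables) -> bytes:
    return p.data[p.linetable_offset:p.localspluskinds_offset]


def get_localspluskinds(p: PackedTables) -> bytes:
    return p.data[p.localspluskinds_offset:]


packed = pack_tables(b"\x01\x02", b"\x10\x11\x12", b"\x20\x40")
```

Two integer offsets replace two object pointers, and the two extra bytes objects (each with its own header) disappear.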

Better sharing of data

This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index.

Other fields

https://github.com/faster-cpython/ideas/issues/462 will remove the co_warmup field. _co_firsttraceable can be removed once PEP 669 is implemented.

ericsnowcurrently commented 2 years ago

> This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index.

@kumaraditya303, what do you think?

brandtbucher commented 2 years ago

> This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index.

This sounds a bit tricky, because in practice we'll need to bump the PYC magic number every time any static strings are added, changed, or removed.

kumaraditya303 commented 2 years ago

> This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index. What do you think?

This is an interesting idea! It is a bit tricky, but I think it could reasonably be implemented and automated with a script or two, like how the interned dict code is currently generated. We can have an array of interned strings, write a string's index into that array during marshalling, and look the string up by index in the array while unmarshalling.
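The index scheme could be sketched as follows. None of this is the real marshal format: the tag bytes, the `STATIC_STRINGS` table, and the `dump_str` and `load_str` helpers are all hypothetical and exist only to show the round trip.

```python
import io
import struct

# Hypothetical table of immortal strings; in CPython this would be generated.
STATIC_STRINGS = ("__init__", "__name__", "self")
_STATIC_INDEX = {s: i for i, s in enumerate(STATIC_STRINGS)}

TAG_STATIC = b"S"   # followed by a 2-byte little-endian table index
TAG_INLINE = b"u"   # followed by a 4-byte length and UTF-8 data


def dump_str(s: str, out) -> None:
    """Write a table index for known strings, the full text otherwise."""
    index = _STATIC_INDEX.get(s)
    if index is not None:
        out.write(TAG_STATIC + struct.pack("<H", index))
    else:
        raw = s.encode("utf-8")
        out.write(TAG_INLINE + struct.pack("<I", len(raw)) + raw)


def load_str(inp) -> str:
    """Read one string, resolving table indices to the shared object."""
    tag = inp.read(1)
    if tag == TAG_STATIC:
        (index,) = struct.unpack("<H", inp.read(2))
        return STATIC_STRINGS[index]
    if tag == TAG_INLINE:
        (length,) = struct.unpack("<I", inp.read(4))
        return inp.read(length).decode("utf-8")
    raise ValueError(f"unknown tag {tag!r}")


buf = io.BytesIO()
dump_str("self", buf)   # stored as an index
dump_str("spam", buf)   # stored inline
buf.seek(0)
first = load_str(buf)
second = load_str(buf)
```

Loading an indexed string returns the table's own object, so no new string is allocated or interned. The indices are only meaningful while the table is unchanged, which is why any edit to the static strings would require a new PYC magic number.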
