faster-cpython / ideas

1.69k stars 49 forks source link

Reduce the size of `PyCodeObject` #463

Open markshannon opened 2 years ago

markshannon commented 2 years ago

PyCodeObject is a bit bloated.

It can be reduced by:

Lazily evaluating some attributes

The following attributes can be lazily evaluated, and don't need to be stored.

The following can be derived from co_localspluskinds

Combining some attributes

The names and consts tuples could be combined into a single tuple. This would save a pointer and object header. It also would save a pointer in the interpreter, possibly speeding evaluation a tad.

The various byte arrays could also be combined. The co_exceptiontable, co_linetable, and co_localspluskinds can be merged into a single array, replacing co_linetable with co_linetable_offset and replacing co_localspluskinds with co_localspluskinds_offset

Better sharing of some data

This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index.

Other fields

https://github.com/faster-cpython/ideas/issues/462 will rermove the co_warmup field. _co_firsttraceable can be removed once PEP 669 is implemented.

ericsnowcurrently commented 2 years ago

This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index.

@kumaraditya303, what do you think?

brandtbucher commented 2 years ago

This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index.

This sounds a bit tricky, because in practice we'll need to bump the magic PYC number everytime any static strings are added, changed, or removed.

kumaraditya303 commented 2 years ago

This probably applies more to unmarshaling. The marshaled data should have the ability to directly refer to immortal strings by some sort of index. what do you think?

This is an interesting idea!, although a bit tricky but I think this could reasonably implemented and be automated with a script or two like how the interned dict code is currently generated. We can have an array of interned strings and write the index of it during marshalling and while unmarshalling, it could lookup the string with index from the array.