faster-cpython / ideas


Compact the `co_code` attribute of code objects. #608

Closed markshannon closed 1 year ago

markshannon commented 1 year ago

The co_code attribute of code objects is no longer executed; it is just the representation of the code on disk and for tools like dis that examine code objects. So we can optimize it for size and loading speed, not for execution speed.

We can change the bytecode format to be one byte per instruction for instructions without an oparg, and two bytes for instructions with an oparg, skipping caches. When creating the code object, we can decompress into _co_code_adaptive, performing cache initialization at the same time.
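A rough sketch of what that format could look like, using a made-up four-instruction opcode table rather than CPython's real one. The names `TOY_OPCODES`, `compress` and `decompress` are hypothetical, cache entries are assumed to be two bytes each, and `EXTENDED_ARG` and de-specialization are ignored for brevity:

```python
# Hypothetical toy instruction set: opcode -> (has_oparg, cache_entries).
# Illustrative values only; this is not CPython's real opcode table.
TOY_OPCODES = {
    1: (False, 0),  # a POP_TOP-like instruction
    2: (True, 0),   # a LOAD_FAST-like instruction
    3: (True, 4),   # a LOAD_ATTR-like instruction with 4 cache entries
    4: (False, 0),  # a RETURN_VALUE-like instruction
}


def decompress(compact: bytes) -> bytes:
    """Expand the compact form into 2-byte code units plus zeroed caches."""
    out = bytearray()
    i = 0
    while i < len(compact):
        opcode = compact[i]
        has_oparg, caches = TOY_OPCODES[opcode]
        i += 1
        oparg = 0
        if has_oparg:
            oparg = compact[i]
            i += 1
        out += bytes((opcode, oparg))  # one code unit: opcode, oparg
        out += bytes(2 * caches)       # cache initialization (all zeros here)
    return bytes(out)


def compress(adaptive: bytes) -> bytes:
    """Drop caches and unused opargs to get the compact form back."""
    out = bytearray()
    i = 0
    while i < len(adaptive):
        opcode, oparg = adaptive[i], adaptive[i + 1]
        has_oparg, caches = TOY_OPCODES[opcode]
        out.append(opcode)
        if has_oparg:
            out.append(oparg)
        i += 2 + 2 * caches            # skip the code unit and its caches
    return bytes(out)


compact = bytes([2, 0, 3, 1, 1, 4])    # 4 instructions in 6 bytes
adaptive = decompress(compact)         # 16 bytes once caches are added
```

In this toy case the expanded form is 16 bytes against 6 for the compact form; most of the difference comes from the cache entries.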

One problem would be that computing the f_lasti attribute will be expensive as we need to convert the offset in _co_code_adaptive into the offset into co_code. We will need to do something similar for traceback objects to avoid having to do this computation when creating tracebacks.
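To illustrate the cost, here is a minimal sketch of the offset conversion under the same kind of assumptions: a made-up opcode table, two-byte cache entries, and no `EXTENDED_ARG`. `TOY_OPCODES` and `compact_offset` are hypothetical names, not CPython APIs:

```python
# Hypothetical toy instruction set: opcode -> (has_oparg, cache_entries).
# Illustrative values only; this is not CPython's real opcode table.
TOY_OPCODES = {
    1: (False, 0),
    2: (True, 0),
    3: (True, 4),
    4: (False, 0),
}


def compact_offset(adaptive: bytes, adaptive_offset: int) -> int:
    """Map a byte offset in the expanded code to the compact-format offset.

    Walks every instruction before the target, so the cost is linear in
    the position of the instruction.
    """
    compact = 0
    i = 0
    while i < adaptive_offset:
        has_oparg, caches = TOY_OPCODES[adaptive[i]]
        compact += 2 if has_oparg else 1  # size of this instruction when compact
        i += 2 + 2 * caches               # size of this instruction when expanded
    if i != adaptive_offset:
        raise ValueError("offset does not point at the start of an instruction")
    return compact


# Expanded code: instructions start at byte offsets 0, 2, 12 and 14.
adaptive = bytes([2, 0, 3, 1] + [0] * 8 + [1, 0, 4, 0])
```

Every lookup has to scan from the start of the code, which is why doing this on each f_lasti access or traceback creation would be costly without some form of caching or deferred computation.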

See also https://github.com/faster-cpython/ideas/issues/462

markshannon commented 1 year ago

Having a different format for co_code and _co_code_adaptive may just be too complex, especially if we want to implement https://github.com/faster-cpython/ideas/issues/609 as the data parts will not compress.

So we should probably stick with using the compact format on disk only and lazily compute co_code, as suggested in #462.