faster-cpython / ideas

1.67k stars 49 forks source link

Superblock management #562

Open markshannon opened 1 year ago

markshannon commented 1 year ago

Superblocks need to be managed.

The current implementation uses caches that are a multiple of 16 bits. So lets limit the number of superblocks to 2**16. I'm going to assume for now that this is sufficient even for a very large program with a relatively flat execution profile (I plan to do a proper analysis of this, but we'll assume that it's OK for now).

This means that we can refer to a superblock with just 2 bytes, which should help keep things compact.

Superblocks will hold strong references any objects they rely on: code objects, functions, other superblocks, etc. This means that superblocks will keep other object alive. This is OK if we delete cold superblocks. Any superblock that depends on an otherwise dead object will not be reachable, will grow cold and be eliminated, freeing the other objects.

We can check for cold superblocks in a background thread or at a regular interval using the eval-breaker mechanism. Scanning ~1% superblocks every few milliseconds of CPU time should be about right.

Determining if a superblock is cold will require an execution counter, which has cost, but should be low provided superblocks are large enough.

Possible data structures.

Table of superblocks

References to superblocks should be kept in an array. Maybe fixed sized, maybe variable sized with a maximum size of 64k (which would take half a MB on a 64 bit machine)

Mapping deoptimization events to superblocks

Conceptually we need a bag, mapping individual events to sets of superblocks. Implementing it a dictionary would be very wasteful, but might be OK for early iterations.

A much more compact approach would be store a bloom filter of events in each superblock. This would produce false positives for deoptimization and be very slow finding traces. With some additional data to speed the search though, this could be good enough.

Finding superblocks from source location.

A simple map (code, offset) -> superblock should work. We can use a dictionary initially, but a much more compact map is possible: We can store each entry in 8 bytes. 3 for the code object version, 3 for the offset and 2 for the superblock index.