Open markshannon opened 1 month ago
See also Ken Jin's thoughts on tagging, in a Google Doc and a PR (https://github.com/python/cpython/pull/118450).
The tagging scheme used in that PR is compatible with this. It needs very few modifications to tag everything to 0b11
like what this issue is suggesting.
Tagged pointer (value = ptr-1)
Shouldn't that be "(value = ptr-3)"? Or am I missing something?
Tagged pointer (value = ptr-1)
Shouldn't that be "(value = ptr-3)"? Or am I missing something?
It is ptr-1
: xxxx00 - 1 == yyyy11
, where yyyy == xxxx - 1
With deferred reference counting, we gain some freedom in how we represent references to objects. One common optimization in VMs, from early Smalltalk days to V8 and Javascript Core, is tagged pointers. The idea is to use one or more of the spare bits in a pointer to distinguish pointers from other values.
Given how ubiquitous integers are, they are the obvious candidate for tagging. Tagging floats is also appealing, but is probably too awkward to be viable, but here's an idea on how it might be done
Given that we plan to convert stack references from
PyObject *
to an opaque struct anyway, adding tagged ints becomes considerably easier.Stack and heap references
Converting from a reference struct to a
PyObject *
is potentially expensive if the struct contains a tagged int, as we need to allocate and free boxed ints. To reduce the cost we want to minimize the conversions. That means that most collection objects should contain reference structs, notPyObject *
s. We don't need to change them all at once though.Ideally, the only conversions would occur at the boundary between the VM and third party code. Although even that cost can be be largely eliminated
The tagging scheme
The obvious scheme is to set the low bit to 1 for unboxed ints, storing the actual value in the remaining bits. It actually makes more sense, however to tag pointers and not ints for a few reasons.