faster-cpython / ideas

1.67k stars 49 forks source link

Tagged integers #676

Open markshannon opened 1 month ago

markshannon commented 1 month ago

With deferred reference counting, we gain some freedom in how we represent references to objects. One common optimization in VMs, from early Smalltalk days to V8 and Javascript Core, is tagged pointers. The idea is to use one or more of the spare bits in a pointer to distinguish pointers from other values.

Given how ubiquitous integers are, they are the obvious candidate for tagging. Tagging floats is also appealing, but is probably too awkward to be viable, but here's an idea on how it might be done

Given that we plan to convert stack references from PyObject * to an opaque struct anyway, adding tagged ints becomes considerably easier.

Stack and heap references

Converting from a reference struct to a PyObject * is potentially expensive if the struct contains a tagged int, as we need to allocate and free boxed ints. To reduce the cost we want to minimize the conversions. That means that most collection objects should contain reference structs, not PyObject *s. We don't need to change them all at once though.

Ideally, the only conversions would occur at the boundary between the VM and third party code. Although even that cost can be be largely eliminated

The tagging scheme

The obvious scheme is to set the low bit to 1 for unboxed ints, storing the actual value in the remaining bits. It actually makes more sense, however to tag pointers and not ints for a few reasons.

  1. It means that we don't need to offset the int by 1 when performing arithmetic
  2. The cost of the offset for pointers is effectively zero, as almost all pointer dereferences include a constant offset. Changing the constant offset by one is free
  3. It help finds bugs. Casting a tagged pointer to an untagged pointer is incorrect, and by tagging pointers will almost certainly cause a crash
If we want to reserve a second bit for tagging other values, or maybe marking deferred references, then we have the following tagging scheme: Tag Meaning
00 Unboxed int
01 Reserved
10 Reserved
11 Tagged pointer (value = ptr-1)
gvanrossum commented 1 month ago

See also Ken Jin's thoughts on tagging, in a Google Doc and a PR (https://github.com/python/cpython/pull/118450).

Fidget-Spinner commented 1 month ago

The tagging scheme used in that PR is compatible with this. It needs very few modifications to tag everything to 0b11 like what this issue is suggesting.

gvanrossum commented 1 month ago

Tagged pointer (value = ptr-1)

Shouldn't that be "(value = ptr-3)"? Or am I missing something?

markshannon commented 1 month ago

Tagged pointer (value = ptr-1)

Shouldn't that be "(value = ptr-3)"? Or am I missing something?

It is ptr-1: xxxx00 - 1 == yyyy11, where yyyy == xxxx - 1