faster-cpython / ideas

1.67k stars 49 forks source link

Optimize large constant dicts #544

Open gvanrossum opened 1 year ago

gvanrossum commented 1 year ago

Currently a constant dict pushes all the values on the stack, like this:

  2           2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (4)
              6 LOAD_CONST               3 (6)
              8 LOAD_CONST               4 (8)
             10 LOAD_CONST               5 (0)
             12 LOAD_CONST               6 ((1, 3, 5, 7, 9))
             14 BUILD_CONST_KEY_MAP      5

This pattern is a bit wasteful and pushes the number of constants to great heights (e.g. locale.py has a dict with 588 keys).

I'm sure we could come up with another opcode that loads the values as a single tuple as well.

lpereira commented 1 year ago

Alternatively, instead of another opcode:

LOAD_GLOBAL dict
LOAD_CONST (((1, 2), (3, 4), (5, 6), (7, 8), (9, 0)))
CALL_FUNCTION
gvanrossum commented 1 year ago

The problem with that is, what if the user defines their own global named dict? That shouldn't affect dict literals.

lpereira commented 1 year ago

If the whole dict is const, can it be shoved in the constants directly instead of being built during runtime? Or would we have issues with things like hash randomization and whatnot?

gvanrossum commented 1 year ago

Constants must be (deep) immutable, but dicts aren't. It's totally reasonable to write

def f(x):
    a = {1:2, 3:4}
    a[x] = x+1
    return a

If the dict was part of the consts (i.e., in the code object), we would just have mutated the constant in the code object.

This is why only tuples are loaded directly, mutable values must be copied. The pattern using LIST_EXTEND and SET_UPDATE handles this.

An extra opcode really isn't so bad if we want to address this case.

methane commented 1 year ago

LOAD_CONST (((1, 2), (3, 4), (5, 6), (7, 8), (9, 0)))

Please don't forget that unmarshal is yet another interpreter. This needs 6 tuples, although current opcodes need only one tuple.

Original idea ("loads the values as a single tuple") would be better, although I don't know how it is effective.

TeamSpen210 commented 1 year ago

Perhaps what we could do is create a new type which is just a wrapper around a split dict keys struct. Then it'd be very fast to build a new dict since we can either just directly use the split key struct, or memcpy and incref to make a combined table every time. Then have values as either a tuple or on the stack, depending on if they're constant. Though most keys would likely be strings or ints which have cached/cheap hash functions, so I'm not sure if this would be worth It.

gvanrossum commented 1 year ago

Why not just do this?

LOAD_CONST (2, 4, 6, 8, 9)
LOAD_CONST (1, 3, 5, 7, 9)
BUILD_CONST_KEY_CONST_VALUE_MAP 5

The new opcode would be as simple like

keys = POP();
values = POP();
map = _PyDict_FromItems(
                    &PyTuple_GET_ITEM(keys, 0), 1,
                    &PyTuple_GET_ITEM(values, 0), 1, oparg);
PUSH(map);

plus various checks and DECREFs (and converted to the instruction DSL format).

methane commented 1 year ago

It would be better than current behavior, when the dict is created repeatedly. But I don't expect enough performance gain.

If we introduce new opcode, it can guarantee there are not duplicated keys in the keys tuple. It can optimize "creating dict" time too. (See python/cpython#86292)

gvanrossum commented 1 year ago

I came to it not because I wanted faster constant dict creation but because I wanted more compact code -- in locale.py, all constants used after that big dict will be >= 256 and hence require an EXTENDED_ARG prefix. And in the register machine EXTENDED_ARG will also be a double-size instruction so it will cause even more code bloat.

But it's all rare enough that @markshannon's comment ("Is this worthwhile?") still applies there may not be a great future for this idea. (And there's actually an implementation trick possible that at least avoids moving the constants from the const array to an array of registers.)

markshannon commented 1 year ago

Currently the compiler tries to keep stack consumption down. If the dict is too large, it will produce many

LOAD_CONST    key
LOAD_CONST   value
MAP_ADD     

triples. BUILD_CONST_KEY_CONST_VALUE_MAP migth help, but I doubt it is worth worrying about. As @methane says unmarshal is also an interpreter.

I think a better approach might be to use use a tuple of pairs and a binary search in locale.py. That way we can deepfreeze the whole thing.