brandtbucher / automap

High-performance autoincremented integer-valued mappings. 🗺️
Other
16 stars 4 forks source link

Store keys directly in the table, instead of in a separate list #12

Open brandtbucher opened 2 years ago

brandtbucher commented 2 years ago

It occurred to me recently that the internal representation of our hash table does not need to concern itself with ordering. In fact, because of the nature of our mappings, order can be determined at any point by just using the values! Getting rid of our lists of keys would save a lot of allocations during construction, and some indirection during lookups.

If we did this, unordered iteration would still be linear, but ordered iteration would be quadratic with a naive implementation and quasi-linear with a more clever, complicated one.

Also, we would want to protect against adding keys during unordered iteration (which is super easy to check using the map's length). There's precedent for this with dict:

>>> d = dict.fromkeys(["S", "p", "a", "m"])
>>> for k in d:
...     d["X"] = None
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

As a side note, this also closes the door on one clever way that users can currently put our tables in a bad state (possibly causing hard crashes): by getting our hidden lists of keys/values using the gc module, and mutating them in-place.

@flexatone, some questions for you:

My guess is that the answer to both of these is "no", but it would be nice to know for sure.

flexatone commented 2 years ago

Thanks, @brandtbucher , for these thoughts.

First, reviewing StaticFrame's code, there is at least one case where we rely on the ordered iteration of automap; I am not certain, however, how often that case happens, and at first look, it does not seem necessary or unrefactorable. As StaticFrame Index objects retain a "labels" array, it is this that is used for implementing __iter__ and other order-dependent label evaluations.

Second, I am certain we never need to add keys during iteration.

If this change offers significant performance gains for initialization, it likely would be worth any changes StaticFrame would need to make.

Returning to the first question, if lookup is till quasi constant time, why would ordered iteration not be linear if, for example, we were just driving lookups with a range, i.e., (map[i] for i in range(len(map)))?

brandtbucher commented 2 years ago

Cool!

Returning to the first question, if lookup is till quasi constant time, why would ordered iteration not be linear if, for example, we were just driving lookups with a range, i.e., (map[i] for i in range(len(map)))?

That would only work if the mapping was an identity mapping of autoincremented integers i -> i. range(len(map)) equals map.values(), not necessarily map.keys().

We basically need to do the equivalent of sorted(map, key=map.__getitem__).