Open maartensandbox opened 1 year ago
Hmm, yes that is an interesting use case. The "inner join" and "left join" patterns are pretty common.
Note this is getting very low level. Maybe something like the "tokens" dictionary defined for hash-based dictionaries, that iterates (token, hash)
, could be helpful, together with an extension of gettoken
where you provide the key and precomputed hash. For example, if keyword arguments are fast we enough we could try for a signature like gettoken(key; hash)
, or else create a new generic functions.
We also overwrite and utilize a bit or two of the hash for internal purposes, so we need to take care with that (ideally this would work between Dictionary
and UnorderedDictionary
without problems).
Do you see any other nice ways of implementing this?
One could also add a specialized gettoken that dispatches on Pair{I,Uint} for Indices{I}? It will anyway probably only be something implemented for Dictionary, as those are the only indices that currently store the hash. Although one could probably do something similar for unordereddictionary, the index reveals the first n bits of the hash, and a lookup in a different unordereddictionary can then sometimes avoid having to recompute the hash, but I don't think it's worth the hassle.
The internal bits can be somewhat confusing when iterating (token,hash), and then not actually having hash(token) == hash, but I don't think that's a real problem. When users start supplying their own hash to get faster lookups, then they probably will also have read the documentation, and the lookup function will anyway "&hashmask" whatever gets passed in.
I have two dictionaries. I iterate over the keyvalue pairs of the first one, and if the second dict also contains that key then I assign it a new value.
It would be nice if I was able (maybe I already am?) to iterate over the (key,hash,value) pairs, and have a faster lookup in the second dict that avoids rehashing the key.