Open kmsquire opened 6 years ago
Why not just
updateindex!(dict, key, function_to_apply_to_value)
the outcome of the above should be the same as
dict[key] = get(dict, key, default_value) |> function_to_apply_to_value
so it's in one line and faster?
The "token" feature of SortedDict is meant to address two problems: multiple key lookups for an update operation (the topic of this issue) and also iterating over a sorted container in the sort-order of the keys. The token serves as the state-variable for this iteration. The updateindex! proposal in the above comment would solve the first problem for both Dict and SortedDict, but tokens or something similar would still be needed to solve the second problem for the sorted containers.
I favor API 2.
I've been hoping that we can copy the IndexStyle
type of idea from arrays and have iteration and methods like map
(of values) using tokens automatically, without knowing the details of the tokens of a given dictionary. (I'm hoping this will work together with the semantics of similar
in #24390 to make functions like map
(of values) really fast using simple, generic code).
I'd also like to point out that linear indexing is a kind-of token system for dense arrays.
Wouldn't that be API 1, then? That'd more transparently give you a "fast" key to use with indexing syntaxes. The downside with API 1 is that then dictionaries can no longer support all types in the language as keys — the DictToken
type (or whatever) would be special. I think that's fine. It's also worth noting that it'd be easy to define a gettoken1
fallback for ::Associative
— you just return the (likely slow) key itself by default.
I guess I was imagining methods of map
that use gettoken
and settoken!
and a separate slower (default) method that uses getindex
and setindex!
.
Wrapping in a DictToken
would be fine, also.
In any case, the tricky bit is when you do operations like map!
you need to figure out if the tokens of the input and output match, and what to do when they don't, etc.
I favor adding update!
. It's simple and handles the most common case. Tokens are more complex since you have the problem of invalidating them if the Dict changes.
API 1 seems a lot clearer to me.
I don't think it's correct that API 1 wouldn't be able to support all types in the language, because you would parameterize the token type on the dictionary type.
That is, for d::Dict{K,V}
, the gettoken(d, k)
function would return an AssociativeToken{Dict,K,V}
instance.
If you then wanted a new dictionary d2
whose keys were AssociativeToken{Dict,K,V}
, then d2
would be of type Dict{AssociativeToken{Dict,K,V}, V}
. gettoken
on d2
would return an Associative{Dict, AssociativeToken{Dict,K,V}, V}
instance. And so on.
Note that you'd want the function to be the first argument of update!
so that you could use it with do
syntax. But how would you avoid the redundant hashing in the following pattern:
if haskey(dict, k)
.... do something...
else
... do something else with dict[k] ....
end
?
Also, without tokens, how would you swap two elements, i.e.
d[k1], d[k2] = d[k2], d[k1]
without hashing the keys twice? The proposed update!
would only give you access to a single element, no?
update!
could have an optional default value to use if the key doesn't exist yet.
API 1 is ambiguous for Dicts of Any
, e.g. for memoization.
It's true, tokens are more powerful, but therefore more difficult to support correctly.
I'm still not convinced that update!
is sufficient for the kinds of things I was envisaging, though it is certainly helpful.
One obvious but tricky thing is that map!(f, dict1, dict2)
should be fast whenever dict1
and dict2
share compatible tokens (which will hopefully be likely after calling similar
). Ideally, we can also implement map!
with generic code like we do for AbstractArray
which as I understand will use linear index "tokens" whenever it can.
update!
can be useful on its own, so we could start with that and see whether people still need a lower-level system based on tokens.
Regarding update!, my favorite API
would be
merge!(combine, d::Dict{K,V}, kv::Pair{K,V})
_merge!(combine, d::Dict{K,V}, k::K,v::V)
The _merge!
is necessary for types where the pair formation is expensive (e.g. causes an allocation).
(From a recent discourse discussion.)
Consider
In this code, the hash of the key is calculated 3 times. Ideally, this would happen only once.
Tokens provide one way to handle this. Two possible interfaces:
(
gettoken1
andgettoken2
are named as such to distinguish them here. The actual name for whatever API is chosen should begettoken
)I have a slight preference for API 1. API 2 was proposed by @StefanKarpinski here. (@StefanKarpinski, please fix if incorrect!)
Related solutions/issues:
SortedDict
s in DataStructures.jl, to allowO(1)
access to a value that was looked upget!(d, key, default)
to store and return a default value, if the key is not in the dictionaryDefaultDict
s in DataStructures.jl, for a similar purpose?=
as a get-or-set operation (#2108)Ref
s to access elements (#12157)Cc: @StephenVavasis @andyferris