Closed piever closed 2 years ago
This is a good question. I found the Base
interface too fast-and-loose which is why I made it "tighter" here.
Is the conversion step causing some issue for you? Is the concern performance? I'm just trying to understand what problem we are trying to solve.
Is the conversion step causing some issue for you? Is the concern performance? I'm just trying to understand what problem we are trying to solve.
The main problem for me is that the conversion step can error (depending on the type of the key). The performance cost of convert
is in my use case not an issue, but it is problematic that it can throw method errors.
I essentially would like the following to hold:
get(dict, key, default) == key in keys(dict) ? dict[key] : default
where the expression on the left is faster.
The problem is that in my use case, key
is passed by the user, and could in principle be of any type, so for this to work I need to use Dictionary{Any, V}
(not sure how else to implement an optimized version of the right-hand side that works for any key
).
Another issue IMO is that this error is a bit surprising: I was not expecting the behavior of a non-mutating function to depend on the key type of the container (this of course is subjective).
OK this is really interesting.
as it only relies on the
hash
This really depends on the dictionary type. We have really simple array-based dictionaries that only use isequal
, we have hash-based dictionaries using hash
as well as isequal
, and we need to support sort-based dictionaries that use isless
as well as isequal
. And potentially something more exotic we haven't consdiered yet. Unfortunately, because of the response in https://github.com/JuliaLang/julia/pull/40031 (and many similar issues like https://github.com/JuliaLang/julia/issues/34815) we know that isless
will tend to error out for two arbitrary types, thus we cannot claim to support operations like get(::AbstractDictionary{K, T}, ::Any, ::Any)
without the possibility of error.
The problem is that in my use case, key is passed by the user, and could in principle be of any type
Yeah. If we were to put a try
around the conversion it can cause performance problems (there is an overhead).
I'm not sure if you are talking about application users or library users, but if it is the former I would suggest sanitizing the input (doing the convert
and reporting the problem cleanly if it fails), and if it is the latter you might consider doing conversion inside a try
block there if you aren't performance sensitive here. But I don't understand your use case - it could be helpful if you are willing to explain this a little more?
Another issue IMO is that this error is a bit surprising: I was not expecting the behavior of a non-mutating function to depend on the key type of the container (this of course is subjective).
Yes, I can sympathize with this point of view. To be honest, I think that isless
has this surprise too, and I don't know how to avoid the consequences of that. But more concretely, I think of arrays and dictionaries as similar (and AbstractDictionary
sits in between AbstractDict
and AbstractArray
in design space) - and you can't do operations like get([], nothing, 1)
in Julia. The type of the key in get
has to make sense (and in my experience if not it seems likely to be the result of a bug).
I think I understand the concerns a bit better now, but I'm not sure they are insurmountable. (Sorry in advance if the reasoning below sounds too assertive! It's just a proposal, with a possible implementation in #93: I completely understand if it turns out it's not feasible because it is not in line with the philosophy of the package or has deeper flaws I'm not seeing.)
We know that isless will tend to error out for two arbitrary types, thus we cannot claim to support operations like
get(::AbstractDictionary{K, T}, ::Any, ::Any)
without the possibility of error.
Still, at the moment, the signature is get(d::AbstractDictionary, ::Any, ::Any)
, and it does error at runtime. There is the implicit assumption that the key is "convertible" to the keytype, ie "convert(K, key)
works", but conceptually that is not really different from "gettoken(d, key)
works". In that case, why make gettoken(d, key)
fail more often than needed?
In the case of isless
-based dictionaries, I imagine an isless
method error is just as informative as a convert
method error, whereas for other dictionaries it's nice that things just work. I see that there is some fuzziness here, in that it's not obvious to me whether one should convert
before calling isless
or not. I suspect one shouldn't, because, eg, missing
can be compared to integers, but not converted to integers. I think it's fair to add to the documentation of the isless
dictionary that keys used (either in get
or getindex
) must be comparable. One should probably also document that conversion of the key to the keytype shouldn't change the result of the comparison, just as it should not change the result of isequal
for hash
-based dictionaries.
But I don't understand your use case - it could be helpful if you are willing to explain this a little more?
Sure! This came up while I was implementing a "cycler with defaults" for AlgebraOfGraphics. It takes a list of keys and a list of values as well as a list of defaults, and an iterable. During the iteration, if the element is among the keys, the cycler returns the corresponding value, otherwise it cycles through the defaults. The issue stems from the fact that, if iter
is heterogeneous, keys
could easily have a stricter type. Example implementation:
julia> using Dictionaries
julia> function cycle(keys, values, defaults, iter)
idx = Ref(0)
N = length(defaults)
dict = Dictionary(keys, values)
return [get(() -> defaults[mod1(idx[] += 1, N)], dict, el) for el in iter]
end
With the following outcome
julia> cycle(["a", "b"], [1, 2], [-1, 7, 12], ["a", "b", "c", "d", "a"])
5-element Vector{Int64}:
1
2
-1
7
1
julia> cycle(["a", "b"], [1, 2], [-1, 7, 12], ["a", "b", "c", "d", missing])
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type String
I understand that here I can just use Base.Dict
and it will work, but I would prefer it if Dictionary
was a full replacement (don't like the idea of having both Dict
and Dictionary
in the code base). I also understand that, if the eltype of iter
is known, I can do some type juggling:
julia> function cycle(keys, values, defaults, iter)
idx = Ref(0)
N = length(defaults)
K = Base.promote_typejoin(eltype(keys), eltype(iter))
V = eltype(values)
dict = Dictionary{K, V}(keys, values)
return [get(() -> defaults[mod1(idx[] += 1, N)], dict, el) for el in iter]
end
cycle (generic function with 1 method)
julia> cycle(["a", "b"], [1, 2], [-1, 7, 12], ["a", "b", "c", "d", missing])
5-element Vector{Int64}:
1
2
-1
7
12
While this is probably possible in my use case, my vote would be in favor of getting things to work regardless of the type parameter.
@piever so I think the best summary of the issue is actually this situation:
julia> 1 in Set{String}()
false
julia> 1 in Indices{String}()
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type String
Closest candidates are:
convert(::Type{String}, ::String) at essentials.jl:210
convert(::Type{T}, ::T) where T<:AbstractString at strings/basic.jl:231
convert(::Type{T}, ::AbstractString) where T<:AbstractString at strings/basic.jl:232
...
Stacktrace:
[1] in(i::Int64, indices::Indices{String})
@ Dictionaries ~/.julia/dev/Dictionaries/src/AbstractIndices.jl:62
[2] top-level scope
@ REPL[3]:1
In favour of status quo:
in
or haskey
always returns false
because I'm looking for the wrong key - oops. This stuff is hard to detect. As something that has happened to me, sometimes there migth be confusion between where I'm using String
and where I'm using Symbol
(think JSON3, where the type of the dictionary key is confusing).In favour of change:
Base
.map
, filter
, etc. (This can even cause a different kind of problem in production at run-time - your test cases might work in the typical case but throw an error when some data of a particular type is absent :scream:).Thanks for a very clear summary! I agree that this is a judgement call, given those bullet points.
One thing I would like to add (in favor of change) is that it's possible to implement the "strict" version from the "loose" version, ie
module Strict
get(dict, key, default) = Base.get(dict, convert(keytype(dict), key), default)
end
(same goes for haskey
and company). So one could potentially use this in production code, getting the same guarantees also when using Base containers. OTOH, I think the converse is not possible: given the strict version, I could not figure out how to implement the loose version using only public APIs.
From what I understand, julia Base forces conversion of the key to the key-type of the
Dict
for mutating methods, egget!(dict, key, default)
, but does not do so for non-mutating methods. Comparewith
Could
get(d::AbstractDictionary, key, default)
andget(f::Callable, dict::AbstractDictionary, default)
avoid the conversion step? It seems to me thatgettoken
could also relax this signature as it only relies on thehash
, which should not be changed upon conversion, that is to say, if!isequal(key, convert(K, key))
thenkey
is invalid forAbstractDictionary{K, V}
.