Open bkamins opened 2 years ago
It is unclear to me that this hash
value exists, since you also can't determine if two self-referential arrays are equivalent either:
julia> x = []
Any[]
julia> push!(x, x)
1-element Vector{Any}:
1-element Vector{Any}:#= circular reference @-1 =#
julia> y = []
Any[]
julia> push!(y, y)
1-element Vector{Any}:
1-element Vector{Any}:#= circular reference @-1 =#
julia> x == y
ERROR: StackOverflowError:
Stacktrace:
[1] ==(A::Vector{Any}, B::Vector{Any}) (repeats 20134 times)
@ Base ./abstractarray.jl:2882
[2] top-level scope
@ REPL[5]:1
julia> x == x
ERROR: StackOverflowError:
Stacktrace:
[1] ==(A::Vector{Any}, B::Vector{Any}) (repeats 20134 times)
@ Base ./abstractarray.jl:2882
[2] top-level scope
@ REPL[6]:1
julia> isequal(x, x)
true
julia> isequal(x, y)
ERROR: StackOverflowError:
Stacktrace:
[1] _zip_iterate_some
@ ./essentials.jl:0 [inlined]
[2] _zip_iterate_all
@ ./iterators.jl:412 [inlined]
[3] iterate
@ ./iterators.jl:402 [inlined]
[4] isequal(A::Vector{Any}, B::Vector{Any})
@ Base ./abstractarray.jl:2852
[5] isequal(A::Vector{Any}, B::Vector{Any}) (repeats 21811 times)
@ Base ./abstractarray.jl:2853
[6] top-level scope
@ REPL[8]:1
Indeed ==
and isequal
do not work either.
It is not a huge problem so probably it can be left out, but I stumbled on it when defining some recursive data structure, so I reported it.
Theoretically equality can be defined for a recursive data structure (just like printing is defined for such structures), but probably it would be a complexity overkill so it is not useful (as you would have to introduce circular reference detection). I preferred to report it so that it can be consciously decided if this issue should be closed.
Python does work with this, I wonder how expensive the circular reference detection would be.
Printing stores all visited objects and stops when some object is seen again. But I am afraid this is expensive (I have not thought about it much though).
Doing some very quick testing on a simpleish method of detecting a circular reference on hash, i.e
if length(A) < 8192
for x in A
if objectid(x) == objectid(A)
h = hash(objectid(x), h)
else
h = hash(x, h)
end
end
return h
end
It seems to be twice as slow. And that is with a vector of the circular reference and a bunch of zeros, I imagine if the other elements were more complicated it wouldn't be as bad. For bigger arrays it seems to be basically the same.
Example problem: