JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.68k stars 5.48k forks source link

StackOverflow in hash for circular references #46725

Open bkamins opened 2 years ago

bkamins commented 2 years ago

Example problem:

julia> x = []
Any[]

julia> push!(x, x)
1-element Vector{Any}:
 1-element Vector{Any}:#= circular reference @-1 =#

julia> hash(x)
ERROR: StackOverflowError:
vtjnash commented 2 years ago

It is unclear to me that this hash value exists, since you also can't determine if two self-referential arrays are equivalent either:

julia> x = []
Any[]

julia> push!(x, x)
1-element Vector{Any}:
 1-element Vector{Any}:#= circular reference @-1 =#

julia> y = []
Any[]

julia> push!(y, y)
1-element Vector{Any}:
 1-element Vector{Any}:#= circular reference @-1 =#

julia> x == y
ERROR: StackOverflowError:
Stacktrace:
 [1] ==(A::Vector{Any}, B::Vector{Any}) (repeats 20134 times)
   @ Base ./abstractarray.jl:2882
 [2] top-level scope
   @ REPL[5]:1

julia> x == x
ERROR: StackOverflowError:
Stacktrace:
 [1] ==(A::Vector{Any}, B::Vector{Any}) (repeats 20134 times)
   @ Base ./abstractarray.jl:2882
 [2] top-level scope
   @ REPL[6]:1

julia> isequal(x, x)
true

julia> isequal(x, y)
ERROR: StackOverflowError:
Stacktrace:
 [1] _zip_iterate_some
   @ ./essentials.jl:0 [inlined]
 [2] _zip_iterate_all
   @ ./iterators.jl:412 [inlined]
 [3] iterate
   @ ./iterators.jl:402 [inlined]
 [4] isequal(A::Vector{Any}, B::Vector{Any})
   @ Base ./abstractarray.jl:2852
 [5] isequal(A::Vector{Any}, B::Vector{Any}) (repeats 21811 times)
   @ Base ./abstractarray.jl:2853
 [6] top-level scope
   @ REPL[8]:1
bkamins commented 2 years ago

Indeed == and isequal do not work either. It is not a huge problem so probably it can be left out, but I stumbled on it when defining some recursive data structure, so I reported it.

Theoretically equality can be defined for a recursive data structure (just like printing is defined for such structures), but probably it would be a complexity overkill so it is not useful (as you would have to introduce circular reference detection). I preferred to report it so that it can be consciously decided if this issue should be closed.

gbaraldi commented 2 years ago

Python does work with this, I wonder how expensive the circular reference detection would be.

bkamins commented 2 years ago

Printing stores all visited objects and stops when some object is seen again. But I am afraid this is expensive (I have not thought about it much though).

gbaraldi commented 2 years ago

Doing some very quick testing on a simpleish method of detecting a circular reference on hash, i.e

if length(A) < 8192
    for x in A
        if objectid(x) == objectid(A)
            h = hash(objectid(x), h)
        else
            h = hash(x, h)
        end
    end
    return h
end

It seems to be twice as slow. And that is with a vector of the circular reference and a bunch of zeros, I imagine if the other elements were more complicated it wouldn't be as bad. For bigger arrays it seems to be basically the same.