JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License
23 stars 9 forks source link

remotecall_fetch vulnerable to hash collisions? #48

Open nalimilan opened 6 years ago

nalimilan commented 6 years ago

I've bumped into this behavior by accident after defining hash incorrectly for arrays. I'm absolutely not sure it indicates a bug, but I figured I'd better report it since this kind of exceptional situation is rarely tested.

It appears that when hash collisions happen, remotecall_fetch is not able to detect that the contents of an array have been updated. This can easily be reproduced by always returning the same hash for all arrays. This is surprising to me, as I would have expected that differing hashes are sufficient, but not necessary, to consider that two arrays are different, i.e. that isequal would always be called to check for hash collisions.

julia> using Distributed

julia> Base.hash(::AbstractArray, ::UInt) = zero(UInt)

julia> x = [1]
1-element Array{Int64,1}:
 1

julia> remotecall_fetch(()->x, 2)
1-element Array{Int64,1}:
 1

julia> x[1] = 2
2

julia> remotecall_fetch(()->x, 2) # Woops, not updated
1-element Array{Int64,1}:
 1
StefanKarpinski commented 6 years ago

I wonder if there's some fallback somewhere from when arrays were hashed by identity because they're mutable. That would manifest like this and should definitely be changed, if so.

andreasnoack commented 6 years ago

It happens here where the logic for moving globals is implemented.

cc: @amitmurthy

amitmurthy commented 6 years ago

That block of code is an optimization to avoid serializing global objects if unchanged. Relevant especially in the case of largish arrays. Short of removing the optimization and serializing all referenced globals (except consts) each time, I am at a loss how to fix this - we cannot compare against the previous state of the object.

nalimilan commented 6 years ago

Would using object_id make sense here?

amitmurthy commented 6 years ago

No, object_id of x is unchanged in your example above - it is the same object. The global is serialized if either object_id or the hash value of the binding changes between remotecall invocations.

nalimilan commented 6 years ago

Can we do something about this? It really sounds like a recipe to create very rare bugs that will be terribly hard to reproduce for users.

Moreover, if we merge JuliaLang/julia#26022, collisions will become quite likely if you change just a few entries in an array. Cc: @mbauman