crusaderky opened 2 months ago
Ok, found the difference. The deep copy is NOT tripped on the transfer of the task output from worker a to worker b; it's the random array embedded in the submit call that's sent from the client to the scheduler. :facepalm:
This issue applies to all embedded variables that are sent from the client to the scheduler, e.g.

```python
x = client.submit(lambda x: x + 1, np.random.random(1024), key="x", workers=[a])
```
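For context, a self-contained version of the reproducer might look like the sketch below; the local-cluster setup and the derivation of the worker addresses `a` and `b` are my assumptions, not part of the original comment:

```python
# Hedged, self-contained reconstruction of the reproducer.
# Assumption: a two-worker local cluster; a and b are the worker addresses.
import numpy as np
from dask.distributed import Client

client = Client(n_workers=2)
a, b = sorted(client.scheduler_info()["workers"])

# The np.random.random(1024) array is embedded in the task definition itself,
# so it travels client -> scheduler (where the deep copy trips), rather than
# worker -> worker as task output.
x = client.submit(lambda x: x + 1, np.random.random(1024), key="x", workers=[a])
print(x.result()[:3])
client.close()
```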
When the buffer reaches `distributed.protocol.serialize.pickle_loads`, `buffers[0]` is a `bytes` object. This causes `pickle_loads` to deep-copy the buffer in order to honour the writeable flag of the original. To verify, add a probe at the top of `pickle_loads`:
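The snippet from the original comment didn't survive the formatting. As a hedged stand-in, assuming `pickle_loads(header, frames)` unpacks the out-of-band buffers as `frames[1:]` and that the header carries the writeable flags mentioned above, a probe along these lines makes the problem visible:

```python
# Hypothetical probe, not the original snippet.
def pickle_loads(header, frames):
    x, buffers = frames[0], frames[1:]
    if buffers:
        mv = memoryview(buffers[0])
        # bytes is immutable, so mv.readonly is True here; if the header says
        # the original buffer was writeable, honouring that flag forces a
        # deep copy.
        print(type(buffers[0]), mv.readonly, header.get("writeable"))
    ...  # rest of the real implementation
```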
What's causing me a migraine is:

- with [snippet elided], the numpy object is no longer deserialized by `distributed.protocol.serialize.pickle_loads`; it's instead processed by `distributed.protocol.numpy.deserialize_numpy_array`, which receives a writeable buffer;
- then we are using `distributed.protocol.serialize.pickle_loads` again, which receives a read-only buffer, but this time the writeable flag is False, so no deep copy happens.
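To make the copy/no-copy decision concrete, here's a minimal sketch of the flag-honouring logic described above; it's an illustration of the behaviour, not the actual distributed code:

```python
def honour_writeable(buf, writeable):
    """Illustrative only: mimic the flag-honouring behaviour described above."""
    mv = memoryview(buf)
    if writeable and mv.readonly:
        # e.g. a bytes frame: restoring writeable=True needs a deep copy
        return memoryview(bytearray(mv))
    if not writeable and not mv.readonly:
        return mv.toreadonly()  # zero-copy view, just flagged read-only
    return mv  # flags already match: zero copy

# The migraine case: bytes buffer + writeable=True -> deep copy
copied = honour_writeable(b"\x00" * 8, writeable=True)
assert not copied.readonly

# The benign case: read-only buffer + writeable=False -> no copy
view = honour_writeable(bytes(8), writeable=False)
assert view.readonly
```

The asymmetry between the two branches is exactly what the two bullets describe: a `bytes` frame whose original was writeable forces the copy, while a read-only frame with writeable=False passes through untouched.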