cloudpipe / cloudpickle

Extended pickling support for Python objects
Other
1.64k stars 167 forks source link

Cyclic references not properly handled on load #478

Closed kdkavanagh closed 1 year ago

kdkavanagh commented 2 years ago

Class instances contained by sets elsewhere seem to be unable to be unpickled, as their hash function is invoked (presumably by the set insertion) before the object is fully unpickled. This example is contrived, a more complex usage might be a graph data structure where a child maintains a reference to a parent while the parent maintains a set of all its children.

SET = set()
class ReferencesSet:
    def __init__(self):
        # Maintain a cyclic reference
        self._set=SET
        self._data = "asdf"
        SET.add(self)

    def __hash__(self):
        return hash(self._data)

REF = ReferencesSet()
>>> import cloudpickle
>>> cloudpickle.loads(cloudpickle.dumps(REF))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...", line 72, in __hash__
    return hash(self._data)
AttributeError: 'ReferencesSet' object has no attribute '_data'
pierreglaser commented 1 year ago

Hi, thanks for the report. This looks like a pickle issue, and not a cloudpickle one: pickle.loads(pickle.dumps(REF)) yields an error too.

As you accurately mentionned, the problem is the fact that when unpickling a set, each of the set's object's __hash__ method is called, causing errors when the object is being unpickled and thus is only partially created. This could probably be fixed in Python upstream. I don't see an valid reason why a set should call __hash__ on its objects while being created.

pierreglaser commented 1 year ago

I'm closing this, but it'd be good to file an issue (and cross-post this one) in python/cpython