edelooff / sqlalchemy-json

Full-featured JSON type with mutation tracking for SQLAlchemy
http://variable-scope.com/posts/mutation-tracking-in-nested-json-structures-using-sqlalchemy
BSD 2-Clause "Simplified" License
189 stars 34 forks source link

Try fixing pickle load issue #28

Closed mvdbeek closed 3 years ago

mvdbeek commented 3 years ago

We ran into the following traceback while unpickling and object that contained a NestedMutableDict:

In [2]: dataset = pickle.load(open('/tmp/hda.pickle', 'rb'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-719a86be550f> in <module>
----> 1 dataset = pickle.load(open('/tmp/hda.pickle', 'rb'))
~/src/galaxy/.venv/lib/python3.8/site-packages/sqlalchemy_json/track.py in __setitem__(self, key, value)
    104
    105     def __setitem__(self, key, value):
--> 106         self.changed('__setitem__: %r=%r', key, value)
    107         super(TrackedDict, self).__setitem__(key, self.convert(value, self))
    108
~/src/galaxy/.venv/lib/python3.8/site-packages/sqlalchemy_json/track.py in changed(self, message, *args)
     36             logger.debug('%s: %s', self._repr(), message % args)
     37         logger.debug('%s: changed', self._repr())
---> 38         if self.parent is not None:
     39             self.parent.changed()
     40         elif isinstance(self, Mutable):
AttributeError: 'NestedMutableDict' object has no attribute 'parent'

This works as a quick fix for us. The pickle behavior of not running though __init__ is decribed here

I'm not sure if this is the right fix, any help is appreciated @edelooff.

edelooff commented 3 years ago

Hi @mvdbeek, thanks for the description and proposed solution. I'm not sure moving the instance variable to the class definition is the correct one. It works and doesn't do anything untoward, but it looks like a bug due to class vs instance variable confusion.

I tried playing around with __getstate__ and __setstate__ a little, but that didn't lead to anything either. The documentation does contain a note that provided some direction though:

Note: At unpickling time, some methods like __getattr__(), __getattribute__(), or __setattr__() may be called upon the instance. In case those methods rely on some internal invariant being true, the type should implement __new__() to establish such an invariant, as __init__() is not called when unpickling an instance.

So what I tried locally is replacing the __init__ with a __new__ method, as follows:

    def __new__(cls, *args, **kwds):
        tracked = super().__new__(cls, *args, **kwds)
        tracked.parent = None
        return tracked

I've tried that locally with a few simple tests and everything seems to behave as it should, but if you could give it a try on your more real-world test case, I'd like to hear whether it works for you. For completeness I've included the test script:

import pickle

from sqlalchemy_json import NestedMutableDict

one = NestedMutableDict({"numbers": [1, 2, 3, 4]})
two = NestedMutableDict({"numbers": [5, 6, 7]})
one_reloaded = pickle.loads(pickle.dumps(one))
two_reloaded = pickle.loads(pickle.dumps(two))
assert one == one_reloaded
assert two == two_reloaded

one_reloaded["numbers"].append(5)
assert one_reloaded["numbers"] == [1, 2, 3, 4, 5]
assert one["numbers"] == [1, 2, 3, 4]

assert one_reloaded["numbers"].parent is one_reloaded
assert two_reloaded["numbers"].parent is two_reloaded
assert one_reloaded is not two_reloaded

print("Pickle -> unpickle works")
mvdbeek commented 3 years ago

excellent, yes, that works too!

mvdbeek commented 3 years ago

I've added your proposed fix and added some unit tests going through the examples from the readme.

edelooff commented 3 years ago

Thanks a bunch, especially for the tests! I'll look into actually doing a new release to PyPI based on the recent changes when I find the time for it.