TurboGears / Ming

MongoDB ODM (Object Document Mapper) with Unit of Works, IdentityMap, Relations and Mongo-In-Memory implementation
https://ming.readthedocs.io
Other
28 stars 17 forks source link

Copy On Write for the object state #47

Open CastixGitHub opened 2 years ago

CastixGitHub commented 2 years ago

answer to https://github.com/TurboGears/Ming/pull/45#issuecomment-1133466260

I implemented it using a high level approach, should be good enough to be used with pymongo dictionaries, but I didn't think about supporting other kind of objects (that would lead again to deepcopy, unless we are able to drop the abstractions and go low level, but I didn't find anything about copy on write on the python ecosystem (this is an interesting read but unrelated) so i don't know how to proceed in that direction, i'm not a kernel developer)

Proof Of Concept

from collections.abc import Mapping, ValuesView

class CopyOnWrite(Mapping):
    def __init__(self, data):
        self.__dict__['_diff'] = {}
        self.__dict__['_data'] = {}
        for k, v in data.items():
            self._data[k] = v if not isinstance(v, dict) else self.__class__(v)

    def __getitem__(self, key):
        if key in self._diff:
            return self._diff[key]
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(value, dict):
            self._diff[key] = value
        else:
            self._diff[key] = self.__class__(value)

    __setattr__ = __setitem__
    __getattr__ = __getitem__

    @property
    def original(self):
        res = {}
        for k, v in self._data.items():
            if isinstance(v, self.__class__):
                res[k] = v.original
            else:
                res[k] = v
        return res

    def __len__(self):
        return len(self._data | self._diff)

    def __iter__(self):
        return iter(self._data | self._diff)

    def values(self):
        # TODO: unsure about this, docs says a view should reflect the changes
        # TODO: does this work at deeper levels?
        values = self._data | self._diff
        for k, v in values.items():
            if isinstance(v, self.__class__):
                values[k] = v._data | v._diff
        return ValuesView(values)

if __name__ == '__main__':
    original = {
        'a': 'A',
        'd': {'z': 'Z'},
        'l': [1, 2, 3],
    }
    cow = CopyOnWrite(original)

    assert cow.original == original, cow.original

    cow.a = 'AA'
    assert cow.original == original, cow.original
    assert cow.a == 'AA'
    assert cow['a'] == 'AA'

    cow.d.x = 'X'
    cow.d.z = 'ZZZ'
    assert cow.original == original, cow.original
    assert cow.d.x == 'X'
    assert cow.d.z == 'ZZZ'

    cow.l.append(4)
    assert cow.original == original, cow.original
    assert cow.l[-1] == 4

    cow.n = 'new'
    assert cow.original == original, cow.original

    assert len(cow) == 4, len(cow)

    for i, n in enumerate(cow):
        assert n == ('a', 'd', 'l', 'n')[i]

    assert list(cow.keys()) == list(original.keys()) + ['n']

    assert list(cow.values()) == ['AA', {'x': 'X', 'z': 'ZZZ'}, [1, 2, 3, 4], 'new']

    assert dict(cow.items()) == dict(zip(list(original.keys()) + ['n'],
                                         ['AA', {'x': 'X', 'z': 'ZZZ'}, [1, 2, 3, 4], 'new']))

How to apply to the current Object/ObjectState/Tracker

I need to check.

brondsem commented 2 years ago

So at a high-level I know what copy-on-write means, but to be honest I'm not real sure how to evaluate this. You have a lot of test cases, which is good, but still its pretty complex and detailed. I wouldn't know if there's some other cases that need to be considered. Or if this has a performance impact.

To be honest, I'd rather not add this much more complexity into Ming. It seems like a lot of logic and nuance and possible performance impact (just a guess, no idea). I'll post back on the other merge request as well, about how we might proceed or not.