MagicStack / immutables

A high-performance immutable mapping type for Python.
Other
1.14k stars 57 forks source link

How to efficiently track and store deltas between two HAMTs #75

Open aeftimia opened 2 years ago

aeftimia commented 2 years ago

Ideally, I would like to be able to do

x = Map({'a': 2, 'c': 1})
y = x.update({'b': 3: 'c': 2)
z = y - x # magic
z == Map({'b': 3, 'c': 2})

Is there any particularly efficient way to do this in terms of memory and computational time? Ideally, I'd like z to share its data with y in the same way y shares its data with x. One way that comes to mind is

def diff(y, x):
  z = y
  for k, v in y.items():
    if k in x and x[k] == v:
      z = z.delete('k')
  return z

But this is O(N log N) (for log N get/set). Is there a more efficient way to go about this?

sizur commented 1 year ago

This is not a HAMT specific problem. Depending on your use-case, you'd have to either pay a large computation during difference operation, or small computation for every single (or batch) change operation:

Option 1: Pretty much what you wrote, module your requirements.

Option 2: A wrapper class that appends every single set, delete, commit operation into a linear history list, enabling you to quickly get mutation differences as history index tuple. Differences between divergent histories from a common ancestor can be done efficiently by finding a largest common prefix of the two histories, which can be done with binary search thanks to cached hashes. But this option will not work if the two maps will not have a common ancestor map.