Marco-Sulla / python-frozendict

A simple immutable dictionary for Python
https://github.com/Marco-Sulla/python-frozendict
GNU Lesser General Public License v3.0
133 stars 16 forks source link

[FEATURE] Add other `set` operations #2

Closed Marco-Sulla closed 4 years ago

Marco-Sulla commented 4 years ago

Legenda: fd: a frozendict d: a dict-like object it: an iterable

frozendict could implement also:

  1. fd <= d: check if the two fd and d are equal, or if fd has the same items of d (subset).
  2. fd < d: check if the two fd and d are not equal, and if fd has the same items of d (proper subset).
  3. fd >= d, fd1 > d: the same concept above, inverted.
  4. fd & it: returns a new frozendict, that is fd with only the keys present in it (intersection, or join)
  5. fd ^ it: returns a new frozendict without the keys in common (xor)
  6. fd.isdisjoint(d): syntactic sugar for bool(fd & d)
lurch commented 4 years ago

I'm not sure if adding "set" operations to a "mapping" class makes sense? For things like <= and < it might cause user-confusion about whether it's comparing keys, values, or keys-and-values? :man_shrugging:

Maybe it depends how clear the explanatory documentation is? :wink:

Marco-Sulla commented 4 years ago

PEP 584 takes in consideration the full API of set, even if it delegate them to other PEPs.

The question you pose is intelligent and not trivial. We should check keys or items? I think we can remove easily the option to see only the values :)

I think the question we have to answer is: what is more useful?

A simple example:

a = frozendict({"Marco": "Sulla", "Andrew": "Scheller"})
b = frozendict({"Andrew": "Scheller"})
c = frozendict({"Andrew": "Kaufman"})

x = b < a
y = c < a

If we choose to check only the keys, x and y are both True. If we choose to check the items, only x is True.

IMHO, it's quite more interesting for a developer to know if the entire item is contained in the other frozendict.

PS: you can skip this part, it will be very verbose... IMHO it's quite natural to think maps as a sort of set. Keys are evidently a set, and one can consider also the couple key-value as element of a set, since it's inevitably unique. On the contrary, you can think a py set as a sort of dict, where the keys are the hashes of the values themselves. I bet the implementation of set is something like that. Furthermore, dict views already implements some set operation, And if you read the implementation of dict in CPython, in a comment Van Rossum express its hope that the dict views will implement more set operators.

lurch commented 4 years ago

I agree with what you wrote (i.e. that it should check the whole item, not just the key), thanks for answering :+1:

Perhaps my concern was because I'm used to reading < as "less than" and not reading it as "subset of" ? :wink:

Marco-Sulla commented 4 years ago

Added support for intersection with https://github.com/Marco-Sulla/python-frozendict/commit/b18fd62ccdab87687a3d53330a5a3b0e43fd5179

lurch commented 4 years ago

https://www.python.org/dev/peps/pep-0584/#what-about-the-full-set-api talks about the problem with using & on two dictionaries and what to set the values to. Could you add a docstring indicating which of the options you decided to go with? :smiley: (I had a look at the code in your commit but the big and little stuff seems confusing)

Marco-Sulla commented 4 years ago

Well, it's simple IMHO: it's `and:

>>> 5 and 7
7

In or, the first True value wins, in and wins the last one.

So &, if the operands are both map-like, creates a new frozendict with the keys in common and the values of the second operand. I'll add this to the docstring.

PS: little and big is a simple trick to have a quicker for loop ;-)

lurch commented 4 years ago

So &, if the operands are both map-like, creates a new frozendict with the keys in common and the values of the second operand. I'll add this to the docstring.

Are there any tests that check this behaviour too?

PS: little and big is a simple trick to have a quicker for loop ;-)

Ahhh. It also means that the items get re-ordered in the returned frozendict though, based upon which of the two input-dicts is longest? I dunno if this matters or not though :man_shrugging:

Marco-Sulla commented 4 years ago

So &, if the operands are both map-like, creates a new frozendict with the keys in common and the values of the second operand. I'll add this to the docstring. Are there any tests that check this behaviour too?

No. Could you open a enhancement request about this?

Ahhh. It also means that the items get re-ordered in the returned frozendict though, based upon which of the two input-dicts is longest? I dunno if this matters or not though man_shrugging

I suppose no.

  1. this only can happen if the second operand is a dict-like object longer than the first operand, or a not-dict-like iterable shorter than the first operand, and with keys in common not in the same order. I don't think it's a common case.
  2. even if the keys are ordered in a different manner, the frozendicts are the same
  3. dict assure that the order insertion are preserved. This is not an insertion, is a intersection operation.
  4. I can't think about a case in which this can be a problem
lurch commented 4 years ago

Could you open a enhancement request about this?

15

I can't think about a case in which this can be a problem

Neither can I, but I thought I'd mention it just in case :wink:

Marco-Sulla commented 4 years ago

Implemented isdisjoint() with https://github.com/Marco-Sulla/python-frozendict/commit/cea7968193214978c1e6dcb0fe2d1f22071cce69

Marco-Sulla commented 4 years ago

Ahhh. It also means that the items get re-ordered in the returned frozendict though, based upon which of the two input-dicts is longest? I dunno if this matters or not though man_shrugging

I thought a little more about this. I checked a similar implementation in CPython: dict items, where all values are hashable.

>>> {1:2, 3:4, 5:6}.items() & {3:4, 1:2}.items()
{(1, 2), (3, 4)}
>>> {1:2, 3:4, 5:6}.items() & {(1, 1)}
set()
>>> {1:2, 3:4, 5:6}.items() & (1, 3)
set()
>>> {1:2, 3:4, 5:6}.items() & ((1, 2), )
{(1, 2)}

So, currently:

  1. items does not support operations with a generic iterable
  2. items elements are considered part of the intersection only if the entire element is equal to the element of the other object, and not only the key
  3. the original order of the first operand is preserved

I think that:

  1. supporting also something like {1:3, 5:7, 3:4} & (1, 5) == {1:3, 5:7} is quite handy. So I will not remove it.
  2. The items implementation is right here. A map can be considered the set of the pairs key-value. I considered only the keys to make the operation between a frozendict and a dict-like object more similar to the case in which the other operator is a generic iterable. But now I think this is not good. The problem is: I have to act like items, so apply the intersection only if also the value is hashable, or support also frozendict with mutable values?
  3. It's better to follow the standard. I think I can optimize the algorithm anyway. Furthermore, if you want to sort the frozendict in another way, there's a sorted method now :-)
Marco-Sulla commented 4 years ago

Okey, another errata corrige:

>>> {3:4, 5:6, 1:2, 7:8}.items() & {1:2, 5:6, 3:4}.items()
{(1, 2), (5, 6), (3, 4)}
>>> {3:4, 5:6, 1:2, 7:8}.items() & {5:6, 1:2, 3:4}.items()
{(1, 2), (5, 6), (3, 4)}
>>> {7:8, 5:6, 3:4, 1:2}.items() & {5:6, 1:2, 3:4}.items()
{(1, 2), (5, 6), (3, 4)}

Python does not preserve at all the order of the inserted keys. Indeed the result is a set, that is unsorted by definition.

But anyway, on a third thought :-D, I like the fact that and wins, and you could change the order of the original frozendict.

So I implemented point 2, but only for dict-like objects. Iterable of pairs are not handled differently, but follows my implementation of point 1. This is IMHO more consistent with __sub__(). What if I have a iterator of tuples with two items as second operator? Must I consider them keys or key-value pairs? And what if a frozendict has tuple with two elements as keys? I think that the behavior to consider iterables of pairs as key-value items is useful and consistent only when used by the constructor. For point 3, I decided that the order is decided by the order of the second operand.

Fixed: https://github.com/Marco-Sulla/python-frozendict/compare/5c6b428..c849344

Marco-Sulla commented 4 years ago

Changes to isdisjoint and __sub__, accordingly to the new __and__ behavior: https://github.com/Marco-Sulla/python-frozendict/compare/18732e63..cca4c6a6

Marco-Sulla commented 4 years ago

Closed per https://github.com/Marco-Sulla/python-frozendict/issues/17