DanielStutzbach / blist

A list-like type with better asymptotic performance and similar performance on small lists
Other
310 stars 36 forks source link

bug in _sorteddict.KeysView #41

Closed pkch closed 11 years ago

pkch commented 12 years ago

Example:

import blist
d = blist.sorteddict((1,1),(2,2))
e = blist.sorteddict((1,1),(3,3))
d.keys() & e.keys()

fails with AttributeError. The culprit is the following method in _sorteddict.KeysView:

def _from_iterable(cls, it):
    return sortedset(key=self._mapping._sortedkeys.key)

First, this method is probably meant to be @classmethod. Second, the call to sortedset needs it as the first argument. Third, _mapping is only available from an instance, which we don't have here.

After the (trivial) fix to address the first two issues, the code I showed above works fine.

However, if sorteddict instance uses a custom sort key, it is lost. I don't know how to fix this. Possibly it shouldn't be fixed; what sort key should be used if the two sorteddict have different sort keys? Using the one from the LHS is quite arbitrary, even if we could retrieve it. I really don't know what the right approach is here.

DanielStutzbach commented 12 years ago

Thanks for finding this problem and digging into it. Python allows a classmethod to be overridden with an instance method. Would the following work?

def _from_iterable(self, it):
  return sortedset(it, key=self._mapping.sortedkeys.key)

I agree that it's not clear what the caller expects when mixing sort keys. Probably it should raise an exception? In that case, it would be necessary to override __or__ etc. and then _from_iterable could be removed entirely.

pkch commented 12 years ago

I would not like an override for 2 reasons.

  1. _from_iterable used frequently by collections.ABC, and so changing its semantics from needing just a class to needing an instance feels inappropriate.
  2. you wouldn't know when the LHS and RHS sortedsets don't have the same key. In that case, using LHS key is really bad since it breaks the expected symmetry of and / or .

An alternative that doesn't suffer from these problems is to make _from_iterable @classmethod, and use default key. Unfortunately, it results in another unpleasant outcome: two sorted sets with the same key result in a new set with the default key. Very unexpected and annoying to users.

So I like your second proposal the most: remove _from_iterable entirely, and override and/or/etc. directly. While doing that, raise exception if the keys for LHS and RHS are not the same. Apart from a bit extra code, I see one major issue: how can you tell if key1 and key2 are the same?

x = sortedset((1,2,3), key = lambda x : -x)
y = sortedset((3,4,5), key = lambda x : -x)
print(x & y) # raises exception since lambda x: -x != lambda x: -x

If no good answer to that is available, perhaps just let this be a limitation of blist.sortedset - not a dealbreaker, IMO.

I'm asking a question about it on SO: http://stackoverflow.com/questions/9963155/comparing-functions

If this can be done, ideally it should be done at construction (i.e., merging identical key functions), to avoid wasting time every time someone needs to do a set operation. But unfortunately, unlike lambdas, regular functions may change (I believe) if someone does crazy stuff and modifies the function's source code. So I suppose I'd recommend assuming that different named functions are certainly different, regardless of the code in them; and lambdas could actually be checked for equality by reviewing the source code at construction.

Max

On Sat, Mar 31, 2012 at 11:29 AM, Daniel Stutzbach < reply@reply.github.com

wrote:

Thanks for finding this problem and digging into it. Python allows a classmethod to be overridden with an instance method. Would the following work?

def _from_iterable(self, it): return sortedset(it, key=self._mapping.sortedkeys.key)

I agree that it's not clear what the caller expects when mixing sort keys. Probably it should raise an exception? In that case, it would be necessary to override or etc. and then _from_iterable could be removed entirely.


Reply to this email directly or view it on GitHub: https://github.com/DanielStutzbach/blist/issues/41#issuecomment-4860531

DanielStutzbach commented 12 years ago

Checking if two functions have identical functionality is undecidable. It'd be safest and easiest to just check if the two key functions are the same function using the "is" operator.

pkch commented 12 years ago

Well, it's undecidable in general, but it can be easily decided for the cases that you would care about. I think if the key argument is given by a named function, the standard "is" operator is sufficient; even if two different named functions happen to define the same order, the programmer thought of them as separate functions, and it's fair to raise an exception if the programmer decides to calculate a union of two sortedsets with such keys:

def f(x):
  return x['id']

def g(x):
  return x['id']

s1 = blist.sortedset(key = f)
# ...
s2 = blist.sortedset(key = g)
# ...
s = s1 & s2
# probably a bug: if s1 and s2 were intended to be compatible, 
# the programmer would likely have used function f as the key for both

The only time I might recommend doing anything other than "is" operator is for lambda functions:

s1 = blist.sortedset(lambda x : x['id'])
#...
s2 = blist.sortedset(lambda x : x['id'])
# ...
s = s1 & s2
# probably not a bug; the programmer might well have meant
# the two sorted sets to be compatible
# note that Python would report them as not equal

Finally, the sort key function could be a callable function, rather than a function; e.g., itemgetter('id'). I would hope that for a callable object, the author defined a meaningful equality operator, if appropriate (otherwise the operator is would be fine). (For some reason, itemgetter's equality defaults to is; that might be just a minor oversight.)

For the time being, I wouldn't worry about that, and just like you suggest, use == for everything (which would default to is for all functions).

Max

On Apr 1, 2012 8:38 AM, "Daniel Stutzbach" < reply@reply.github.com> wrote:

Checking if two functions have identical functionality is undecidable. It'd be safest and easiest to just check if the two key functions are the same function using the "is" operator.


Reply to this email directly or view it on GitHub: https://github.com/DanielStutzbach/blist/issues/41#issuecomment-4867040