While computing deltas between sequences of Word or Chunk objects, I noticed that these objects do not implement proper value comparison (equality testing) in their __eq__() methods.
Here's an example of what I mean by 'equality testing', using the built-in container type list:
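A minimal sketch of that behavior, using `copy` as in the Pattern examples further down (the names `a` and `b` are only illustrative):

```python
from copy import copy

a = [1, 2, 3]
b = copy(a)  # shallow copy: a new list object holding the same items

# Identity testing: `a` and `b` are two distinct objects.
print(a is b)

# Value testing: the two lists hold equal items in the same order.
print(a == b)
```

The first comparison is False and the second is True: `is` asks whether both names refer to the same object, while `==` asks whether the two objects hold equal values.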
And here's Pattern's Sentence object behaving as expected under value and identity comparison:
>>> from copy import copy
>>> from pattern.en import parse, Sentence
>>> sent = Sentence(parse("The elephant sits on the chair"))
>>> sent
Sentence('The/DT/B-NP/O elephant/NN/I-NP/O ... chair/NN/I-NP/I-PNP')
>>> sent is copy(sent) # object identity testing
False
>>> sent == copy(sent) # object value testing
True
Contrast the above with the comparison behavior of Pattern's Word and Chunk objects:
>>> sent # Reusing `sent` from the example above
Sentence('The/DT/B-NP/O elephant/NN/I-NP/O ... chair/NN/I-NP/I-PNP')
>>> word = sent.words[1] # Looking at Word object
>>> word
Word('elephant/NN')
>>> word is copy(word) # identity testing
False # good
>>> word == copy(word) # value testing
False # !!!!! unexpected
>>> chunk = sent.chunks[0]
>>> chunk
Chunk('The elephant/NP')
>>> chunk is copy(chunk) # identity testing
False # good
>>> chunk == copy(chunk) # value testing
False # !!!!! unexpected
This comparison behavior is highly surprising: in both the Word and the Chunk example, the two objects hold equal values, and that is exactly what Python's == operator should reflect (as opposed to the is operator, which tests identity).
I can see that the __eq__() method of both Word and Chunk implements value comparison as identity comparison. By contrast, Sentence does this as:
def __eq__(self, other):
    if not isinstance(other, Sentence):
        return False
    return len(self) == len(other) and repr(self) == repr(other)
I'm a big fan of the Pattern object model. However, perhaps it might be worth considering extending the latter value comparison implementation to Word and Chunk?
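To make the suggestion concrete, here is a rough sketch of the same repr-based comparison applied to a word-like object. `Token` is a hypothetical stand-in written for this example, not Pattern's actual Word or Chunk class, and its attributes are assumptions:

```python
from copy import copy


class Token(object):
    """Hypothetical stand-in for a Word-like object (not Pattern's class)."""

    def __init__(self, string, tag=None):
        self.string = string
        self.tag = tag

    def __repr__(self):
        return "Token(%r)" % ("%s/%s" % (self.string, self.tag),)

    def __eq__(self, other):
        # Same idea as Sentence.__eq__ above: same type and same repr.
        if not isinstance(other, Token):
            return False
        return repr(self) == repr(other)

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hashing consistent with value equality.
        return hash(repr(self))


tok = Token("elephant", "NN")
dup = copy(tok)

print(tok is dup)  # identity testing
print(tok == dup)  # value testing
```

With this in place the copy is a distinct object that still compares equal. Two caveats: whether words with the same string and tag at different positions in a sentence should count as equal is a design choice, and changing `__eq__` also affects how objects behave as dict keys or set members, which is why the sketch defines `__hash__` alongside it.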