Make Python hashing / identity traits for `Formula`, `Variable`, `Expression` more strict

EricCousineau-TRI commented 6 years ago

Relates #8315

At present, we overload __hash__ and __eq__ - returning Formula, a non-bool value - for Variable and Expression. With the implicit behavior of __nonzero__ for Formula, a non-null Formula will always return true, meaning that some of Python's identity mechanisms will produce unexpected results.

The main issue is that comparison in some unittests may not be fully checked, as unittest's assertEqual for two objects may be comparing by __eq__ and not __hash__, since it will check a == b, and then return (a == b).__nonzero__(), which is not the check that the author would intend.

We could throw on __nonzero__ for Formula; however, that will prevent Variable, Expression, etc. from being used in containers like dict, because, in CPython (Python 2.7.12) it does a comparison with __eq__ after finding the item according to its hash (which I guess is to avoid hash collisions): https://github.com/python/cpython/blob/1fae982b9b6fff5a987a69856b91339e5d023838/Objects/dictobject.c#L342

Potential solutions:

Disable Formula.__nonzero__, and require users use a wrapping dictionary that has keys which define __eq__ to compare based on the object's hash.
Adopt sympys style: ensure that == and != return boolean value comparison, and have other operators (< <= > >=) return Formulas
Do nothing and try to ensure that we hand curate any existing symbolic tests

(I had tried to see if defining __cmp__ could buy us something, but then re-read the docs which states that rich comparison (__eq__, __lt, etc.) will take precedence over `cmp__` if they are defined.)

My vote is for (1), as it is relatively simple, and we can massage the interface to pybind11 to make it relatively seamless. Aside from this caveat in dict, I'm not sure of any other situations where this will affect us. (2) doesn't buy us much of anything w.r.t. #8315, as we still need other logical operators to play well in this framework. (3) would not catch future errors.

@soonho-tri Can I ask what your opinion is on this, and if there are other solutions that you can think of?

soonho-tri commented 6 years ago

I think (1) is the best among the options. We also want to remove the explicit bool conversion of symbolic::Formula from the C++ side which is analogous to this proposal on the Python side.

EricCousineau-TRI commented 6 years ago

Hup, forgot to link issues: Relates #8417

RobotLocomotion / drake

Make Python hashing / identity traits for `Formula`, `Variable`, `Expression` more strict #8491