RussTedrake opened this issue 6 years ago
It's because `x == [0, 0]` becomes `[bool(x[0] == 0), bool(x[1] == 0)]`. Python considers anything but the following items as `True`:

- `False`
- `0` (including `0.0`, but not `2` or `-3.1`)
- `None`

So, in the end, we have `[True, True]`. I'll check if there is a workaround for this.
First thing I can do is to provide `__nonzero__` for `symbolic::Formula` so that the above example throws an exception instead of giving a false value `[True, True]`.
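To sketch the idea with a toy model (these are NOT Drake's actual classes, just stand-ins): if a `Formula`-like object refuses boolean coercion, the silent `[True, True]` result becomes a loud error instead.

```python
# Toy sketch, not Drake code: `Variable` and `Formula` here only
# illustrate how raising in __bool__ (__nonzero__ on Python 2)
# turns silent truthiness into an explicit error.
import numpy as np

class Formula:
    def __init__(self, text):
        self.text = text
    def __repr__(self):
        return "<Formula %r>" % self.text
    def __bool__(self):
        raise RuntimeError(
            "Formula has no truth value; call Evaluate() explicitly.")
    __nonzero__ = __bool__  # Python 2 spelling

class Variable:
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        # Element-wise comparison builds a formula instead of a bool.
        return Formula("(%s == %s)" % (self.name, other))
    __hash__ = None

x = np.array([Variable("x(0)"), Variable("x(1)")], dtype=object)
f = x[0] == 0          # comparison yields a Formula, not a bool
print(f)               # <Formula '(x(0) == 0)'>
try:
    bool(f)            # what numpy does when coercing results to bools
except RuntimeError as e:
    print(e)
```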
I'm not sure I'm 100% understanding the bug report or subsequent discussion here. Russ, are you saying that the desired output should look like this?
```
[x(0) x(1)]
(x(0) = 0)
[(x(0) = 0) (x(1) = 0)]
```
@jwnimmer-tri, I think that's what Russ wants. So we'll have:

- Python relational operators on `Expression` always return a `Formula`;
- `__nonzero__` on `Formula` throws with a note telling users to call `Evaluate` explicitly.

I like this.
> python relational operators on `Expression` always return a `Formula`

FYI, this is what we already have, as `print(x[0] == 0.)` returns `x(0) == 0`.
> the `__nonzero__` on `Formula` throws with a note telling users to call `Evaluate` explicitly.
I'll open a PR shortly. But the thing is that it will not give us what Russ wants for the above `print(x == [0., 0.])` example. Instead of returning `[(x(0) = 0) (x(1) = 0)]`, it will throw an exception, as it can't evaluate `x(0) = 0` without knowing the value of `x(0)`.
I think there are no good solutions as long as we want to use both the `==` operator and `numpy.ndarray`. We need to give up one of the two.
1. We might introduce a function `Equal` and make `Equal([a, b], [0, 0])` return `[a = 0, b = 0]` as requested. Or;
2. We can introduce a new class such as `FormulaArray` and provide `__eq__` (and other relational operations) so that `FormulaArray1 == FormulaArray2` works as requested.
I'm happy to learn if there is a better option here.
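For option (1), here is a minimal sketch of what such an `Equal` helper could look like (the name `Equal` comes from the comment above; the loop-based implementation and the demo with plain numbers in place of symbolic variables are my own illustration):

```python
import numpy as np

def Equal(a, b):
    # Hypothetical helper: apply `==` element-wise and keep the results
    # as objects, bypassing numpy's coercion of comparisons to bools.
    a, b = np.broadcast_arrays(np.asarray(a, dtype=object),
                               np.asarray(b, dtype=object))
    out = np.empty(a.shape, dtype=object)
    for idx in np.ndindex(a.shape):
        out[idx] = a[idx] == b[idx]
    return out

# With symbolic elements, each entry would be a formula object; with
# plain numbers the entries are just Python bools, kept as objects.
print(Equal([1, 2], [1, 3]))        # [True False]
print(Equal([1, 2], [1, 3]).dtype)  # object
```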
@ericcousineau-tri says he might know a solution.
One thing is that `pybind11` only uses order in determining overloads - it starts with the first, and keeps checking until it finds a successful overload, thus the order in which you `.def` the overloads matters.
However, looking more at the discussion, it seems to be something else; I'm guessing it's because NumPy's `operator==` is taking precedence and coercing the return value (returning a logical array) rather than respecting the actual return type.
I'll see if there's a way around that.
EDIT: Can reproduce the issue per Soonho's comment in a simplified example: https://github.com/EricCousineau-TRI/repro/commit/c6398b55b1af2f7fb9fbd2e4050da5db5c3d2c63
> We might introduce a function `Equal` and make `Equal([a, b], [0, 0])` return `[a = 0, b = 0]` as requested. Or;
I just found that numpy provides the function that I looked for: https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.logic.html#comparison
I tested `np.equal`, and still get an output of logical values. The documentation for `np.equal` states:

> out : ndarray or bool
> Output array of bools, or a single bool if x1 and x2 are scalars.

:(
There is a possibility of extending a custom `ndarray` to not return logicals for logical operations:
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.subclassing.html#basics-subclassing
But that means that (a) we have to cast in `pybind11` (not that big of a deal, I'd think) and (b) we'd have to take care that `np.array([<symbolic>...]) == np.array([<symbolic>...])` does not occur.
:-( Just found the same problem. It has a `dtype` parameter. Do you think it might save us?
Nay :( Seems to be constrained by the `ufunc`:

```
>>> print(np.equal(av, bv, dtype=Custom))
Traceback (most recent call last):
  File "numpy_eq.py", line 28, in <module>
    print(np.equal(av, bv, dtype=Custom))
TypeError: No loop matching the specified signature and casting
was found for ufunc equal
```
It seems that your option (2) (perhaps have `FormulaArray` subclass `ndarray`), coupled with an error on `__nonzero__`, is the way to go?
sigh... I'll dig into it this evening.
One other option: From the results of https://github.com/RobotLocomotion/drake/issues/8116, if it is possible to write a custom dtype, and possibly register custom ufuncs, then perhaps there is a chance that we can register a ufunc for `__eq__` for symbolic types.
That being said, I'm not sure if that will affect the behavior of `np.equal` if its default `dtype` is `bool`.
Will update this post with some other options:
Tracing up where I think `ndarray.__eq__` comes from:

- `array_richcompare` uses `n_ops.equal`
- `PyArray_SetNumericOps`
- `n_ops.equal` is where `numpy.equal` comes from, as it is a ufunc.

Hup, it looks like there's a method `numpy.set_numeric_ops` (that has a `__doc__`, but I can't find its online documentation), referenced in this post:
https://stackoverflow.com/questions/45601964/source-code-location-and-explanation-of-vectorized-array-operations
We could use this to do something like `np.set_numeric_ops(equal=drake_ufunc_equal)` and see if there's a way to teach the ufunc how to handle different dtypes, so that we don't impact performance of other objects.
Example of a simple override (which would cause everything to slow down / suck):

```python
>>> eq = lambda a, b: a == b
>>> generic_equal = np.frompyfunc(eq, 2, 1)
>>>
>>> a = Custom('a')
>>> b = Custom('b')
>>> print(a == b)
eq('a', 'b')
>>>
>>> av = np.array([a, b])
>>> bv = np.array([b, a])
>>>
>>> print(generic_equal(av, bv))
["eq('a', 'b')" "eq('b', 'a')"]
>>> print(np.equal(av, bv))
[ True  True]
>>>
>>> np.set_numeric_ops(equal=generic_equal)
>>> print(np.equal(av, bv))
[ True  True]
>>> print(av == bv)
["eq('a', 'b')" "eq('b', 'a')"]
>>> np.equal = generic_equal
>>> print(np.equal(av, bv))
["eq('a', 'b')" "eq('b', 'a')"]
```
Looks nice! I'll try that in the coming PR.
Hmm... Is it OK if we hold off on incorporating this just yet? I'd like to look a bit deeper and ensure that we don't incur any nasty side effects (e.g. killing speed for other libraries that aren't using Drake for image stuff, etc.), and see if there's an elegant solution to guarantee this.
What kind of timeline would you want this resolved on? (I've talked with @m-chaturvedi, and we'll start addressing #8116)
> What kind of timeline would you want this resolved on?
@RussTedrake can answer this (he marked this with high priority).
It depends on the cost of resolving it... but I'm about to lecture about trajectory optimization, and this occurs very frequently when formulating trajectory optimizations.
I also marked it as "high" priority because in its current form, it's a bit of a landmine...
Sounds good. I'll focus on it the next few days, because after having worked on it for a bit, I'm fairly confident that the solution for #8116 will most likely solve this too (if it goes as I planned).
BTW: Posted StackOverflow as well: https://stackoverflow.com/questions/49205075/possible-to-have-logical-operations-e-g-ndarray-eq-not-return-bool-in
Finally finished the spike test (with some generalizing / polishing), and it seems like user-defined dtypes in NumPy will solve both this issue and #8116: https://github.com/EricCousineau-TRI/repro/blob/1b1edfa/python/pybind11/dtype_stuff/test_basic.cc#L117-L128 https://github.com/EricCousineau-TRI/repro/blob/1b1edfa/python/pybind11/dtype_stuff/test_basic.py#L89-L93
Output from those lines in the Python script (out of sheer laziness, I had `operator==` just double the number):
```python
>>> av = np.array([Custom(1), Custom(2)])
>>> mutate(av)
>>> print(av)
[<Custom(101.0)> <Custom(102.0)>]
>>> print(av == av)
[<Custom(202.0)> <Custom(204.0)>]
```
Required some shifty appeasements of both `pybind11` and `numpy`. There is also the remaining issue about memory leaking here (now that I have a working prototype, I will respond there with some more info):
https://github.com/numpy/numpy/issues/10721
While it sucks to leak memory, I think this is the cleanest solution. In reviewing PyTorch, TensorFlow, etc., I realized that each of those might have some niceties, but they would be a longer-term goal (using this article as a quick overview / guide: http://davidstutz.de/a-note-on-extending-tensorflow-pytorch-and-theano-for-deep-learning-on-custom-datastructures/ )
Will start polishing this to push into `master`.
Right now I think the problem is that `np.equal.types` contains only `XX->?` loops, and you want an `OO->O` loop.
I think it would be pretty reasonable for numpy to add an `OO->O` loop to `np.equal` and the other comparison ops - with that in place, you could overload the `==` operator to just force this loop to be used.
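Such a loop was indeed added upstream later (numpy/numpy#10745, referenced further down this thread). On a sufficiently new NumPy, the difference is visible with any object dtype; here I use `fractions.Fraction` purely as a stand-in for a symbolic element type:

```python
# Demonstration on NumPy >= 1.16, where the OO->O comparison loop
# exists but is only chosen with an explicit dtype argument.
import numpy as np
from fractions import Fraction  # stand-in for a symbolic type

av = np.array([Fraction(1), Fraction(2)], dtype=object)
bv = np.array([Fraction(1), Fraction(3)], dtype=object)

print(np.equal(av, bv))                      # default loop coerces to bool
print(np.equal(av, bv).dtype)                # bool
print(np.equal(av, bv, dtype=object))        # OO->O loop keeps objects
print(np.equal(av, bv, dtype=object).dtype)  # object
```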
> Hup, it looks like there's a method `numpy.set_numeric_ops` (that has a `__doc__`, but I can't find its online documentation), referenced in this post

Note that just the other day a NEP was put forth suggesting this function be deprecated.
> I think it would be pretty reasonable for numpy to add an OO->O loop to np.equal and the other comparison ops - with that in place, you could overload the == operator to just force this loop to be used.
Yup!
Though I was a bit hesitant about forcing this into `np` from Drake, as it would change the behavior globally -- a main concern being, if `OO->O` were always the case, then would logical indexing with non-symbolic types (e.g. `AutoDiffXd`) still work the same? -- My guess is yes, minus the overhead of copying (or whatever edge cases don't handle implicit casting properly)?
I took the route of the user-defined dtypes because it let me kill two birds with one stone: defining custom dtypes and registering custom loops (in this case, `SS->S` for `==` and `AA->B`, where `S` is symbolic, `A` is autodiff, and `B` is bool) gets us there (1, 2), and then resolves #8116.
However, if that falls flat, we'll definitely use the additional ufunc type registration!
> Note that just the other day a NEP was put forth suggesting this function be deprecated

Thanks for the heads-up! I think we thankfully won't need it for now.
Let me try and be a little more precise about the route I'm envisaging:

- numpy should implement the `'OO->O'` loop for the comparison functions, but not expose it via the defaults - `np.equal` will never choose this loop without an explicit `dtype` parameter.
- `AutoDiffXd` overrides `__array_ufunc__` to force this loop type when a comparison operator is used. This solves `np.equal` and `==` at the same time.

I'm not sure what you mean by logical indexing.
Custom dtypes are still a good design here, but if you already have the custom array subclasses, `__array_ufunc__` would be a lot easier.
> numpy should implement the 'OO->O' loop for the comparison functions, but not expose it via the defaults - np.equal will never choose this loop without an explicit dtype parameter.
Gotcha! So then, for testing comparison on `Expression`, ~we'd need to be explicit?~ we'd just need to override `__array_ufunc__` to use `OO->O` to get a `Formula` from `expr1 == expr2`, and for `AutoDiffXd`, we'd not need to override anything to get `OO->B` from `ad1 == ad2`?
EDIT: Realized my initial understanding was a little less than optimal.
> I'm not sure what you mean by logical indexing.
I was concerned about the following situation:

```python
>>> import numpy as np
>>> np.version.version  # Ubuntu 16.04
'1.11.0'
>>> x = np.array([10., 20., 30.])
>>> b = np.array([True, False, True])
>>> x[b]
array([ 10.,  30.])
>>> x[b.astype(object)]
IndexError: arrays used as indices must be of integer (or boolean) type
```
But if we override `AutoDiffXd.__array_ufunc__` to use `AA->B` (or `OO->B`), then that'd work!
> we'd just need to override `__array_ufunc__` to use OO->O to get a Formula from expr1 == expr2, and for AutoDiffXd, we'd not need to override anything to get OO->B from ad1 == ad2?
Exactly. If you want `expr[expr != 0]` to work, then obviously you'd also need to implement `Expression.__getitem__`, but that's probably an unpleasant can of worms.
Partial fix in numpy/numpy#10745
@eric-wieser I apologize for the late follow-up, but thank you for submitting the patch!
Just to clarify on this statement:

> Custom dtypes are still a good design here, but if you already have the custom array subclasses, `__array_ufunc__` would be a lot easier.
Does this mean subclassing `numpy.ndarray`? From looking at docs and mentions of `__array_ufunc__`, it seems that the only way we can override this behavior is to subclass `ndarray` -- does NumPy do any checks on the object itself for something like `__array_ufunc__`? (kinda like `np.cos` calls `obj.cos()`?)
At present, we're not actually subclassing `numpy.ndarray`, but rather just constructing objects using `np.array([...], dtype=object)`.
Ah, that's unfortunate. Yes, you would need to subclass to actually change the `==` operator. Overloading at the element level is too late, because the output type of `bool` has been chosen before looking at any elements.
But this would at least allow you to use `np.equal` with a `dtype` argument.
You should make `bool(autodiff)` throw an exception, which would prevent people from doing the wrong thing at least. Overload `__bool__` (`__nonzero__` on Python 2).
We're in the process of making `__nonzero__` on Python 2 throw an exception; however, it seems that there is some unexpected behavior when `__nonzero__` throws an exception:
https://github.com/EricCousineau-TRI/drake/blob/af624bc72a89dbceaf7000b5972770076b2d5030/bindings/pydrake/test/symbolic_test.py#L489-L498
Also, it seems that this behavior may be inconsistent on certain platforms (I'm guessing due to NumPy versions?): https://github.com/RobotLocomotion/drake/pull/8500#issuecomment-379562534
Per a Slack convo with @hongkai-dai, this is one workaround that can be used with NumPy 1.13.3 (used on Bionic), which predates the upstream patches that landed around NumPy ~1.15: 33508e156
Output:

```
$ bazel run //bindings/pydrake:py/symbolic_test -- TestSymbolicExpression.test_relation_operators_workaround
[<Formula "(x < 1)"> <Formula "(y < 2)">]
[<Formula "(x < 1)"> <Formula "(y < 1)">]
[<Formula "(x < 1)"> <Formula "(x < 2)">]
```
Will write up a PR shortly.
Just to leave a clear update (even for my own memory), the current status is that

```python
from pydrake.all import MathematicalProgram, eq

prog = MathematicalProgram()
x = prog.NewContinuousVariables(2)
print(x)
print(x[0] == 0.)
print(eq(x, [0., 0.]))
print(x == [0.0])
```
results in

```
[Variable('x(0)', Continuous) Variable('x(1)', Continuous)]
(x(0) == 0)
[<Formula "(x(0) == 0)"> <Formula "(x(1) == 0)">]
---------------------------------------------------------------------------
DeprecationWarning                        Traceback (most recent call last)
<ipython-input-69-ff08b00d0643> in <module>()
      5 print(x[0] == 0.)
      6 print(eq(x,[0.,0.]))
----> 7 print(x == [0.0])

DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
```
btw @EricCousineau-TRI -- the `DeprecationWarning` above actually terminates my program.
Yup, that is intentional. If it didn't, then you'd just get a crappier error later on: https://github.com/RobotLocomotion/drake/blob/1f26da1a6f46abab582868f77086178bf30e7945/bindings/pydrake/common/deprecation.py#L177-L179
Now NEP 15 is done, and the numpy docs recommend `__array_ufunc__` to achieve this functionality, as quoted below:

> The presence of `__array_ufunc__` also influences how `ndarray` handles binary operations like `arr + obj` and `arr < obj` when `arr` is an `ndarray` and `obj` is an instance of a custom class.
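As a minimal illustration of that quoted behavior (a toy class, not Drake code): setting `__array_ufunc__ = None` makes `ndarray`'s binary operators return `NotImplemented`, so Python falls back to the custom class's reflected operator.

```python
import numpy as np

class Wrapped:
    # Toy sketch: opting out of ufunc dispatch entirely. With
    # __array_ufunc__ = None, `arr < obj` makes ndarray return
    # NotImplemented, and Python then calls obj.__gt__(arr).
    __array_ufunc__ = None

    def __gt__(self, other):
        return "custom __gt__ called"

    def __lt__(self, other):
        return "custom __lt__ called"

arr = np.array([1.0, 2.0])
print(arr < Wrapped())  # reflected operator: "custom __gt__ called"
print(arr > Wrapped())  # reflected operator: "custom __lt__ called"
```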
Any updates on this? Is there any plan to subclass `np.ndarray` in the future so that built-in operators like `<` and `>` will be supported (and thus allow vectorized constraints)?
If `__array_ufunc__` can resolve this, that would be fantastic!
Note that our minimum required version of `numpy` is now `>= 1.17.4`. That might be of help with this issue?
I looked further into this today and found a temporary solution. If we want a high-performance version of this (actual multi-threaded, vectorized creation of new formulas), overriding `__array_ufunc__` won't work.
This is because there is no signature for the logical operators to return objects; per the numpy documentation, it's always bool. From the output of `np.info(np.less_equal)`:

> Returns
> -------
> out : bool or ndarray of bool
>     Array of bools, or a single bool if `x1` and `x2` are scalars.

However, if we don't mind having the same performance as a list comprehension, the solution is actually really easy, as shown in my code block below. We can use `np.vectorize` to wrap the overloaded logical operators in `Expression`, and put it in `__array_ufunc__`. The function `np.vectorize` is not an actual multi-threaded vectorized method. However: 1. for non-linear constraints, people need to use for loops to create and add them to `MathematicalProgram` anyway, and `np.vectorize` won't be worse than that; 2. we don't usually have a huge number of constraints; 3. `np.vectorize` handles broadcasting nicely.
Given these three factors, I actually think this is a pretty good solution that we should be satisfied with.
Here is a temporary workaround for us:

```python
from pydrake.all import MathematicalProgram
import numpy as np
import operator

prog = MathematicalProgram()
x = prog.NewContinuousVariables(2)

class PatchedArray(np.ndarray):
    to_patch = {
        np.greater: operator.gt,
        np.greater_equal: operator.ge,
        np.less: operator.lt,
        np.less_equal: operator.le,
        np.equal: operator.eq,
        np.not_equal: operator.ne,
    }

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        with np.testing.suppress_warnings() as sup:
            sup.filter(RuntimeWarning, "invalid value encountered in")
            casted_inputs = (item.view(np.ndarray) if isinstance(item, PatchedArray) else item for item in inputs)
            if ufunc in self.to_patch:
                return np.vectorize(self.to_patch[ufunc])(*casted_inputs, **kwargs)
            else:
                return getattr(ufunc, method)(*casted_inputs, **kwargs).view(PatchedArray)

x_patched = x.view(PatchedArray)
print(x_patched == np.zeros(2))
print(x_patched >= np.zeros(2))
print(x_patched <= np.zeros(2))
print(x_patched < np.zeros(2))
print(type(x_patched + 1))
print(type(x_patched < np.zeros(2)))
print(x_patched[None] * np.ones((3, 2)) == np.zeros((3, 2)))
print(x_patched[None] * np.ones((3, 2)) >= 1)
```
Output will be

```
[<Formula "(x(0) == 0)"> <Formula "(x(1) == 0)">]
[<Formula "(x(0) >= 0)"> <Formula "(x(1) >= 0)">]
[<Formula "(x(0) <= 0)"> <Formula "(x(1) <= 0)">]
[<Formula "(x(0) < 0)"> <Formula "(x(1) < 0)">]
<class '__main__.PatchedArray'>
<class 'numpy.ndarray'>
[[<Formula "(x(0) == 0)"> <Formula "(x(1) == 0)">]
 [<Formula "(x(0) == 0)"> <Formula "(x(1) == 0)">]
 [<Formula "(x(0) == 0)"> <Formula "(x(1) == 0)">]]
[[<Formula "(x(0) >= 1)"> <Formula "(x(1) >= 1)">]
 [<Formula "(x(0) >= 1)"> <Formula "(x(1) >= 1)">]
 [<Formula "(x(0) >= 1)"> <Formula "(x(1) >= 1)">]]
```
> This is because there is no signature for the logical operators to return objects; per the numpy documentation, it's always bool.
It shouldn't be hard to write a patch to change this. IIRC, numpy now has "object loops" for the `logical_and` and `logical_or` operators, so contributing them for the comparisons (assuming they're not in fact already there) would seem uncontroversial.
Excellent! Thank you @buoyancy99 and @eric-wieser !
I would definitely like to resolve this. @buoyancy99 , does Eric's hint give you enough to go on to explore that version of the solution?
In fact, the object loop already exists as far back as numpy 1.16.5 (https://github.com/numpy/numpy/pull/10745), which I referred to in https://github.com/RobotLocomotion/drake/issues/8315#issuecomment-373233680 above:
```python
In [30]: from sympy import symbols

In [31]: x, y = symbols('x y')

In [32]: np.greater([x], [y])  # errors
TypeError: cannot determine truth value of Relational

In [33]: np.greater([x], [y], dtype=object)  # works
Out[33]: array([x > y], dtype=object)
```
> Excellent! Thank you @buoyancy99 and @eric-wieser! I would definitely like to resolve this. @buoyancy99, does Eric's hint give you enough to go on to explore that version of the solution?
Just updated the patched-subclass code following Eric's suggestion, and it works. However, I believe that with this update, we will still need to subclass `np.ndarray` and link all C++ `Expression` arrays to it. This has to be done in our fork of pybind11, I believe. I need some additional guidance on how to start there.
Updated subclass code:

```python
class PatchedArray(np.ndarray):
    to_patch = {
        np.greater: operator.gt,
        np.greater_equal: operator.ge,
        np.less: operator.lt,
        np.less_equal: operator.le,
        np.equal: operator.eq,
        np.not_equal: operator.ne,
    }

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        with np.testing.suppress_warnings() as sup:
            sup.filter(RuntimeWarning, "invalid value encountered in")
            casted_inputs = (item.view(np.ndarray) if isinstance(item, PatchedArray) else item for item in inputs)
            if ufunc in self.to_patch:
                kwargs.update(dtype=object)
                return getattr(ufunc, method)(*casted_inputs, **kwargs)
            else:
                return getattr(ufunc, method)(*casted_inputs, **kwargs).view(PatchedArray)

x_patched = x.view(PatchedArray)
print(x_patched == np.zeros(2))
print(x_patched >= np.zeros(2))
print(x_patched <= np.zeros(2))
print(x_patched < np.zeros(2))
print(type(x_patched + 1))
print(type(x_patched < np.zeros(2)))
print(x_patched[None] * np.ones((3, 2)) == np.zeros((3, 2)))
print(x_patched[None] * np.ones((3, 2)) >= 1)
```
Probably you want the more conservative:

```python
class PatchedArray(np.ndarray):
    to_patch = {  # no need for a `dict` any more
        np.greater,
        np.greater_equal,
        np.less,
        np.less_equal,
        np.equal,
        np.not_equal,
    }

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        with np.testing.suppress_warnings() as sup:
            sup.filter(RuntimeWarning, "invalid value encountered in")
            casted_inputs = (item.view(np.ndarray) if isinstance(item, PatchedArray) else item for item in inputs)
            if ufunc in self.to_patch:
                kwargs.setdefault('dtype', object)
                return getattr(ufunc, method)(*casted_inputs, **kwargs)
            else:
                return getattr(ufunc, method)(*casted_inputs, **kwargs).view(PatchedArray)
```

which doesn't override the `dtype` if the user requests a different one explicitly.
> Probably you want the more conservative [version] [...] which doesn't override the `dtype` if the user requests a different one explicitly
Any suggestion for the pybind11 part?
Not really, I haven't touched pybind11 in a few years.
> This has to be done in our fork of pybind11 I believe.

Our goal in Drake is to get rid of our fork of `pybind11`, and instead use a vanilla upstream version (#19250). Solutions that involve patching `pybind11` will not be accepted. However, I don't think we need it in this case.

> [We] will still need to subclass np.ndarray and link all C++ Expression arrays to it.

As I understand it, the goal is to return `PatchedArray` from bindings like `MathematicalProgram.NewContinuousVariables` that return a `Matrix<Variable, ...>`? Even in the extreme, we could write a helper function to accomplish that and manually call it from all bindings that return arrays of variables, but probably a type caster template specialization could handle it automatically. Either way, it's do-able without changing `pybind11` itself.
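On the Python side, the helper-function idea could be as small as this sketch (`as_patched` is a hypothetical name, not an actual Drake API, and `PatchedArray` here is a placeholder for the subclass defined earlier in the thread):

```python
import numpy as np

class PatchedArray(np.ndarray):
    """Placeholder for the PatchedArray subclass defined above."""

def as_patched(arr):
    # Hypothetical helper: bindings that return object arrays of
    # Variables could route their return value through here, since
    # .view() re-types an ndarray without copying its data.
    return np.asarray(arr).view(PatchedArray)

x = as_patched(np.array([1.0, 2.0], dtype=object))
print(type(x).__name__)  # PatchedArray
```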
I get the following output: