Open Upabjojr opened 3 years ago
As a convention, operands in commutative functions are sorted in MatchPy. If I'm not mistaken, in that line we know that the operands come from a commutative functions because they are stored in a multiset. As a result, it's necessary to sort operands there. For sure it is possible to modify MatchPy such that one can use a custom sorting key for expressions. However, I'm afraid that this could break things. Since there is this convention that operand in commutative function are sorted, it's possible that there are places where the code relies on operands being sorted according to the total ordering that is defined in MatchPy. With a custom sorting key, they could end being sorted in the "wrong" order. Even if we are able to fix this problem, I suspect that you may run into a bunch of other problems when using SymPy expression in MatchPy. I mean, I don't know what you have tried already and what you did so far to make things compatible, but MatchPy is by no means built to be compatible with SymPy. Whatever else worked so far worked by chance. Have you considered translating SymPy expressions to MatchPy expressions? I believe to remember that someone did this during a Google Summer of Code.
In order to reproduce this bug, using the branch in https://github.com/sympy/sympy/pull/20612
I have the following:
In [1]: from sympy.utilities.matchpy_connector import WildDot, WildPlus, WildStar
In [2]: from matchpy import substitute
In [3]: from sympy import *
In [4]: x, y, z = symbols("x y z")
In [5]: w = WildPlus("w")
In [6]: from matchpy import ManyToOneMatcher
In [7]: matcher = ManyToOneMatcher()
In [8]: from matchpy import Pattern
In [9]: matcher.add(Pattern(x + w))
In [10]: result = matcher.match(x + y + z)
In [11]: result
Out[11]: <matchpy.matching.many_to_one._MatchIter at 0x7f3022a3da10>
In [12]: result = next(iter(result))
In [13]: result
Out[13]: (Pattern(x + w), {'w': Multiset({y: 1, z: 1})})
In [14]: substitute(result[0], result[1])
---------------------------------------------------------------------------
TypeError: cannot determine truth value of Relational
ManyToOneMatcher
correctly returns a Multiset
. Now the problem just lies in substitute
as you cannot call sorted( )
on a list of SymPy objects without specifying a sorting key.
Now, if I redefine the MatchPy substitute
function to:
from matchpy import *
from matchpy.functions import *
from typing import *
from multiset import *
from sympy.core.compatibility import default_sort_key
def substitute(expression: Union[Expression, Pattern], substitution: Substitution) -> Replacement:
if isinstance(expression, Pattern):
expression = expression.expression
return _substitute(expression, substitution)[0]
def _substitute(expression: Expression, substitution: Substitution) -> Tuple[Replacement, bool]:
if getattr(expression, 'variable_name', False) and expression.variable_name in substitution:
return substitution[expression.variable_name], True
elif isinstance(expression, Operation):
any_replaced = False
new_operands = []
for operand in op_iter(expression):
result, replaced = _substitute(operand, substitution)
if replaced:
any_replaced = True
if isinstance(result, (list, tuple)):
new_operands.extend(result)
elif isinstance(result, Multiset):
new_operands.extend(sorted(result, key=default_sort_key))
else:
new_operands.append(result)
if any_replaced:
return create_operation_expression(expression, new_operands), True
return expression, False
The code now works correctly:
In [24]: substitute(result[0], result[1])
Out[24]: x + y + z
So it looks like the usage of sorted( )
in the function _substitute( )
is the only cause of problems here (everything inside ManyToOneMatcher
works fine).
@hbarthels is sorted( )
really necessary in _substitute( )
? Can this function be modified in order to work with SymPy?
In order to reproduce this bug, using the branch in sympy/sympy#20612
Ah, now it makes a lot more sense that all the other stuff works. 😄
is sorted( ) really necessary in _substitute( ) ?
To be honest, I don't know the code well enough to know for sure if it is necessary there. I guess it's not necessary if the expression that is returned by substitute
is not used again as input to MatchPy. However, in a function like replace_all
, the output of substitute
will be used again. As a first step to investigate this, you could remove the sorted
and run the tests.
Can this function be modified in order to work with SymPy?
I have the following idea to solve this problem: You add a function to MatchPy that allows to pass a custom sort key, which is stored in some global variable. Then, you implement something like a sorted_expressions
function that is a wrapper around sorted
. If a custom sort key was provided, it is used in sorted
; otherwise, it calls sorted
without a sort key. The sorted_expressions
function then replaces all calls to sorted
and sort
that sort expressions. I don't know where else expressions are sorted, so you have to search for it. However, as I said above, a custom sort key could potentially break things, so I recommend that you test this thoroughly with SymPy expressions.
By the way, is it possible that in the code you posted, you accidentally used to original version of _substitute
? I don't see any difference.
As a first step to investigate this, you could remove the
sorted
and run the tests.
Here it is: https://github.com/HPAC/matchpy/pull/66
I have the following idea to solve this problem: You add a function to MatchPy that allows to pass a custom sort key, which is stored in some global variable. Then, you implement something like a
sorted_expressions
function that is a wrapper aroundsorted
Removing sorted( )
seems to be enough.
However, as I said above, a custom sort key could potentially break things, so I recommend that you test this thoroughly with SymPy expressions.
Is it possible to add a unit test to MatchPy with a few checks for SymPy's compatibility? It would be great to keep SymPy and MatchPy compatible with each other in the future (there's been a lot of effort to make them compatible, so why not enforcing a check to make sure it won't break in the future?).
By the way, is it possible that in the code you posted, you accidentally used to original version of
_substitute
? I don't see any difference.
I copy-pasted the wrong code. Now it's fixed.
You could change substitute()
to take an optional argument for the sort key which it passes down to the sorted()
function. That way it does not break existing uses but enables you to specify the sort key. If the sorting is used in more places, it could make sense to make it a global option.
I don't think that removing or changing sorted()
in substitute()
is sufficient. I remember now that we sort operands in commutative functions to ensure that expressions that are equivalent modulo commutativity can easily be identified as equivalent by a syntactic comparison. If sorted()
is removed in substitute()
, or if the custom sort key is used in substitute()
but nowhere else, it could happen that equality tests and hashing don't work the way they are supposed to for the output of substitute()
.
Is it possible to add a unit test to MatchPy with a few checks for SymPy's compatibility? It would be great to keep SymPy and MatchPy compatible with each other in the future (there's been a lot of effort to make them compatible, so why not enforcing a check to make sure it won't break in the future?).
Sure, that makes sense.
I don't think that removing or changing
sorted()
insubstitute()
is sufficient. I remember now that we sort operands in commutative functions to ensure that expressions that are equivalent modulo commutativity can easily be identified as equivalent by a syntactic comparison.
The other utilities of MatchPy appear to work well with SymPy... substitute( )
is the only one that causes sorting problems to me.
The other utilities of MatchPy appear to work well with SymPy...
substitute( )
is the only one that causes sorting problems to me.
I searched the code for calls to sorting functions. The call that makes sure that operands are sorted is part of the Expression
class, so that shouldn't be an issue with SymPy expression. There are other calls to sorted()
, for example here: https://github.com/HPAC/matchpy/blob/d542872f804e22590bfbee73f3121f80a99878a2/matchpy/matching/many_to_one.py#L881
However, I can't tell if the comparison depends on comparing expressions just by looking at it. So maybe changing substitute( )
is indeed sufficient, but I wouldn't rely on it without a lot of testing.
Does SymPy automatically sort operands in commutative functions?
Does SymPy automatically sort operands in commutative functions?
Add
and Mul
have their own constructors that do indeed end up sorting the expressions. The transformations undergone in the constructors of commutative operators in SymPy are more complicated than just sorting and are customly written for the main operators.
Is it important to you that we create a new release of MatchPy as soon as possible? If not, I would prefer to wait a bit in case more problems show up. I guess it would also make sense to add tests for the SymPy compatibility before the next release. Are you working on those tests?
I guess it would also make sense to add tests for the SymPy compatibility before the next release. Are you working on those tests?
We have these tests in SymPy. I have merged some extensions to sympy.utilities.matchpy_connector
15 days ago. We may need to wait for a new release of SymPy before we can add tests to MatchPy.
Calling
sorted( )
in https://github.com/HPAC/matchpy/blob/5cae3f275e3a1f725518516bf3c700dc3be03a56/matchpy/functions.py#L87 fails with SymPy, as SymPy objects are not comparable with operators (<
,<=
,>
,>=
). Operators are overloaded in SymPy to create instances of inequality objects.SymPy has a function called
default_sort_key
meant to deal with this problem:The question is, is it possible to modify MatchPy to specify a custom sorting key so that SymPy can specify the way its objects should be sorted?
Or is it an issue with
Multiset
?