HPAC / matchpy

A library for pattern matching on symbolic expressions in Python.
MIT License
166 stars 25 forks source link

`ManyToOneReplacer` for Rubi #8

Closed arihantparsoya closed 7 years ago

arihantparsoya commented 7 years ago

Hi, I implemented ~80 rules using ManyToOneReplacer for Rubi integration and MatchPy was able to match the subjects very quickly. However, when I added ~550 rules, there was significant decrease in speed while matching even smaller subjects.

It would be great if you could advice us on how to solve this issue.

Here are the files for our project:

Patterns: https://github.com/parsoyaarihant/sympy/blob/rubi4/sympy/rubi/patterns.py Constraint: https://github.com/parsoyaarihant/sympy/blob/rubi4/sympy/rubi/constraint.py All Rubi Files: https://github.com/parsoyaarihant/sympy/tree/rubi4/sympy/rubi

wheerd commented 7 years ago

I have a couple of ideas. First, if you recreate the ManyToOneReplacer every time you want to use it, it is not going to be faster than just using replace_all The setup time for creating all the states is much higher than just matching without it. It only pays off if you match multiple times or if you serialize the matcher and restore it. Otherwise the construction cost dominates the overall time.

A second thing are your constraints. To get the most speed out of them, you would need to split them up and turn them into individual constraints, i.e. get rid of the And inside. Then each constraint can be evaluated seperately. As an example, the constraint of the first pattern is

cons(And(FreeQ(List(a_, b_, c_, d_), x_), ZeroQ(Add(Mul(a_, d_), Mul(b_, c_))), PosQ(Mul(Pow(a_, Integer(-1)), b_, Pow(c_, Integer(-1)), d_))), (c, b, x, d, a))

Instead, you should try to keep them to as few variables as possible, so that they can be evaluated early:

FreeQ(a_, x_)
FreeQ(b_, x_)
FreeQ(c_, x_)
FreeQ(d_, x_)
ZeroQ(Add(Mul(a_, d_), Mul(b_, c_)))
PosQ(Mul(Pow(a_, Integer(-1)), b_, Pow(c_, Integer(-1)), d_)))

For FreeQ I think you don't need sympy to evaluate it. You can basically just use something like this:

class FreeQ(Constraint):
    def __init__(self, expr, var):
        self.expr = expr if isinstance(expr, str) else expr.name
        self.var = var if isinstance(var, str) else var.name

    def __call__(self, substitution):
        return substitution[self.var] not in substitution[self.expr]

    @property
    def variables(self):
        return frozenset([self.expr, self.var])

    def with_renamed_vars(self, renaming):
        return FreeQ(
            renaming.get(self.expr, self.expr),
            renaming.get(self.var, self.var)
        )

    def __eq__(self, other):
        return isinstance(other, FreeQ) and other.expr == self.expr and other.var == self.var

    def __hash__(self):
        return hash((self.expr, self.var))

I did not test this though. The first point is the most important one though. You only get a speedup, if you amortize the setup cost of the ManyToOneMatcher.

arihantparsoya commented 7 years ago

I changed the code to stop recreating ManyToOneReplacer multiple times and the speed is good now.

I knew that its better to have indivdual constraint rather than having a common cons() for all constraints since it helps ManyToOneReplacer to backtrack early. But I couldn't figure out how to implement logical statements such as And( , Or()) in MatchPy Constraints. I will definately try to seperate out FreeQ the way you described.

Thanks !

arihantparsoya commented 7 years ago

Hi, while running the test suit of rubi(which consists of ~1700 integrands). The speed of matching decreases while going through the test suit. Sometimes, the test suit gets stuck for long time.

I have seperated FreeQ as you suggested. If I solve the integrands(which take more time in the test suit) indivdually, they run fine.

wheerd commented 7 years ago

Off the top of my head I don't know why this is the case. Have you tried profiling the code? I find SnakeViz to be a useful tool for that. What do I have to do to replicate that test suite?

arihantparsoya commented 7 years ago

I have not tried profiling. I will try now.

The code is in my rubi4 SymPy branch.

Test suit: https://github.com/parsoyaarihant/sympy/blob/rubi4/sympy/rubi/tests/test_rubi_test_suit.py all ReplacementRule: https://github.com/parsoyaarihant/sympy/blob/rubi4/sympy/rubi/patterns.py

The tests will start running if you run bin/test sympy/rubi/tests/test_rubi_test_suit.py in the terminal from main sympy directory. For now I have made the tests such that it shows the intergrand which is being integrated(I have also added some documentation here).

arihantparsoya commented 7 years ago

PFA .prof file for first 121 tests. program.prof.zip

arihantparsoya commented 7 years ago

I think there must be problem in my own code. I will let you know as soon as I find it.

wheerd commented 7 years ago

Apparently SumSimplerQ is missing so I cannot run it locally:

  File "sympy\rubi\tests\test_rubi_test_suit.py", line 2, in <module>
    from sympy.rubi.rubi import rubi_integrate
  File "sympy\rubi\rubi.py", line 1, in <module>
    from .patterns import rubi_object
  File "sympy\rubi\patterns.py", line 5, in <module>
    from .constraint import cons, FreeQ
  File sympy\rubi\constraint.py", line 4, in <module>
    from .matchpy2sympy import matchpy2sympy
  File "sympy\rubi\matchpy2sympy.py", line 4, in <module>
    from .utility_function import (Int, ZeroQ, NonzeroQ, List, Log, RemoveContent,
ImportError: cannot import name 'SumSimplerQ'
arihantparsoya commented 7 years ago

There is a function ExpandIntegrand which expands the the integrands such as (x + 1)**2 into x**2 + 2*x + 1. Then each indivdual term is integrated again.

I created two graphs(One with expandintegrand and another without it). I found that ExpandIntegrand is taking majority of time in matchpy2sympy.

arihantparsoya commented 7 years ago

I have committed SumSimplerQ just now. You can update your branch

wheerd commented 7 years ago

By looking at the profiling you posted, I identified one bottleneck that I have hopefully fixed in the last commit. Would you be willing to try it out before I release a new version? You can check out MatchPy yourself and install it in development mode (via python setup.py develop). The many-to-one matcher was doing some redundant work for a lot of matches which should only be done now once.

arihantparsoya commented 7 years ago

I tried installing MatchPy using Github now but it is giving the following error( I also tried installing it previously and I got the same error):

parsoyaarihant@MacBook-Pro-4  ~/Downloads/matchpy-master  python setup.py install
zip_safe flag not set; analyzing archive contents...

Installed /Users/parsoyaarihant/Downloads/matchpy-master/.eggs/pytest_runner-2.11.1-py3.6.egg
Searching for setuptools_scm>=1.7.0
Reading https://pypi.python.org/simple/setuptools_scm/
Downloading https://pypi.python.org/packages/6c/8c/86bb5eb353184a8cfb6488d8f1d286e2028907c6d24fc9e604e6aa40ac8e/setuptools_scm-1.15.6-py3.6.egg#md5=342049ba90a3dde0d6372456fe92884a
Best match: setuptools-scm 1.15.6
Processing setuptools_scm-1.15.6-py3.6.egg
Moving setuptools_scm-1.15.6-py3.6.egg to /Users/parsoyaarihant/Downloads/matchpy-master/.eggs

Installed /Users/parsoyaarihant/Downloads/matchpy-master/.eggs/setuptools_scm-1.15.6-py3.6.egg
Traceback (most recent call last):
  File "setup.py", line 45, in <module>
    'graphs': ['graphviz>=0.5,<0.6'],
  File "/Users/parsoyaarihant/anaconda/lib/python3.6/distutils/core.py", line 108, in setup
    _setup_distribution = dist = klass(attrs)
  File "/Users/parsoyaarihant/anaconda/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 318, in __init__
  File "/Users/parsoyaarihant/anaconda/lib/python3.6/distutils/dist.py", line 281, in __init__
    self.finalize_options()
  File "/Users/parsoyaarihant/anaconda/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 375, in finalize_options
  File "/Users/parsoyaarihant/Downloads/matchpy-master/.eggs/setuptools_scm-1.15.6-py3.6.egg/setuptools_scm/integration.py", line 22, in version_keyword
  File "/Users/parsoyaarihant/Downloads/matchpy-master/.eggs/setuptools_scm-1.15.6-py3.6.egg/setuptools_scm/__init__.py", line 119, in get_version
  File "/Users/parsoyaarihant/Downloads/matchpy-master/.eggs/setuptools_scm-1.15.6-py3.6.egg/setuptools_scm/__init__.py", line 97, in _do_parse
LookupError: setuptools-scm was unable to detect version for '/Users/parsoyaarihant/Downloads/matchpy-master'.

Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
wheerd commented 7 years ago

Did you do a checkout or just download the source code? You need to do a checkout, otherwise it does not work, because the version is determined based on git tags.

wheerd commented 7 years ago

Oh, by the way, I can recommend you to use parametrized tests. Then you have a seperate test case for every example and pytest will track for you whiche ones pass/fail.

Edit: I am not sure whether that will work without pytest, because sympy seems to use its own runner.

arihantparsoya commented 7 years ago

I did checkout now and installed it. However, I am unable to import ManyToOneReplacer.

Python 3.6.0 |Anaconda custom (x86_64)| (default, Dec 23 2016, 13:19:00)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from matchpy import ManyToOneReplacer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'ManyToOneReplacer'

Update: I will fix this issue myself in some time.

arihantparsoya commented 7 years ago

SymPy doesn't allow to import external python libraries. That is why it doesn't use py.test. Sympy has similar tests to pytest but I am not using them currently.

arihantparsoya commented 7 years ago

ManyToOneReplacer issue if fixed. I will create a .prof file and compare it with previous test.

wheerd commented 7 years ago

There is still a bottleneck in the ManyToOneMatcher with the bipartite graphs. Changing this is a bit more complicated but should also increase the speed.

arihantparsoya commented 7 years ago

Okay, the test suit is not running at good speed(probably due to ExpandIntegrand) but its not getting stuck like it use to earlier. I will try to optimise ExpandIntegrand.

wheerd commented 7 years ago

I changed the generation of the bipartite graphs so that they should not take as long anymore. This might speed up your test suit a bit more. Now ExpandIntegrand is indeed the biggest bottleneck when I run it.

arihantparsoya commented 7 years ago

Thanks !