Closed: dukecyto closed this issue 10 years ago.
Hi Jun Saito,
Many thanks for the feedback, this is really helpful.
Personally I think weave code in GPy, at least as is now, is just adding extra incompatibility and more maintenance labor with very little performance gain.
We agree that weave adds some complexity, but the performance gain is significant for many use cases.
@nfusi is working on a branch which switches weave out for cython. Does cython work for your case? We absolutely need some speed-up over pure numpy for many operations.
Best wishes,
James.
Hi Jun,
We definitely want to reduce the dependencies in the manner you are describing; you are totally right that on many occasions there is no need to introduce matplotlib etc.
We are scratching our heads a lot over weave because, as James mentions, there is a performance gain. We are also working on sympy-based computation of covariance functions, which are auto-differentiated; for the moment there is a weave dependency there as well.
I think in principle we are very much in favour of the type of dependency reduction you describe!
Neil
Hi,
Thanks for all the comments!
I understand that taking out weave is a debatable matter - I am sure there are places in GPy where weave code is absolutely necessary for reasonable speed. It is just that the functions I ran into, fast_array_equal() and symmetrify(), are good candidates for enabling weave only when a user really needs to shave off that last millisecond.
To prove my point, I timed the speed of fast_array_equal().
In [150]: A = np.random.randn(10000*10000).reshape(10000,10000)
In [152]: B = A.copy()
In [153]: timeit fast_array_equal(A, B)
10 loops, best of 3: 127 ms per loop
In [155]: timeit np.all(A == B)
10 loops, best of 3: 129 ms per loop
I know that this is one of the worst cases for fast_array_equal(), because fast_array_equal() can terminate early when arrays are not equal. But my point is that this will not make much of a difference in overall performance, while it does add extra complexity to the system.
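For reference, the early-exit behaviour being discussed can be sketched in pure NumPy like this (a hypothetical stand-in, not GPy's actual implementation):

```python
import numpy as np

def array_equal_numpy(a, b):
    """Pure-NumPy stand-in for fast_array_equal (sketch, not GPy's code).

    The cheap checks (identity, None, shape) run first; only then do we
    compare every element, which is the expensive part weave accelerates.
    """
    if a is b:
        return True
    if a is None or b is None:
        return False
    if a.shape != b.shape:
        return False
    return bool(np.all(a == b))
```

On two equal arrays, this always falls through to the full element-wise comparison, which is exactly the worst case timed above.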
Here is another timing on symmetrify():
In [160]: timeit symmetrify(A)
1 loops, best of 3: 3.83 s per loop
In [163]: timeit my_symmetrify(A)
1 loops, best of 3: 4.15 s per loop
My pure NumPy implementation is only slightly slower.
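A pure-NumPy symmetrify along these lines (copying one triangle of a square matrix onto the other, in place) might be sketched as follows; this is an illustrative version, not the code actually timed above:

```python
import numpy as np

def symmetrify_numpy(A, upper=False):
    """Copy one triangle of a square matrix onto the other, in place.

    Sketch of a pure-NumPy fallback for symmetrify; the weave version
    does the same with an explicit C loop. By default the lower
    triangle overwrites the upper.
    """
    triu = np.triu_indices_from(A, k=1)  # strictly-upper indices
    if upper:
        A.T[triu] = A[triu]   # copy upper triangle into lower
    else:
        A[triu] = A.T[triu]   # copy lower triangle into upper
```

Fancy indexing here touches only the off-diagonal triangle, so the diagonal is left untouched either way.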
Cython is probably a more deployment-friendly choice compared to weave, as it pre-compiles the code instead of compiling at run-time. I hope I can set up Cython on Windows to use VS2010 so the binaries will not crash Maya.
I think regardless of the utility of weave, your point still stands. It makes sense to provide a version of GPy with 'cut down' dependencies so that it can be easily deployed for end users (who won't need advanced functionality from plotting or symbolic covariance functions etc), so I think we should give this serious thought!
Neil, it is wonderful to hear all this. It was definitely worth posting my problems.
Just as a side note, my rule of thumb for using weave or Cython is
just having a little ack through to find where weave still pervades... we can remove it from:
- univariate_gaussian.py, since the speed loss should be minimal (although it's criminal that scipy.stats's version is three times slower than weave... why should that be?)
fast_array_equal is now redundant because of the fancy caching framework; it's not used at all any more.
The only remaining places are 'symmetrify', which is used a fair bit, and the coregionalize kernel. We can definitely have a numpy fallback for those.
I'll try to have a hack at that this week.
I have cut the dependence on matplotlib. Now, without matplotlib, GPy no longer crashes. :-)
Zhenwen (22/09/14)
This morning I removed the dependency of the Coregionalize kernel on weave. Weave is still used, but if compilation fails, it switches back to numpy.
I think that might be the last of the weave related issues.
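The fallback pattern described here (try the compiled path, drop to NumPy if compilation fails) can be sketched roughly as follows. The function names and the toy coregionalization build are illustrative, not the actual Coregionalize code:

```python
import numpy as np

def K_coregionalize(W, kappa, index):
    """Toy stand-in for building a coregionalization matrix.

    Tries a fast compiled path first; if compilation (or the compiler
    toolchain) is unavailable, falls back to pure NumPy so users
    without a C compiler still get a working, if slower, result.
    """
    try:
        return _K_weave(W, kappa, index)   # may raise if no C compiler
    except Exception:
        return _K_numpy(W, kappa, index)

def _K_weave(W, kappa, index):
    # Stand-in for the scipy.weave implementation; here it simply
    # fails, as it would on a machine with no C compiler installed.
    raise RuntimeError("weave compilation failed")

def _K_numpy(W, kappa, index):
    # ICM coregionalization: B = W W^T + diag(kappa), then select
    # entries according to each data point's output index.
    B = W.dot(W.T) + np.diag(kappa)
    return B[np.ix_(index, index)]
```

The important design point is that the fallback is transparent: callers never need to know which path ran.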
Good work everyone!
Is there some way of testing the minimal install? One would hope we could unit test this capability somehow, but I can see that being complicated.
Neil
I think this sort of thing is difficult to test with unittest, but we can always run the test ourselves before releasing to the master branch. We should only ask people to use the master branch, I think.
Zhenwen
Agreed, I felt the same, but wondered if someone knew something about unit testing with different dependency structures. We do need to make sure that these things don't creep back in somehow.
Neil
You could make a clean VM for each of the dependency sets you wanted to test, then run some shell scripts to bring the VMs up, install GPy, and run the tests. Then just roll each VM back to its starting state after the tests have run. It's a pretty heavyweight task to slot into a unit-test loop, but it should be doable.
I can think of a few ways to completely automate it cheaply, but those rely on ZFS and Solaris, so they probably aren't that helpful. Maybe Linux containers/Docker will provide the right infrastructure; it seems to be the right kind of model for this type of installation testing.
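One lighter-weight option than full VMs: simulate the missing dependency inside an ordinary test run by intercepting the import machinery before the package is loaded. A sketch of the idea (the class name is made up; how GPy would actually wire this into its test suite is an open question):

```python
import builtins
import sys

class BlockImport:
    """Context manager that makes `import <name>` raise ImportError.

    Lets a normal unit test check that a package degrades gracefully
    when an optional dependency (e.g. matplotlib) is absent, without
    needing a separate VM or container per dependency set.
    """
    def __init__(self, name):
        self.name = name
        self._orig_import = builtins.__import__

    def __enter__(self):
        def guarded(name, *args, **kwargs):
            if name == self.name or name.startswith(self.name + "."):
                raise ImportError("%s is blocked for this test" % name)
            return self._orig_import(name, *args, **kwargs)
        builtins.__import__ = guarded
        sys.modules.pop(self.name, None)  # forget any cached copy
        return self

    def __exit__(self, *exc):
        builtins.__import__ = self._orig_import
```

A test could then do `with BlockImport("matplotlib"): import GPy` and assert that the import succeeds, which would also catch the dependency creeping back in later.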
I'm going to close this issue: GPy no longer depends on matplotlib or weave, though they do get used for some optional functionality (like plotting!).
Hi,
Summary: Are there any plans to make GPy depend on matplotlib, pylab, and weave only when absolutely necessary?
Details: I wanted to make use of GPs in computer animation, and I was looking into running GPy within Autodesk Maya, which embeds Python 2.6. Since version 2013, Maya on Windows has been built using VS2010, which is giving headaches to people all over the world because the standard Python 2.6 distribution is built using VS2008. So to make use of any binary Python modules (including NumPy/SciPy, PIL, and all the other great stuff in the Python community) from the latest Maya, people have to rebuild those binary modules.
When I imported GPy from Maya on Windows, the first thing I noticed was that it tries to import matplotlib and pylab even when I just want to do regression, with no plotting. I do not want to waste my time rebuilding matplotlib and pylab on Windows (if I need to plot, I would not work from Maya). Therefore, on my local tree, I modified the code so that it only imports those modules when methods like plot() are called. I noticed that some GPy modules are already coded this way, and I want to know whether there is a general consensus among the GPy developers to modify all GPy code that way.
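The lazy-import pattern described here is simple to apply uniformly: the heavy module is imported inside the method that needs it, not at module load. A minimal sketch with a hypothetical class, not GPy's actual code:

```python
import numpy as np

class Model:
    """Minimal sketch of deferring an optional plotting dependency.

    Nothing plotting-related is imported at module load, so the class
    stays usable for regression on machines without matplotlib.
    """
    def __init__(self, X, Y):
        self.X, self.Y = X, Y

    def predict(self, Xnew):
        # Placeholder for the real regression; no plotting needed here.
        return np.interp(Xnew, self.X, self.Y)

    def plot(self):
        # Import only when the user actually asks for a plot; a missing
        # matplotlib then fails here, not at `import` time.
        import matplotlib.pyplot as plt
        plt.plot(self.X, self.Y, "kx")
        plt.show()
```

With this structure, `import`-ing the package and calling predict() never touches matplotlib; only plot() does.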
The next thing I noticed is that some functions use weave for speed-up. This is troublesome, especially when deploying tools using GPy on Windows. Windows is not equipped with C compilers out of the box, and I cannot ask everyone using my animation tool to install Visual Studio or MinGW. Then I discovered that weave is used in functions like util.misc.fast_array_equal() and symmetrify(), which I think would be fast enough with just a pure NumPy implementation (I did not actually time them, so I might be wrong). Would it be possible to at least give users the choice of whether to use weave? Personally I think the weave code in GPy, at least as it is now, just adds extra incompatibility and more maintenance labor with very little performance gain.
I really wish GPy could lower the bar for bringing GPs and related techniques into practical use in industry. Since GPy is developed in Python, I assume you want a wide range of people, not just academics, to use it. If so, I hope you will consider my proposals in the future direction of the development.
Jun Saito