dendrograms / astrodendro

Generate a dendrogram from a dataset
https://dendrograms.readthedocs.io/

Improve performance of dendrogram viewer #31

Open astrofrog opened 11 years ago

astrofrog commented 11 years ago

At the moment, the viewer implemented in #29 is quite slow when sliders are changed - this is because the slider automatically calls fig.canvas.draw, which re-draws the whole figure. It's possible to improve performance by making use of draw_artist to selectively re-draw only elements that are changing, though this requires re-implementing a slider class to allow this. I've done this all locally in a hacky way, but will try and tidy it up and submit a patch here.

Unfortunately, draw_artist doesn't work properly with the MacOS X backend, so it will have to be a solution for other backends only.
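For illustration, the selective re-draw idea can be sketched with Matplotlib's standard blitting calls. This is a minimal sketch and not the local implementation mentioned above: it uses the non-interactive Agg backend so it runs anywhere, the two-panel layout and the `update_vmax` name are assumptions, and wiring it into a custom slider class is left out.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so the sketch runs headless
import numpy as np
from matplotlib import pyplot as plt

fig, (ax_img, ax_other) = plt.subplots(1, 2)

# Mark the image as animated so a normal full draw skips it
image = ax_img.imshow(np.random.random((64, 64)), vmin=0., vmax=1., animated=True)

# Stand-in for the expensive part of the figure that does not change
ax_other.plot(np.random.random(1000), np.random.random(1000), ".")

# Draw everything once and cache the pixels of the image axes (without the image)
fig.canvas.draw()
background = fig.canvas.copy_from_bbox(ax_img.bbox)

def update_vmax(value):
    """Re-draw only the image instead of the whole figure."""
    image.set_clim(0., value)
    fig.canvas.restore_region(background)  # put back the cached background
    ax_img.draw_artist(image)              # draw just the changed artist
    fig.canvas.blit(ax_img.bbox)           # push only that region to the screen

update_vmax(0.5)
```

On an interactive backend, `update_vmax` would be called from the slider callback in place of `fig.canvas.draw()`.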

astrofrog commented 11 years ago

Another solution is to again build an actual GUI and have a separate canvas for the dendrogram and the image viewer, which would prevent having to re-draw the dendrogram every time the sliders are changed.

astrofrog commented 11 years ago

But obviously that's not ideal if one does not have a GUI toolkit installed, unless we make use of Tk. Of course, we could have multiple viewers depending on what is installed and just state that the pure-matplotlib one is slow.

ChrisBeaumont commented 11 years ago

-1 on a toolkit dependency for the simple viewer -- the focus should be first on ease of use, and second on speed.

Have you profiled this? My suspicion is that the performance hit either comes from iteration over the dendrogram (if this happens at each redraw), or from matplotlib byte-scaling images at full-resolution. If it's the latter, then draw_image should be taking the most time. It's probably possible to step around this by bytescaling the image up-front, and then carefully reassigning the matplotlib data array without triggering their cache clear.
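As a rough sketch of the "bytescale up-front" idea: the array can be normalized and colormapped to 8-bit RGBA once, so that Matplotlib only has to display ready-made pixels. The `bytescale` helper and the choice of the gray colormap are assumptions for illustration; the cache-related trick mentioned above is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so the sketch runs headless
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import Normalize

data = np.random.random((256, 256))

def bytescale(data, vmin, vmax, cmap_name="gray"):
    """Hypothetical helper: map data to a uint8 RGBA array once, up-front."""
    norm = Normalize(vmin=vmin, vmax=vmax, clip=True)
    cmap = plt.get_cmap(cmap_name)
    # bytes=True makes the colormap return uint8 values in 0-255
    return cmap(norm(data), bytes=True)

rgba = bytescale(data, 0., 1.)

fig, ax = plt.subplots()
# An RGBA array is shown as-is, with no further normalization or colormapping
ax.imshow(rgba)
```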

astrofrog commented 11 years ago

I wasn't sure what the best way to run the profiling would be, but if (once the viewer is open) I force fig.canvas.draw to be called 100 times, I get the following profile:

        1    0.007    0.007   21.644   21.644 perseus_profile.py:1(<module>)
      100    1.799    0.018   18.994    0.190 {method 'draw' of '_macosx.FigureCanvas' objects}
20902/101    0.119    0.000   17.200    0.170 artist.py:52(draw_wrapper)
      101    0.009    0.000   17.196    0.170 figure.py:953(draw)
      400    0.039    0.000   16.823    0.042 axes.py:1968(draw)
      800    0.029    0.000   11.841    0.015 axis.py:1079(draw)
     3000    0.052    0.000    7.626    0.003 axis.py:224(draw)
     6200    0.468    0.000    6.409    0.001 lines.py:505(draw)
33205/22005    0.489    0.000    3.244    0.000 transforms.py:1992(transform_non_affine)
19208/16408    0.108    0.000    2.441    0.000 transforms.py:2213(transform_non_affine)
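A profile of this kind could be gathered roughly as follows. This is a sketch under assumptions: it uses the Agg backend and a small stand-in figure rather than the actual viewer, and the `redraw` helper is hypothetical.

```python
import cProfile
import io
import pstats

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so the sketch runs headless
import numpy as np
from matplotlib import pyplot as plt

# Stand-in figure; the real test used the open dendrogram viewer
fig, ax = plt.subplots()
ax.imshow(np.random.random((64, 64)))

def redraw(n):
    """Force n full re-draws of the figure."""
    for _ in range(n):
        fig.canvas.draw()

profiler = cProfile.Profile()
profiler.enable()
redraw(10)
profiler.disable()

# Sort by cumulative time and keep the ten most expensive entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```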

Note that we are getting slow performance without even having to re-compute the dendrogram - this is a Matplotlib problem. For example, if you try the following:

import numpy as np
from matplotlib import pyplot as plt
from matplotlib.widgets import Slider

fig = plt.figure()

# Left panel: image whose upper color limit is controlled by the slider
ax1 = fig.add_axes([0.1, 0.1, 0.4, 0.7])
image = ax1.imshow(np.random.random((512, 512)), vmin=0., vmax=1.)

# Top strip: slider axes with tick labels hidden
ax2 = fig.add_axes([0.1, 0.9, 0.85, 0.05])
ax2.set_xticklabels([])
ax2.set_yticklabels([])
slider = Slider(ax2, "", 0., 1.)

def update_vmax(value):
    # Only the image changes, but the whole figure gets re-drawn
    image.set_clim(0., value)
    fig.canvas.draw()

slider.on_changed(update_vmax)

# Right panel: unrelated scatter plot that makes every full re-draw expensive
ax3 = fig.add_axes([0.55, 0.1, 0.4, 0.7])
x = np.random.random(10000)
y = np.random.random(10000)
ax3.scatter(x, y)

plt.show()

You will likely find the cursor slow and unresponsive, which I think is what's happening for us because everything has to get re-drawn every time. I've actually emailed the above example to the matplotlib-users list to request ideas to speed it up.

astrofrog commented 11 years ago

There's an additional performance hit when one changes the slice, not just vmin/vmax: the contour gets re-computed (there is no way to avoid that as far as I can tell).

ChrisBeaumont commented 11 years ago

Ok you're right, it doesn't seem to be dominated by image recalculation.

We can probably use plot instead of scatter, which is ~10x faster on Agg backends.
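For illustration, the same points can be drawn either way. This is a minimal sketch on the Agg backend; the axes names are assumptions, and no timing is measured here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so the sketch runs headless
import numpy as np
from matplotlib import pyplot as plt

x = np.random.random(10000)
y = np.random.random(10000)

fig, (ax_scatter, ax_plot) = plt.subplots(1, 2)

# scatter builds a PathCollection, which allows per-point sizes and colors
points = ax_scatter.scatter(x, y)

# plot with a marker-only format string builds a single Line2D with one shared style
(line,) = ax_plot.plot(x, y, "o")

fig.canvas.draw()
```

The trade-off is that `plot` cannot vary size or color per point, which does not matter when all markers look the same.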

The slow contour is one of the reasons why Glue uses semi-transparent masks instead of contours to highlight regions.
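A semi-transparent mask overlay can be sketched as below. This is not Glue's actual implementation; the threshold used to build the mask, the red highlight color, and the variable names are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so the sketch runs headless
import numpy as np
from matplotlib import pyplot as plt

data = np.random.random((128, 128))
mask = data > 0.8  # hypothetical selection, standing in for a dendrogram structure

# RGBA overlay: fully transparent everywhere, semi-transparent red inside the mask
overlay = np.zeros(mask.shape + (4,))
overlay[mask] = (1.0, 0.0, 0.0, 0.4)

fig, ax = plt.subplots()
ax.imshow(data, cmap="gray", vmin=0., vmax=1.)
highlight = ax.imshow(overlay)  # drawn on top of the data image

fig.canvas.draw()
```

When the selection changes, a new RGBA array can be passed to `highlight.set_data(...)`, so no contour has to be computed.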

astrofrog commented 11 years ago

@ChrisBeaumont - in our case, we don't actually use scatter, but that is a good point that my simple example could have used plot instead.

I'm wondering whether the plotting of a LineCollection consisting of many short 2-d lines is what is slowing things down.

astrofrog commented 11 years ago

#33 actually implements a fix that speeds up the FPS by 2x in most cases, though it's still not what I'd consider smooth. Still, it's a Matplotlib-only GUI, so we can't expect fantastic performance.

In future, we can consider having an alternate viewer that uses a GUI toolkit (while keeping the matplotlib-only one of course) - this would allow the different GUI elements to be decoupled, and performance improved.