dendrograms / astrodendro

Generate a dendrogram from a dataset
https://dendrograms.readthedocs.io/

Basic visualization utilities #29

Closed astrofrog closed 11 years ago

astrofrog commented 11 years ago

This pull request implements basic visualization-related methods for Dendrogram. The first method is a sorting method that sorts the tree according to a key function, which defaults to the peak flux of the substructure (including its subtree). The other method returns a list of lines, together with a mapping from line objects to structures. With this in place, it's already possible to make simple plots:

import matplotlib.pyplot as plt
from astrodendro import Dendrogram
from astropy.io import fits

d = Dendrogram.compute(fits.getdata("MSX_E.fits"), verbose=True, min_data_value=1.e-4)

lines, mapping = d.get_lines()

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
for line in lines:
    ax.add_line(line)
ax.set_xlim(0., 100.)
ax.set_ylim(0., 1.e-3)
fig.savefig('test.png')

[Image: test]

d.sort(reverse=True)  # by default, sort by peak value

lines, mapping = d.get_lines()

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
for line in lines:
    ax.add_line(line)
ax.set_xlim(0., 100.)
ax.set_ylim(0., 1.e-3)
fig.savefig('test_sorted.png')

[Image: test_sorted]

The reason for returning the 'mapping' is that one can then set the picker attribute on the line objects to enable matplotlib object picking. Once this is done, the event handling function needs to access the structure for a given line. We might also need an inverse mapping, since one may want to find e.g. all the lines belonging to the subtree of a structure. This would be easy to add.

Ultimately, we want Dendrogram to have a plot method, but I decided to start with these as a first step.

@ChrisBeaumont - what do you think of this so far?

astrofrog commented 11 years ago

Here's an example with an event picker:

lines, mapping = d.get_lines()

def line_picker(event):
    event.artist.set_color('red')
    event.canvas.draw()

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
for line in lines:
    line.set_picker(2.)
    ax.add_line(line)
ax.set_xlim(0., 100.)
ax.set_ylim(0., 1.e-3)
fig.canvas.mpl_connect('pick_event', line_picker)

By the way, this is why get_lines returns many Line2D instances, not one LineCollection.

astrofrog commented 11 years ago

There was a bug in the sorting (forgot to sort the trunk) - it's working now!

[Image: test_sorted]

Note, I'm sure there are optimizations to be made, but this is more about the API.
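To make the trunk bug concrete, here is a minimal sketch of a recursive sort that also sorts the top-level list. It does not use astrodendro: `Node`, `subtree_peak` and `sort_tree` are hypothetical stand-ins for illustration, not the code in this branch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a dendrogram structure."""
    name: str
    peak: float                                  # peak value of the node's own pixels
    children: List["Node"] = field(default_factory=list)


def subtree_peak(node: Node) -> float:
    """Peak value over the node and all of its descendants."""
    return max([node.peak] + [subtree_peak(c) for c in node.children])


def sort_tree(trunk: List[Node],
              key: Callable[[Node], float] = subtree_peak,
              reverse: bool = False) -> None:
    """Sort the trunk and every list of children in place."""
    trunk.sort(key=key, reverse=reverse)         # the trunk itself is sorted too
    for node in trunk:
        sort_tree(node.children, key=key, reverse=reverse)


# Small example: 'a' has a low peak of its own but a bright leaf below it.
a = Node("a", 1.0, [Node("a2", 2.0), Node("a1", 5.0)])
b = Node("b", 3.0)
trunk = [b, a]
sort_tree(trunk, reverse=True)
print([n.name for n in trunk])
print([n.name for n in a.children])
```

Because the key includes the subtree, `a` sorts ahead of `b` even though its own peak is lower.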

astrofrog commented 11 years ago

@ChrisBeaumont - what do you think of something along the lines of the current code? Each structure has a method that allows me to get the list of leaves sorted by some criterion (and taking into account hierarchy). This is then used by Structure.get_lines which in turn is used by Dendrogram.get_lines.

Dendrogram.sort is no longer needed - shall I remove it, or can you see any cases where it would be useful?

Here's an example of a simple viz with line picking:

import matplotlib.pyplot as plt
from astrodendro import Dendrogram
from astropy.io import fits

d = Dendrogram.compute(fits.getdata("MSX_E.fits"), verbose=True, min_data_value=1.e-4)

# Get the lines as individual elements, and the mapping from line to structure
lines, mapping = d.get_lines(collection=False, reverse=True)

inverse_mapping = {}
for line in mapping:
    s = mapping[line]
    if s not in inverse_mapping:
        inverse_mapping[s] = [line]
    else:
        inverse_mapping[s] += [line]

selected = []

def line_picker(event):
    global selected
    for l in selected:
        l.set_color('blue')
        l.set_lw(1)
    selected = []
    structure = mapping[event.artist]
    for s in structure.descendants + [structure]:
        for l in inverse_mapping[s]:
            l.set_color('red')
            l.set_lw(2)
            selected.append(l)
    event.canvas.draw()

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
for line in lines:
    line.set_picker(2.)
    ax.add_line(line)
ax.set_xlim(0., 100.)
ax.set_ylim(0., 1.e-3)

fig.canvas.mpl_connect('pick_event', line_picker)

There's lots of hard-coded stuff for now; this is just to show how the simplistic line_picker works. It's kind of slow, but I want to get it working first, then figure out the bottleneck.

One issue for now is that Structure.get_lines requires a start_pos argument to know where the structure should be shown in x, but the line picker in the above example no longer knows what start_pos is - so it might be better if Structure.get_lines figures it out regardless of how it's called. I'm working on it.
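One possible way to avoid passing start_pos around is to compute every x position once, from the sorted leaf order, and look positions up afterwards. The sketch below assumes that approach. `Node`, `sorted_leaves` and `x_positions` are hypothetical names, and the layout rule (leaves on consecutive integers, branches at the mean of their children) is an assumption rather than what get_lines does.

```python
class Node:
    """Hypothetical stand-in for a dendrogram structure."""

    def __init__(self, name, peak, children=()):
        self.name = name
        self.peak = peak
        self.children = list(children)


def sorted_leaves(node, key, reverse=False):
    """Leaves below `node`, ordered by `key` at each level of the hierarchy."""
    if not node.children:
        return [node]
    leaves = []
    for child in sorted(node.children, key=key, reverse=reverse):
        leaves.extend(sorted_leaves(child, key, reverse))
    return leaves


def x_positions(trunk, key, reverse=False):
    """Map every structure to an x position without any start_pos argument."""
    pos = {}
    next_x = 0
    # Leaves take consecutive integer slots in sorted order.
    for root in sorted(trunk, key=key, reverse=reverse):
        for leaf in sorted_leaves(root, key, reverse):
            pos[leaf] = float(next_x)
            next_x += 1

    # Branches sit at the mean x of their children.
    def place(node):
        if node not in pos:
            xs = [place(child) for child in node.children]
            pos[node] = sum(xs) / len(xs)
        return pos[node]

    for root in trunk:
        place(root)
    return pos


l1, l2 = Node("l1", 2.0), Node("l2", 5.0)
root = Node("root", 0.5, [l2, l1])
t = Node("t", 1.0)
pos = x_positions([t, root], key=lambda n: n.peak)
print({n.name: x for n, x in pos.items()})
```

With positions stored in a dictionary, a picker callback can look up where any structure is drawn without knowing how the tree was traversed.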

astrofrog commented 11 years ago

Will be writing up the simple visualizer as a class now to avoid the use of global and make it easier to use.
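As a rough idea of what moving the state onto a class could look like, here is a sketch that keeps `selected` on the instance instead of in a global. `LinePicker` is a hypothetical name, and `FakeLine` and `FakeStructure` are stand-ins so the example runs without matplotlib or astrodendro. The only attributes assumed on real objects are the ones already used above (`set_color`, `set_lw`, `descendants`, `event.artist`, `event.canvas.draw`).

```python
from collections import defaultdict
from types import SimpleNamespace


class LinePicker:
    """Highlights the picked structure and its subtree; no globals needed."""

    def __init__(self, mapping, base="blue", highlight="red"):
        self.mapping = mapping                    # line -> structure
        self.inverse = defaultdict(list)          # structure -> [lines]
        for line, structure in mapping.items():
            self.inverse[structure].append(line)
        self.base, self.highlight = base, highlight
        self.selected = []

    def on_pick(self, event):
        # Reset whatever was highlighted by the previous pick.
        for line in self.selected:
            line.set_color(self.base)
            line.set_lw(1)
        self.selected = []
        structure = self.mapping[event.artist]
        for s in structure.descendants + [structure]:
            for line in self.inverse[s]:
                line.set_color(self.highlight)
                line.set_lw(2)
                self.selected.append(line)
        event.canvas.draw()


class FakeLine:
    """Stand-in for a Line2D that records its color and line width."""

    def __init__(self):
        self.color, self.lw = "blue", 1

    def set_color(self, color):
        self.color = color

    def set_lw(self, lw):
        self.lw = lw


class FakeStructure:
    """Stand-in for a structure exposing a `descendants` list."""

    def __init__(self, descendants=()):
        self.descendants = list(descendants)


leaf = FakeStructure()
branch = FakeStructure([leaf])
line_leaf, line_branch = FakeLine(), FakeLine()
picker = LinePicker({line_leaf: leaf, line_branch: branch})
canvas = SimpleNamespace(draw=lambda: None)

picker.on_pick(SimpleNamespace(artist=line_branch, canvas=canvas))
after_branch = (line_leaf.color, line_branch.color)
picker.on_pick(SimpleNamespace(artist=line_leaf, canvas=canvas))
after_leaf = (line_leaf.color, line_branch.color)
print(after_branch, after_leaf)
```

With a real figure, the bound method would be connected the same way as the function above: `fig.canvas.mpl_connect('pick_event', picker.on_pick)`.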

astrofrog commented 11 years ago

I've written up a visualization class here: https://gist.github.com/astrofrog/5831143

It's very simple for now and quite slow, but is this the type of basic visualization we want? Where should it live in the package?

I will be adding the option to click in the image and highlight the dendrogram. I also wonder whether it would be more efficient to plot the tree as a LineCollection and figure out ourselves which structure was intersected (rather than plot potentially hundreds or thousands of Line2D objects).

[Image: screen shot 2013-06-21 at 15 35 23]

Another issue - the picker doesn't pick just the closest artist, it picks all the artists within a certain radius and cycles through them, which we don't want.
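One way around this would be to decide ourselves which line is closest to the click and ignore the rest. The sketch below is pure geometry under stated assumptions: the click and the segments are in the same units (for example display pixels), each line is a single straight segment, and `distance_to_segment` and `closest_segment` are hypothetical helper names.

```python
import math


def distance_to_segment(px, py, x0, y0, x1, y1):
    """Shortest distance from the point (px, py) to the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:                      # degenerate segment: a single point
        return math.hypot(px - x0, py - y0)
    # Project the point onto the segment and clamp to its end points.
    t = ((px - x0) * dx + (py - y0) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))


def closest_segment(px, py, segments, radius):
    """Index of the nearest segment within `radius`, or None if none is close enough."""
    best, best_d = None, None
    for i, (x0, y0, x1, y1) in enumerate(segments):
        d = distance_to_segment(px, py, x0, y0, x1, y1)
        if d <= radius and (best_d is None or d < best_d):
            best, best_d = i, d
    return best


segments = [(0, 0, 0, 10), (3, 0, 3, 10)]        # two vertical lines
print(closest_segment(1, 5, segments, radius=5))
print(closest_segment(10, 5, segments, radius=5))
```

Both segments lie within the radius of the first click, but only the nearer one is returned, so nothing is cycled through.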

astrofrog commented 11 years ago

It turns out if one uses a LineCollection, event.ind gives the indices of the paths in the LineCollection that have been selected, so I think things can be sped up a lot :)
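A small sketch of how event.ind could be used, assuming one path per structure so that path i in the collection belongs to `structures[i]`. That one-to-one layout and the helper names `picked_structures` and `highlight_colors` are assumptions for illustration, and the event is a stand-in object so the example runs without a figure.

```python
from types import SimpleNamespace


def picked_structures(event, structures):
    """Structures whose paths were hit, using the indices in event.ind."""
    return [structures[i] for i in event.ind]


def highlight_colors(n_paths, picked, base="blue", highlight="red"):
    """One color per path, with the picked paths highlighted."""
    colors = [base] * n_paths
    for i in picked:
        colors[i] = highlight
    return colors


structures = ["s0", "s1", "s2"]                  # placeholders for real structures
event = SimpleNamespace(ind=[2, 0])              # stand-in for a matplotlib pick event
print(picked_structures(event, structures))
print(highlight_colors(len(structures), event.ind))
```

With a real LineCollection, the list of colors could be passed to its set_color method, so a single artist is updated instead of many Line2D objects.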

ChrisBeaumont commented 11 years ago

This screenshot looks good! Something like this will be very useful for mapping dendrogram ids to image features. A few comments:

ChrisBeaumont commented 11 years ago

I haven't digested it yet, but Structure.get_lines seems like a very long and complicated method. Naively, I would have thought it should be possible to do this more succinctly.

If not, can we at least split this up into simpler subroutines? One easy change is to separate the logic that generates structure x/y vertices from the logic that packs them into MPL artists.
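A sketch of what that split could look like, under assumptions: each structure is drawn as a vertical line at its x position plus a horizontal connector to its parent, and `structure_path` and `to_line_collection` are hypothetical names rather than functions in the branch. Only the second function touches matplotlib, and it imports it lazily so the geometry can be tested on its own.

```python
def structure_path(x, y_bottom, y_top, parent_x=None):
    """Plain (x, y) vertices for one structure; no matplotlib involved."""
    path = [(x, y_top), (x, y_bottom)]           # vertical line for the structure
    if parent_x is not None:
        path.append((parent_x, y_bottom))        # horizontal connector to the parent
    return path


def to_line_collection(paths, **kwargs):
    """Pack precomputed vertex lists into a single matplotlib artist."""
    from matplotlib.collections import LineCollection  # imported only when plotting
    return LineCollection(paths, **kwargs)


# Geometry only: a leaf at x=1 joining its parent at x=1.5, and a trunk at x=1.5.
paths = [
    structure_path(1.0, 0.2, 0.8, parent_x=1.5),
    structure_path(1.5, 0.0, 0.2),
]
print(paths)
```

Keeping the vertex generation free of matplotlib means it can be unit tested with plain tuples, while the packing step stays a one-liner.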

astrofrog commented 11 years ago

@ChrisBeaumont - thanks for all the suggestions - I'll try and address them all today or tomorrow. I've managed to get a version working locally which only uses LineCollection, so I think I can already simplify some of the code in get_lines since I don't see the point in getting Line2D objects anymore. But I'll also try and separate out the matplotlib part from the rest too.

astrofrog commented 11 years ago

@ChrisBeaumont - well, I guess you were right, it's much easier if I put all the plotting stuff in a dedicated class ;)

If you try out this version of the branch with

from astropy.io import fits

from astrodendro import Dendrogram
from astrodendro import BasicDendrogramViewer

image = fits.getdata("Perseus.fits")
d = Dendrogram.compute(image, min_data_value=0.3, min_npix=50, verbose=True)

v = BasicDendrogramViewer(image, d)

it should look correct - let me know if not! By the way, you can now already click in the image and highlight the dendrogram.

I'm working on the remaining features.

astrofrog commented 11 years ago

Regarding performance, it turns out that simply instantiating LineCollection from a list of lines gets quite slow (0.1-0.2s for thousands of structures). However, I think I can potentially get a significant improvement in performance by using a technique similar to what you used for the TreeIndex to make the StructureCollection return a subset of itself. I will look into that, though for now I will concentrate more on extending the functionality.

astrofrog commented 11 years ago

3-d slicing and sliders for the stretch are now done!

astrofrog commented 11 years ago

By the way, the tests are failing because of the new definition of height, but you can already try out this branch.

ChrisBeaumont commented 11 years ago

This looks great! I left some minor comments inline; otherwise, this is ready to merge

astrofrog commented 11 years ago

@ChrisBeaumont - I've now also updated the tests to reflect the new definition of height. I'm wondering whether we should rename it to prevent confusion for anyone who's used that variable before, since its meaning changes quite dramatically now?

ChrisBeaumont commented 11 years ago

I'm not sure what to do about height, since I don't know how many people used the old code enough to be confused by a new definition. Furthermore, this definition is what was used in the IDL code. I say we use whatever name makes the most sense, and not sacrifice clarity for backwards compatibility.

astrofrog commented 11 years ago

@ChrisBeaumont - once Travis passes, is this good to go? I opened #32 to remind us to create a section in the docs for people who've used previous versions of astrodendro, but otherwise I agree we should just go with what makes more sense naming-wise.

ChrisBeaumont commented 11 years ago

:+1: