ericmjl / nxviz

Visualization Package for NetworkX
https://ericmjl.github.io/nxviz
MIT License
452 stars 86 forks source link

Uniform types in set_nodecolors #6

Open leotrs opened 8 years ago

leotrs commented 8 years ago

Here is an excerpt the docstring provided for BasePlot.set_nodecolors:

"""
If `nodecolors` is a `string`, all nodes will carry that color.
If `nodecolors` is a `list` or `tuple`, then nodes will be coloured in
order by the list or tuple elements.
If `nodecolors` is a `dict`, then the keys have to be all present in
the nodelist.
By default, node color is blue.
"""

Then, the code is filled with asserts according to the mentioned rules.

This approach is fine, but I'm not a huge fan of overloading a function and then forking its behavior based on its arguments. I have some experience with Mathematica, where this practice is used to the extent that built-in functions change behavior not only on the type of their arguments but also on their shape. Debugging them is not a very fun exercise.


Here is what hypothetical calls to set_nodecolors would look like:

plot = BasePlot(nodes=range(6))

# string: all the same
plot.set_nodecolors('blue')

# iterable: must preserve order
plot.set_nodecolors(['blue', 'blue', 'blue', 'blue', 'black', 'blue'])

# dict: control over each value
plot.set_nodecolors({0: 'blue', 1: 'blue', 2: 'blue',
                     3: 'blue', 4: 'black', 5: 'blue'})

What I propose is to homogenize the calls and always using dicts:

nodes = range(6)
plot = BasePlot(nodes)

# dict: setting all nodes to the same value is still easy
plot.set_nodecolors({node: 'blue' for node in nodes})

# no iterables!
# user doesn't worry about correct ordering

# dict: control over each value
plot.set_nodecolors({0: 'blue', 1: 'blue', 2: 'blue',
                     3: 'blue', 4: 'black', 5: 'blue'})

The main benefits of this approach is that the code is much shorter and less error-prone, as there are far fewer type-checks and forks that need to be performed at each call (do we only check the type of the argument, or also the length? What about the validity of each color string?) Also, the user doesn't need to worry about the order of iterables being the same as the nodecolors object they are passing.

Right now, even if only one string is being passed, the method ultimately performs self.nodecolors = [nodecolors] * len(self.nodes), which means passing a dict of size equal to the nodelist even in the simplest case will not incur in more memory cost.

I know everything is yet in early stages, but I thought standardizing function signatures is an important design decision to be made from the start.

leotrs commented 8 years ago

Is there an interest in this? Maybe it's too early in development to think about this issues.

ericmjl commented 8 years ago

@jonchar is our resident matplotlib guru - what's your thoughts?

ericmjl commented 8 years ago

by the way, @leotrs, now that I've found some time to look carefully at what you're proposing, it's a great idea. would you like to start a PR + tests for this?

leotrs commented 8 years ago

@ericmjl This came up in the course of reviewing the code to prepare a PR.

Do we want to keep the set_* functions, or do we want (more pythonic) properties?

ericmjl commented 8 years ago

If you can get it done within the PR and the examples run correctly, then I've no problems with changing to properties.

jonchar commented 8 years ago

Just a few thoughts:

  1. It is standard in the matplotlib API to see plotting functions that take kwargs of different types to set attributes of the plot. See the the c argument in plt.scatter for an example of this and the source for how it's handled. Under the hood (in draw_nodes in circos.py) we manually add patches to the axes object but I was thinking perhaps we could use plt.scatter instead to leverage this functionality.
  2. I think having set_nodecolors and set_edgecolors behave in this fashion (with their corresponding kwargs) would keep things streamlined with the matplotlib API. This would also keep a low barrier to customization if you already know matplotlib. If we can offload handling these parameters to existing matplotlib functions as well, that would be advantageous.
  3. The set_* methods are currently called internally when the plot is created. I'm not sure of a use case where they would need to be called after the plot is already created a-la BasePlot.set_*()
  4. Since matplotlib is our "backend" so-to-speak, any valid matplotlib color would be valid in nxviz.
leotrs commented 8 years ago
  1. If we do follow matplotlib standards, we have to think about how this allows or forbids us to follow the proposed declarative design (#2). Having said that, I agree it's a good idea.
  2. I propose to follow matplotlib purposefully, especially so that we can

    offload handling these parameters to existing matplotlib functions as well

    That means we don't have to make the type checks ourselves.

  3. Whether or not the set_* methods are called after plot creation, having getters and setters is generally frowned upon in python. That's what properties are for. Not trying to be confrontational, just making an observation.
ericmjl commented 8 years ago

great thoughts, guys!

on plt.scatter @jonchar: I think using plt.scatter is a cool idea too. In other words, all we have to do is compute the (x, y) coordinates for each point, and pass in the xs and ys, rather than iterate and add patches individually. That will definitely make the underlying code cleaner, and will allow us to separate the logic that draws to screen from the logic that computes x-y coordinates.

*on `set_methods** I'm conflicted on whether or not to remove them explicitly, now that I've heard arguments both ways. On one hand, _right now_, we are usingmatplotlibas the backend. But a declarative API will mean that we should be able to swap inbokeh,plotly` etc. Let's see what your PR looks like, @leotrs.

on type checking As for type checking... it's the same argument. Right now we're using matplotlib, so it makes sense to have matplotlib do the type checking. I added those assertion statements in the hopes that the error messages would be more informative for the end-user than a generic matplotlib one would be. I guess at this point it's a bit more of a front-end design choice - are we designing for users familiar with matplotlib at this point?

jonchar commented 8 years ago

@ericmjl: I can put together a PR for using plt.scatter to see what that would look like.

@leotrs: You're right, properties are the way to go. The existing set_* methods are leftover from the initial implementation.

leotrs commented 8 years ago

@ericmjl Just wanted to check-in. I still plan on implementing this, hopefully soon. As I told you, I just moved to Boston and started grad school, and I'm still adjusting. But a couple of hours should clear up soon-ish, and I'll get back to work on this!

ericmjl commented 8 years ago

That'd be awesome, @leotrs! Hope you're doing well right now at NEU. Let's see how both of our next set of updates shape up and see how we can integrate them!

ericmjl commented 7 years ago

@jonchar @leotrs FYI I merged some API changes into master. Not the best practice right now, as I should have done a dev branch first, but it'll do for now. In case you guys are still working on PRs, don't forget to pull in the latest updates!