What's the right API? Declarative or flexible?

ericmjl commented 8 years ago

I've been thinking a bit about altair, and how they have a very nice declarative grammar for statistical plots. I think the same can be achieved for nxviz.

Thinking first from the user-facing API, I think a good starting point is to expose something like:

b = BasePlot(G, node_color_by='key_name', edge_color_by='key_name')

Under the hood, we can iterate through all nodes and edges and their metadata, and use sensible defaults to pick a colour keyword/hexadecimal/RGB value from the matplotlib colour maps.

A dictionary needs to be stored with the object, in which each key is a metadata value and each value is the corresponding colour keyword/hexadecimal/RGB value.
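
A minimal sketch of that mapping, assuming node metadata arrives as a plain dict of dicts. `build_colour_map` and `PALETTE` are hypothetical names, and the hard-coded hex list is only a stand-in for sampling a matplotlib colour map:

```python
from itertools import cycle

# Hypothetical stand-in palette; a real implementation would more likely
# sample colours from a matplotlib colour map.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]


def build_colour_map(metadata, key):
    """Map each distinct value found under `key` to a colour.

    `metadata` is a dict of {node_or_edge: {attribute: value}}.
    """
    values = sorted({attrs[key] for attrs in metadata.values()})
    # cycle() reuses colours when there are more categories than palette entries.
    return dict(zip(values, cycle(PALETTE)))


nodes = {
    "a": {"group": "x"},
    "b": {"group": "y"},
    "c": {"group": "x"},
}
colour_map = build_colour_map(nodes, "group")
# One colour per node, in node iteration order.
node_colours = [colour_map[attrs["group"]] for attrs in nodes.values()]
```

Keeping `colour_map` on the plot object would also make it easy to build a legend later.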

On the other hand, I can see how this can easily get really messy... the API has to assume there may be as many categories as nodes/edges, and this can mean a really messy colouring/visual. The current API as it stands leaves these details to the user to figure out, and the plotting object only takes care of laying out nodes and edges properly. It may be good to stick with this simple case first.

jonchar commented 8 years ago

I like the idea of a declarative API (and discussing the API from the start!). From what I understand a declarative API means the user writes code describing what they want, rather than how to do it (imperative). Thus I think it would work out best if we start with a clear path between basic declarations and what they mean. Are there any common practices out there?

Building on your BasePlot snippet, maybe it makes sense to use key_name to define groupings and separate them from the corresponding visual component (color, linewidth, etc.). Then the user could say "I want my node groups to be different colors and my edge groups to be different linewidths" like so:

b = BasePlot(G, node_grouping=('key_name', 'color'), edge_grouping=('key_name', 'linewidth'))

We could use matplotlib's cycler API under the hood.

I agree this could get messy, too. We'd need a way to determine whether the values under the keyed attribute are compatible with the visual element being styled (e.g. a continuous-valued attribute is compatible with a sequential colormap but not a linestyle).

ericmjl commented 8 years ago

Those are great thoughts, @jonchar.

Do you think a declarative API can be implemented alongside an imperative API? I'm struggling to figure out how that might work. The current APIs of the old Circos and Hive Plots, which I'm copying over as a first step, are imperative.

jonchar commented 8 years ago

@ericmjl I could see having both if declarative API calls map directly to imperative calls that use defaults we choose. Optional imperative statements could also be specified that would then override our defaults (e.g. by adding node_cmap='cmap' to the above call). Perhaps we could start with this kind of approach?
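
A rough sketch of that idea, with declarative groupings resolved against defaults and optional keyword overrides taking precedence. `resolve_style`, `DEFAULTS`, and the default values are all hypothetical, and the colour map names are just placeholder strings:

```python
# Hypothetical defaults; the colour map names are only placeholder strings.
DEFAULTS = {"node_cmap": "viridis", "edge_cmap": "viridis"}


def resolve_style(node_grouping=None, edge_grouping=None, **overrides):
    """Turn declarative groupings plus optional overrides into explicit settings."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected keyword(s): {sorted(unknown)}")
    settings = dict(DEFAULTS)
    settings.update(overrides)  # explicit overrides win over our defaults
    settings["node_grouping"] = node_grouping
    settings["edge_grouping"] = edge_grouping
    return settings


style = resolve_style(
    node_grouping=("key_name", "color"),
    edge_grouping=("key_name", "linewidth"),
    node_cmap="plasma",
)
```

Here only `node_cmap` is overridden; `edge_cmap` falls back to the default.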

My instinct is that a declarative API trades away some flexibility. The right balance will probably become apparent over time.

ericmjl commented 8 years ago

@jonchar Those look like great ideas!

Right now, the BasePlot implementation provides a way for uniformly colouring all nodes and all edges (the node/edgeprops) and a way for colouring individual nodes and edges (the node/edgecolors). I have a hunch that the declarative API might look different for different kinds of plots, but I'm not quite sure; it might need a whiteboard session to figure this out.

Will you be at the next Boston Python project night? Let's hack on this then?

jonchar commented 8 years ago

Definitely, let's hack then!

leotrs commented 8 years ago

Not a contributor yet, but plan to be. I hope it's OK to chip in.

A declarative interface sounds interesting and something I would be looking forward to implementing. However, if the end goal is to have nxviz merged into NetworkX, it might be good to check out NetworkX's API and try to mirror it, so as to have better integration.

ericmjl commented 8 years ago

@leotorr welcome aboard! I'd love to see some PRs from your side as this visualization package progresses, and if they are looking good, I'd be happy to add you as a direct collaborator later on.

On the NetworkX API side, there's a bit of a back story. I had a chat with Aric Hagberg (lead developer of NetworkX) at SciPy 2016 about integrating some visualizations. I think he approves of the "plotting object" design as it stands right now, so what might happen is that we end up passing a Graph object into a plotting object and letting the user take a "declarative" approach. This is all up for redesign nonetheless; the goal is to make it easy for users to create fancier network plots beyond mushed-up hairballs.

leotrs commented 8 years ago

@ericmjl thanks!

I see what you are saying, and I agree that it's best to separate the graph topology code and the graph visualization code. I guess what I'm saying is that, if users of NetworkX are used to an imperative grammar with nx's Graph objects, would it be natural to expect them to use a different philosophy for the Plot object? Again, I'm all for trying that myself, but I can't speak for anybody else.

What @jonchar says is true though, if there is a mapping from declarative to imperative methods, then this should be less of a problem.

(Also, btw, I'll be moving to Boston this Fall, and would love to meet up if I get around to contributing more to this project.)

ericmjl commented 8 years ago

@leotorr welcome to Boston! Yes, we'd love to hack with you on this. What will you be doing in the Boston area, btw?

leotrs commented 8 years ago

Network Science PhD (http://www.networkscienceinstitute.org/). That would explain my interest in this project.

ericmjl commented 8 years ago

Cool stuff. Welcome to Boston, and hope you can join us at the Boston Python meetups (they're usually near the MIT campus) to hack on nxviz!

ericmjl commented 8 years ago

I gave this issue of a declarative API a bit more thought.

The main thing I noticed is that rational network viz has a handful of parameters that can be controlled, and each of them should map onto some kind of data. Building off @jonchar's API spec, if we go by some key, then it would make sense to do some data checking. For the distribution of data keyed by that key, here's what I think would make most sense for data checking:

node_order: something sortable (e.g. quantitative, ordinal, or categorical by alphabet)
node_size: quantitative data only
node_grouping: 'discretely' separable data, so categorical or ordinal are both good.
node_colour:
- quantitative:
  - divergent
  - sequential
- categorical: up to 12 bins (as done by colorbrewer)
edge_width: quantitative only, it's a size mapping.
edge_colour: as per node_colour, both quantitative and categorical data are usable here.
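
The rules above could be captured in a small lookup table. This is only a sketch: `ALLOWED`, `MAX_COLOUR_BINS`, and `check_property` are hypothetical names, and the data kind is assumed to be known already rather than inferred here:

```python
# Data kinds each drawing property can express, following the list above.
ALLOWED = {
    "node_order": {"quantitative", "ordinal", "categorical"},
    "node_size": {"quantitative"},
    "node_grouping": {"categorical", "ordinal"},
    "node_colour": {"quantitative", "categorical"},
    "edge_width": {"quantitative"},
    "edge_colour": {"quantitative", "categorical"},
}
MAX_COLOUR_BINS = 12  # colorbrewer-style cap on categorical colours


def check_property(prop, kind, values):
    """Raise ValueError if data of `kind` cannot drive drawing property `prop`."""
    if kind not in ALLOWED[prop]:
        raise ValueError(f"{prop} cannot express {kind} data")
    if prop.endswith("_colour") and kind == "categorical":
        n_bins = len(set(values))
        if n_bins > MAX_COLOUR_BINS:
            raise ValueError(
                f"{prop} has {n_bins} categories; the maximum is {MAX_COLOUR_BINS}"
            )


check_property("node_size", "quantitative", [1.0, 2.5, 4.0])  # passes silently
```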

I based these ideas mostly off this Points of View column series by Bang Wong (Broad Institute) & Martin Krzywinski (BC Cancer Research Centre), but if I'm missing something, please post!

ericmjl commented 8 years ago

After talking with @jonchar, in order for the API design to be of the form:

c = CircosPlot(graph=G, node_order='kw1', node_size='kw2', node_grouping='kw3', node_colour='kw4', edge_width='kw5', edge_colour='kw6', data_properties=some_dictionary)

the data has to be checked first, to make sure that the data keyed by the node and edge metadata keywords (kwX) fit the type of data that the drawing property can express (order, size, grouping, as per the comment above). It's probably best to implement the data checking on the BasePlot() object so that it's available to all child classes. The results of these checks should also be stored as an attribute, so that they can be passed to other plotting objects if needed.

The structure of the plot.data_properties attribute could be a dictionary of dictionaries:

{node_kw:{kw1:'categorical', kw2:'quantitative', kw3:'ordinal',... },...}

Providing the data_properties keyword argument will let the end user declare this up front, which allows us to skip inferring the data types ourselves and to raise loud errors if the declared types turn out not to match the data.
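
One possible reading of this, as a sketch only: types the user declares are taken as given, with a cheap sanity check, and anything undeclared is inferred. `infer_kind` and `resolve_data_properties` are hypothetical helpers, the inference rule is deliberately crude, and node metadata is assumed to be a plain dict of dicts:

```python
from numbers import Number


def infer_kind(values):
    """Crude inference: numbers are quantitative, anything else is categorical."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"


def resolve_data_properties(node_data, keys, declared=None):
    """Return {"node_kw": {key: kind}} for each metadata key in `keys`."""
    declared_kinds = (declared or {}).get("node_kw", {})
    resolved = {}
    for key in keys:
        values = [attrs[key] for attrs in node_data.values()]
        if key in declared_kinds:
            kind = declared_kinds[key]
            # Fail loudly when a declaration clearly contradicts the data.
            if kind == "quantitative" and infer_kind(values) != "quantitative":
                raise ValueError(
                    f"{key!r} declared quantitative but holds non-numeric values"
                )
        else:
            kind = infer_kind(values)
        resolved[key] = kind
    return {"node_kw": resolved}


node_data = {"a": {"age": 3, "team": "x"}, "b": {"age": 5, "team": "y"}}
props = resolve_data_properties(
    node_data, ["age", "team"], declared={"node_kw": {"age": "ordinal"}}
)
```

Note that ordinal data cannot be inferred by this rule, which is one reason an up-front declaration is useful.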

ericmjl commented 8 years ago

Hey guys, I'm going to start a fork that implements this new API, as it turns out I'm going to begin using nxviz for a DataCamp contribution I'm making. Just wanted to let everybody know about it!
