cytoscape / ipycytoscape

A Cytoscape Jupyter widget
https://ipycytoscape.readthedocs.io/en/master/
BSD 3-Clause "New" or "Revised" License

Add example to create graph from pandas data frame and change style interactively #149

Open joseberlines opened 3 years ago

joseberlines commented 3 years ago

Playing around with example: https://github.com/QuantStack/ipycytoscape/blob/master/examples/Labels%20example.ipynb

I have been wondering whether it would be worth implementing a way to create a graph from a pandas DataFrame.

A graph is not easily represented as a table, but for some straightforward cases it might be useful, since in data science we are all confronted with pandas eventually.

I can imagine a column with the names of the nodes and another column for directed edges and other columns for representation layouts (colours, etc).

Some thoughts about this are welcome.

ianhi commented 3 years ago

Hey @joseberlines, happily Mariana has already implemented this! Check out the DF example: https://github.com/QuantStack/ipycytoscape/blob/master/examples/DataFrame%20interaction.ipynb

joseberlines commented 3 years ago

Hi @ianhi thanks for the answer. If that is already implemented then there is a bit of info missing in the example. I am thinking about this:

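| Node name | Label | Color node | Edges | edge_colors | Label edges | Thickness edges |
|-----------|-------|------------|-------|-------------|-------------|-----------------|
| Mallorca | MNLL | red | Berlin | blue | fly | 34 |
| Berlin | BE | Blue | Paris | green | road | 4 |
| Paris | PA | black | London,Berlin | Yellow | Road, train | 3,4 |
| London | LO | red | Paris, Berlin | Black | road,train | 4,5 |
| Paris | PA | blue | Mallorca, Moscow | green | Fly,train | 6,7 |
| Moscow | MO | black | Berlin, Mallorca | Blue | road, fly | 8,9 |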

So it would be possible to play around with pandas and, for instance, apply functions to the whole data set in order to derive the colors of the nodes, etc.

The example you mentioned shows that a DataFrame can already be the input to ipycytoscape, which suggests that the DataFrame could also carry more data for styling, etc. I don't know whether that is already the case.

That poses some problems, since plausibility checks are necessary to ensure data compatibility.

This opens up many possibilities: by combining graph plotting with ipywidgets, it's possible to filter the graph easily using pandas functionality.
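As a rough sketch of how the table proposed in this comment might be turned into graph elements: the helper below splits the comma-separated edge cells and emits a dict of nodes and edges in the Cytoscape JSON layout. The function names `split_cell` and `rows_to_elements` are hypothetical, the rows are assumed to be what `df.to_dict("records")` returns, and keeping only the first row for a repeated node name (Paris appears twice) is an assumption, not something the thread decides.

```python
def split_cell(cell):
    """Split a comma-separated table cell into a list of stripped values."""
    return [part.strip() for part in str(cell).split(",") if part.strip()]


def rows_to_elements(rows):
    """Build a Cytoscape-style elements dict from table rows (hypothetical helper).

    `rows` is a list of dicts, e.g. the result of df.to_dict("records"),
    using the column names from the table in this comment.
    """
    nodes, edges, seen = [], [], set()
    for row in rows:
        name = row["Node name"]
        if name not in seen:  # assumption: first row wins for a repeated node
            seen.add(name)
            nodes.append({"data": {"id": name, "label": row["Label"],
                                   "color": row["Color node"].lower()}})
        targets = split_cell(row["Edges"])
        labels = split_cell(row["Label edges"])
        widths = split_cell(row["Thickness edges"])
        # plausibility check: one label and one thickness per edge
        if not (len(targets) == len(labels) == len(widths)):
            raise ValueError(f"row {name!r}: edge columns differ in length")
        for target, label, width in zip(targets, labels, widths):
            edges.append({"data": {"source": name, "target": target,
                                   "label": label.lower(),
                                   "color": row["edge_colors"].lower(),
                                   "width": int(width)}})
    return {"nodes": nodes, "edges": edges}


rows = [
    {"Node name": "Mallorca", "Label": "MNLL", "Color node": "red",
     "Edges": "Berlin", "edge_colors": "blue",
     "Label edges": "fly", "Thickness edges": 34},
    {"Node name": "Paris", "Label": "PA", "Color node": "black",
     "Edges": "London,Berlin", "edge_colors": "Yellow",
     "Label edges": "Road, train", "Thickness edges": "3,4"},
]
elements = rows_to_elements(rows)
```

If the resulting dict matches what ipycytoscape expects, it could presumably be handed to `graph.add_graph_from_json(elements)`; that step is not exercised here.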

ianhi commented 3 years ago

Yeah, unfortunately the from_dataframe method currently doesn't look for every possible attribute. For example, for Nodes it only looks for which other Nodes they are connected to, and what the tooltip should say.

https://github.com/QuantStack/ipycytoscape/blob/c1e002b52fab0db65136502e5ff09371ffa5ff55/ipycytoscape/cytoscape.py#L470-L471

This has been implemented for both JSON and networkx but not yet for dataframes. See https://github.com/QuantStack/ipycytoscape/issues/64#issuecomment-645064201

joseberlines commented 3 years ago

The idea here is also that the user should not have to touch any CSS or HTML code. I see ipycytoscape as part of the ipywidgets family: it should be possible to deploy dashboards with Voila using just one language, Python.

marimeireles commented 3 years ago

Thanks for the reply @ianhi, I completely forgot about this. I reopened the issue! :) It should be an easy fix!

@joseberlines, what you mean is that you would like to set the color of the nodes and edges based on the values in the columns, right? (That's an amazing idea! Never thought of that).
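One way this could work without the user writing any CSS by hand is to generate a stylesheet whose rules read their values from each element's data, using Cytoscape.js's `data(...)` mapper. The sketch below is only an illustration: `data_mapped_style` is a hypothetical helper, and the data field names (`label`, `color`, `width`) are assumed to have been copied from DataFrame columns.

```python
def data_mapped_style(group, mapping):
    """Build style rules that copy data fields into visual properties.

    group   -- "node" or "edge"
    mapping -- {style property: data field}, e.g. {"background-color": "color"}
    """
    # The "[field]" selector restricts each rule to elements that
    # actually carry the field.
    return [
        {"selector": f"{group}[{field}]", "style": {prop: f"data({field})"}}
        for prop, field in mapping.items()
    ]


style = (
    data_mapped_style("node", {"label": "label", "background-color": "color"})
    + data_mapped_style("edge", {"line-color": "color", "width": "width"})
)
# With ipycytoscape installed, this list could be applied with
# widget.set_style(style) on a CytoscapeWidget (not run here).
```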

joseberlines commented 3 years ago

Yes @marimeireles, this way you can basically work in pandas and create all the conditional formatting there. For instance (pseudocode!):

df = pd.DataFrame(...)
df['color'] = df['<whatever column>'].apply(<whatever function>)
df['text'] = df['<whatever column>'].apply(<whatever function>)

You have total power to redefine the characteristics of the graph without leaving the pandas world. This is related to a suggestion I made for ipysheet, https://github.com/QuantStack/ipysheet/issues/173

Same philosophy.

If relying on the user to create a DataFrame that complies with all the requirements is too much, another approach would be:

pseudocode !!!!:

import ipycytoscape

list_nodes = ['berlin', 'Barcelona', 'Paris']
list_texts = ['P', 'B', 'P']
list_colors = ['blue', 'red', 'green']
list_sizes = [34, 56, 67]

cytoscapeobj = ipycytoscape.CytoscapeWidget()
# add_graph_from_lists is the proposed method; it does not exist yet
cytoscapeobj.graph.add_graph_from_lists(nodes=list_nodes, texts=list_texts, colors=list_colors, sizes=list_sizes)

Obviously, if the lists do not all have the same length, an error is raised.
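The length check described above could look roughly like the following. This is a sketch only: `elements_from_lists` is a hypothetical stand-in for the proposed `add_graph_from_lists` (which is not part of ipycytoscape), and the data keys `label`, `color` and `size` are assumptions.

```python
def elements_from_lists(nodes, texts=None, colors=None, sizes=None):
    """Combine parallel lists into Cytoscape-style node elements.

    Raises ValueError if any provided list differs in length from `nodes`.
    """
    columns = {"label": texts, "color": colors, "size": sizes}
    for key, values in columns.items():
        if values is not None and len(values) != len(nodes):
            raise ValueError(
                f"{key!r} has {len(values)} entries, expected {len(nodes)}"
            )
    elements = []
    for i, node_id in enumerate(nodes):
        data = {"id": node_id}
        for key, values in columns.items():
            if values is not None:  # skip attributes that were not supplied
                data[key] = values[i]
        elements.append({"data": data})
    return {"nodes": elements}


result = elements_from_lists(
    ["berlin", "Barcelona", "Paris"],
    texts=["P", "B", "P"],
    colors=["blue", "red", "green"],
    sizes=[34, 56, 67],
)
```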

marimeireles commented 3 years ago

This might be a good example for people willing to try it on hacktoberfest.

joseberlines commented 3 years ago

Hi, can someone tell me about Hacktoberfest? Where? How? Thanks.

marimeireles commented 3 years ago

Hey @joseberlines sure! :) It's an online global event where people interested in contributing to tech open pull requests for open source projects. If you open 4 PRs this year you get a cool t-shirt + some stickers, plus my gratitude and that of everybody else who uses the project. :D Here's more info about it: https://hacktoberfest.digitalocean.com/ You're super welcome to join. I'm around if you need anything.

ianhi commented 3 years ago

@marimeireles per https://hacktoberfest.digitalocean.com/hacktoberfest-update PRs will only count if we add the hacktoberfest topic.

I was about to go add it, but then noticed that no other QuantStack repo has any topics. Is there a QuantStack policy against having topics?

marimeireles commented 3 years ago

No probs @ianhi, I missed this update. Thanks! :)

marimeireles commented 3 years ago

@ianhi I just did, actually. wasn't sure if you could do it. Thanks again! <3

joseberlines commented 3 years ago

Hi @marimeireles & @ianhi, I am still coding this idea, which is taking more time than I expected. Could either of you provide me with the complete set of parameters that ipycytoscape can handle (as pointed out in issue #175; minute 15.06 of the JupyterCon YouTube video)? Thanks.

joseberlines commented 3 years ago

Dear all, I was about to open a discussion issue about this item, but we might as well go on discussing it here.

So far my idea is the following pseudocode:

def make_complete_graph_from_df(nodes_df, edges_df=None, class_df=None):
    # edges_df and class_df are optional; None means "not provided"

    # check compulsory fields in nodes
    if "id" not in nodes_df.columns:
        raise ValueError('"id" should be a column of the nodes DataFrame.')
    if "name" not in nodes_df.columns:
        raise ValueError('"name" should be a column of the nodes DataFrame.')
    # positions only make sense as an (x, y) pair
    if "position_x" in nodes_df.columns and "position_y" not in nodes_df.columns:
        raise ValueError('"position_x" in columns but "position_y" missing.')
    if "position_y" in nodes_df.columns and "position_x" not in nodes_df.columns:
        raise ValueError('"position_y" in columns but "position_x" missing.')

where nodes_df is a DataFrame containing the following columns related to the node attributes:

all_node_attributes = ["id", "idInt", "parent", "name", "score", "position_x", "position_y", "group", "removed", "selected", "selectable", "locked", "grabbed", "grabbable", "classes"]

and some others related to style (background colour, shape -if possible-, tooltip, etc)

The same applies to edges_df, with a check that every source and target in a connection is a node present in nodes_df; otherwise an error is raised. The edges_df also contains columns with the style characteristics of the edges (colour, thickness, label, whatever).

The method will build the graph node by node and edge by edge, and build a style object that is added to the ipycytoscape object.
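The edge check described above might be sketched as follows. The helper name `check_edges_reference_nodes` and the `source` / `target` column names are assumptions, and the inputs are taken to be lists of dicts such as `nodes_df.to_dict("records")` would produce.

```python
def check_edges_reference_nodes(node_rows, edge_rows):
    """Raise ValueError if any edge endpoint is not a known node id.

    Both arguments are lists of dicts, e.g. df.to_dict("records").
    The "source" and "target" column names are assumptions.
    """
    node_ids = {row["id"] for row in node_rows}
    endpoints = {row[end] for row in edge_rows for end in ("source", "target")}
    missing = sorted(endpoints - node_ids, key=str)
    if missing:
        raise ValueError(f"edges reference unknown nodes: {missing}")


node_rows = [{"id": "berlin", "name": "Berlin"}, {"id": "paris", "name": "Paris"}]
edge_rows = [{"source": "berlin", "target": "paris"}]
check_edges_reference_nodes(node_rows, edge_rows)  # passes silently
```

Collecting all missing ids before raising, rather than failing on the first one, gives the user a single message listing everything that needs fixing in the edges table.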

What do you think? @marimeireles @ianhi @sven5s

NOTE: this is the reason why I asked #192 in order to facilitate the node construction.