Loading large graphs cause browser to halt

clemsos commented 9 years ago

I have imported this graph (~5000 edges and ~3500 edges). When I try to display it, top shows my browser (chrome) show use to 100.4% CPU use

To reproduce

create a new network
click on import data
copy/paste CSV and select nodes then id column : https://github.com/devsar/d3talk/blob/master/airports.csv
copy/paste CSV and select edges then source/target columns : https://github.com/devsar/d3talk/blob/master/flights-airport.csv
go to the URL of the graph

clemsos commented 9 years ago

After a few seconds the graph is loaded and displayed

clemsos commented 9 years ago

@Soupala @maxkfranz : any idea for this ?

Soupala commented 9 years ago

@clemsos

I just stumbled upon this D3 project today: http://discograph.mbrsi.org/artist/270025. All of Disco/graph's data is derived from the Discogs.com discography database: nearly 4 million artists, 1 million labels, and 7 million releases creating a network of nearly 70 million different relationships.

Only 100 nodes are rendered at a time, however. Sort of like the pagination idea.

The view is always based around a node of interest or a random node. The user chooses an artist, either at random or by a search for an artist he/she is interested in. By doing this, there is always a filter. Then, as the user double clicks on an item he/she is interested in, there is a refresh of the data based around that.

Also he is using Redis as part of his implementation. See https://github.com/josiah-wolf-oberholtzer/discograph/blob/master/discograph/DiscographAPI.py for how he slices and dices.

Maybe some ideas here that are useful.

maxkfranz commented 9 years ago

@clemsos

For thousands of elements, showing neighbourhood like @Soupala's suggestion makes sense. You can forgo redis and just have all elements in memory.

Highly connected graphs are an issue, because any node's neighbourhood is almost the whole graph. An additional filter by score/weight/pagerank may help to cut down the size.

maxkfranz commented 9 years ago

Another option is instead of filtering absolutely to filter relatively. For example, you could still have a node of interest -- but instead of filtering the neighbourhood you could use the score/weight/pagerank to sort the neighbourhood. That would work well with the concentric layout.

Then you could filter which elements are visible based on the zoom level to cut down the number of elements on screen at any time. The more you zoom in, the more elements are visible.

Basically, you want to cut down on the number of elements visible when you're zoomed out completely. When you zoom in, the viewport automatically limits how many elements are on the screen.

clemsos commented 9 years ago

The approach of relative filtering based on neighborhood is interesting. For Meteor, it should update server-side subscription based on visible nodes IDs sent by the client.

client has visible nodes
IDs are sent to the server
the server calculate neighborhood
the server send back neighborhood nodes to the client

A question : How do we select nodes on the client-side ? This should support user action (click, drag etc.) but also should be updated based on viewport and zoom/resize. One problem is if you zoom out you will have to compute tons of nodes again. Maybe we could merge nodes automatically in groups on the server and send clusters to the clients instead of single nodes. that way the graph stay readable and it is not so heavy to load and display.Then a user can unpack part of the graph, like shown here http://bl.ocks.org/mbostock/1062288

About graph calculations server-side : can cytoscape be used for this? Having Neo4J makes complete sense, but I don't know how well it will integrate with Meteor.

clemsos / cyedit

Loading large graphs cause browser to halt #2