clustered/collapsed views for large graphs

kohlhase commented 7 years ago

Sometimes the graph is just too big for TGView to handle, e.g. for the PVS NASA library and, by extension, for all of MathHub. It would be good to have a mode of operation in TGView where, instead of trying to lay out a large graph, MMT computes a clustered graph that TGView then shows, and where the user can expand the clusters one by one. A trivial example where we already have a clustering is the overall MathHub graph: we have 34 groups, which could be shown as a graph and then expanded. Similarly, when clicking on a group (in the left menu) we could show a graph of all the repositories in that group; here we even have the edges, in the form of the inter-repository dependencies in the MANIFEST.MF.

For the NASA library (which is in one repository) this does not help, since we do not have a clustering a priori. Here we could think of running a clustering algorithm on the graph (but what would we call the resulting clusters?). Or we could (try to) cluster the graph by the source file names; @Jazzpirate, does this make sense? Or we could generally allow clustering by the narrative structure.
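To make the file-name idea concrete, here is a minimal sketch of collapsing a graph into one node per cluster. It is written in Python purely for illustration and is not TGView or MMT code; the function `collapse_by_cluster`, the theory names, and the file names are all hypothetical. Any other clustering key (group, repository, narrative section) could be plugged in the same way.

```python
from collections import defaultdict


def collapse_by_cluster(nodes, edges, cluster_of):
    """Collapse a graph into one node per cluster.

    nodes:      iterable of node ids
    edges:      iterable of (source, target) pairs
    cluster_of: function mapping a node id to its cluster name
    """
    members = defaultdict(list)
    for node in nodes:
        members[cluster_of(node)].append(node)

    # Keep one edge per ordered pair of distinct clusters and count how many
    # original edges it stands for; edges inside a cluster are hidden.
    cluster_edges = defaultdict(int)
    for source, target in edges:
        a, b = cluster_of(source), cluster_of(target)
        if a != b:
            cluster_edges[(a, b)] += 1
    return dict(members), dict(cluster_edges)


# Hypothetical theory names and source files, for illustration only.
source_file = {
    "reals": "reals.pvs",
    "sqrt": "reals.pvs",
    "vectors": "vectors.pvs",
    "dot_product": "vectors.pvs",
}
edges = [
    ("sqrt", "reals"),
    ("vectors", "reals"),
    ("dot_product", "vectors"),
    ("dot_product", "sqrt"),
]
members, cluster_edges = collapse_by_cluster(source_file, edges, source_file.get)
```

In this toy input the four theories collapse into two file-level nodes joined by a single edge that stands for two original dependencies, which is the kind of overview graph a collapsed view would draw first.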

Note that we would not need to transfer the full clustered graph immediately, but could load the respective subgraph only when its cluster is expanded. This could make things more efficient.
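The on-demand loading described above could look roughly like the following sketch. It assumes a hypothetical `fetch_subgraph` callback standing in for whatever request the viewer would send to the server; the class name, the cluster names, and the shape of the returned subgraph are illustrative assumptions, not an existing TGView or MMT interface.

```python
class LazyClusteredGraph:
    """Tracks which clusters are expanded and fetches each subgraph once."""

    def __init__(self, cluster_names, fetch_subgraph):
        self.clusters = set(cluster_names)
        self.fetch_subgraph = fetch_subgraph  # hypothetical server call
        self.loaded = {}                      # cluster name -> cached subgraph
        self.expanded = set()

    def expand(self, name):
        if name not in self.clusters:
            raise KeyError(name)
        # Only the first expansion triggers a transfer; later ones reuse the cache.
        if name not in self.loaded:
            self.loaded[name] = self.fetch_subgraph(name)
        self.expanded.add(name)
        return self.loaded[name]

    def collapse(self, name):
        # Collapsing keeps the cached subgraph so re-expanding is free.
        self.expanded.discard(name)


# Stand-in for the server: records every request that would be sent.
requests = []


def fake_fetch(name):
    requests.append(name)
    return {"nodes": [name + "/a", name + "/b"], "edges": [(name + "/a", name + "/b")]}


graph = LazyClusteredGraph(["groupA", "groupB"], fake_fetch)
first = graph.expand("groupA")
graph.collapse("groupA")
second = graph.expand("groupA")
```

Keeping the cache on collapse trades memory for fewer round trips; a viewer with tight memory limits might instead evict subgraphs when their cluster is collapsed.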

I would like to use this issue for discussion first, and then see whether we can come up with a plan that makes sense. I think the proposed interaction makes a lot of sense from the user side.

Shadow992 commented 7 years ago

It would be much better to cluster these graphs client-side, because this makes everything regarding cluster "reopening" easier. So I would suggest that MMT sends the data for every node as always, plus some "meta-information" such as which nodes to cluster.

We have to think about a few "advanced" client-side clustering algorithms for JavaScript. A good first heuristic clustering method may be to combine the current "Standard-Layout" with easy-to-implement but really fast algorithms like DBSCAN ( https://de.wikipedia.org/wiki/DBSCAN ).
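As a rough illustration of the DBSCAN suggestion, the sketch below clusters nodes by the 2D positions a layout has already assigned to them. It is written in Python for brevity rather than JavaScript, uses a naive quadratic neighbour search, and the coordinates and the `eps` and `min_pts` values are made-up assumptions; how the positions would actually come out of the "Standard-Layout" is not specified in this thread.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Naive O(n) scan; the point itself counts as its own neighbour.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: grow the cluster from it
                queue.extend(reachable)
    return labels


# Made-up layout positions: two tight groups and one isolated node.
positions = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
labels = dbscan(positions, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Each label could then serve as the cluster a node is collapsed into, with noise nodes drawn individually. For large graphs the neighbour search would need a spatial index (a grid or k-d tree) to stay fast.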

This may not only offer faster processing (because we do not have to draw all nodes), but also a better look and feel through "natural clustering".

Shadow992 commented 6 years ago

Concrete steps based on today's discussion:

Implement lazy loading for (sub-)nodes in TGView
Offer the possibility to cluster graphs by ClusterID/ClusterName (Question: overlapping clusters? Maybe use regions for clustering: https://github.com/UniFormal/TGView/issues/39)

UniFormal / TGView

clustered/collapsed views for large graphs #15