archivesunleashed / graphpass

GraphPass is a utility to filter networks and provide a default visualization output for Gephi or SigmaJS.
https://archivesunleashed.org/

Missing nodes in GEXF output files (on AUK) #25

Closed ianmilligan1 closed 6 years ago

ianmilligan1 commented 6 years ago

Describe the bug: On a few collections, our SigmaJS network visualizations have misplaced nodes. Edges point to blank space, and nodes hover in arbitrary positions. Here's an example from sigma:

screen shot 2018-08-23 at 10 00 56 am

The same thing appears when the GEXF file is opened in Gephi:

screen shot 2018-08-23 at 9 57 40 am

The original GRAPHML file (pre GraphPass transformation), however, has edges properly connected to nodes.

Something is going awry with GraphPass, presumably in the x/y placement of nodes.

To Reproduce: Find a broken collection. There doesn't seem to be a universal rhyme or reason for why this happens.

@greebie I will send you the before/after file of the collection above, so you can work on GraphPass to fix it.

Expected behavior: Edges and nodes should connect. 😄

Desktop/Laptop (please complete the following information): @ruebot and I have reproduced this on Safari, Chrome, and Firefox on both Linux and Windows. The Gephi test pretty firmly indicates that it is a GraphPass related issue.

greebie commented 6 years ago

These appear to be nodes that only link to themselves. For example, search for "hermanleonard.com" in the test example. Gephi handles these by drawing a loop-back edge (shown above), whereas sigma does not support self-links. However, switching the edge type to "curved" apparently looks more attractive and may resolve the problem. cf

There are also plugins available to support self-referential links.

It is also possible to remove self-referential links completely in GraphPass, if desired.
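For illustration only, here is a minimal Python sketch of the two ideas above: spotting nodes that only link to themselves, and dropping self-referential links. This is not GraphPass's code; the function names and the `example.org` / `example.com` edges are hypothetical, and the graph is assumed to be a plain list of (source, target) pairs.

```python
def self_loop_only_nodes(edges):
    """Return the nodes whose every edge is a link to themselves."""
    has_loop = set()
    has_other = set()
    for source, target in edges:
        if source == target:
            has_loop.add(source)
        else:
            has_other.update((source, target))
    return has_loop - has_other


def drop_self_loops(edges):
    """Return the edge list with self-referential links removed."""
    return [(s, t) for s, t in edges if s != t]


# Hypothetical edge list; only the first domain comes from the thread.
edges = [
    ("hermanleonard.com", "hermanleonard.com"),
    ("example.org", "example.org"),
    ("example.org", "example.com"),
]

isolated = self_loop_only_nodes(edges)
cleaned = drop_self_loops(edges)
```

Note that dropping self-loops leaves a node like "hermanleonard.com" with no edges at all, so it would still float on its own unless it is filtered out as well.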

ianmilligan1 commented 6 years ago

Why are the edges pointing at blank space then?

greebie commented 6 years ago

Okay - after further exploration, it looks as if some nodes are being assigned negative sizes. This probably happens when GraphPass tries to figure out a reasonable sizing pattern for the nodes. I will try to resolve this tomorrow.

greebie commented 6 years ago

Looks like I am using the node count for the graph size, and that creates problems when the (max node size - min node size) value is equal to the node size. I will switch to the number of edges instead.

greebie commented 6 years ago

Okay -- here is the full explanation of the bug, complete with mathematics. :)

Because our network outputs in aut contain websites with lots and lots of links and others with very few, it can be difficult to visualise the outputs in gephi or otherwise using the total links. For instance, it's common to get this:

image

In order to make it possible to view the nodes together in a more visually appealing way, it's common to use a scale of some sort. You could take the square root of each number, for instance, or cut the nodes in half, so that a node of size 1000 goes to 500 while a node of size 2 goes to 1.

The calculation I used was to multiply every node's size by log10(total number of nodes in the graph / (degree of largest node - degree of smallest node)). This worked fine when the denominator was less than the total number of nodes, but failed when it was larger: the ratio then drops below 1, its log10 is negative, and every scaled size comes out negative. This meant that sigma had no basis on which to scale.
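To make the failure concrete, here is a small Python sketch of that scale factor. It is a reconstruction from the description in this comment, not the GraphPass source, and the function name and sample numbers are made up.

```python
import math


def old_scale_factor(node_count, max_degree, min_degree):
    """log10(total nodes / (largest degree - smallest degree)).

    Assumes max_degree > min_degree; equal degrees would divide by zero.
    """
    return math.log10(node_count / (max_degree - min_degree))


# Many nodes, small degree spread: ratio > 1, so the factor is positive.
ok = old_scale_factor(node_count=1000, max_degree=101, min_degree=1)

# Few nodes, large degree spread: ratio < 1, so the factor is negative
# and every node size multiplied by it becomes negative too.
bad = old_scale_factor(node_count=100, max_degree=1001, min_degree=1)

scaled_sizes = [degree * bad for degree in (1, 50, 1001)]
```

When the spread equals the node count exactly, the factor is log10(1) = 0 and every node collapses to size zero, which is also unusable.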

Basically, I was fooled into believing this approach worked generally because sigma did its own massaging.

The new approach will be more correct and will provide an attractive output for both sigma and gephi (or any other visualisation tool).

It uses the following:

MAX_SCALE_VALUE * ((log(x + 1) - log(minimum + 1)) / (log(maximum + 1) - log(minimum + 1)))

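where x is the value for a given node, and minimum and maximum are the smallest and largest values in the graph.

Each of x, minimum and maximum is increased by one to avoid log(0), which is undefined.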
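As a rough sketch of how this formula behaves (again not the GraphPass source), the Python below applies it to a few degrees. The value of MAX_SCALE_VALUE, the function name, and the handling of the case where maximum equals minimum are all assumptions made for the example.

```python
import math

MAX_SCALE_VALUE = 100.0  # assumed value, chosen only for this example


def log_scale(x, minimum, maximum, max_scale=MAX_SCALE_VALUE):
    """Map x from [minimum, maximum] onto [0, max_scale] on a log scale."""
    span = math.log(maximum + 1) - math.log(minimum + 1)
    if span == 0:
        # Assumption: if every node has the same value, give them all
        # the same size rather than dividing by zero.
        return max_scale
    return max_scale * (math.log(x + 1) - math.log(minimum + 1)) / span


degrees = [0, 2, 50, 1000]
sizes = [log_scale(d, min(degrees), max(degrees)) for d in degrees]
```

Because x never falls below minimum, the numerator is never negative, so the output always lies between 0 and MAX_SCALE_VALUE regardless of how many nodes the graph has.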

greebie commented 6 years ago

Using the new formula, this is what the same graph looks like in Gephi.
image

However, we should do some serious testing in sigma to make sure it works properly.