Neilos / bihisankey

A d3 javascript library/plugin for drawing bi-directional hierarchical sankey diagrams

YsacleFactor and YPosition #2

Open millerbd200 opened 9 years ago

millerbd200 commented 9 years ago

When defining a large data set of nodes with many children and sub-children (>10,000 nodes: a parent can have 20 children, each of those children can have 20, and so on), adding links in small amounts, say connecting great-great-great-grandchildren 10 at a time, creates the nodes and links but places them below the HEIGHT of the svg. Should there be a check to make sure that the lowest y position is not outside the HEIGHT?

Neilos commented 9 years ago

Wow that's a lot of nodes, but I don't think that is the cause of your problem.

What causes the diagram to be rectangular rather than more square

Because sankey diagrams are all about the representation of connections between nodes, I've prioritised the display of highly connected diagrams by compressing diagrams as much as possible in the x direction. I think this is the right thing to do for most sankey diagrams, but the consequence is that a diagram with a low ratio of connections to nodes will have a large height relative to its width. See below for a diagrammatic explanation.


High ratio of connections to nodes tends towards a landscape orientation

:black_large_square: :arrow_right: :black_large_square: :arrow_right: :black_large_square: :arrow_right: :black_large_square:
:black_large_square: :arrow_right: :black_large_square: :arrow_right: :black_large_square: :arrow_right: :black_large_square:


Low ratio of connections to nodes tends towards a portrait orientation

:black_large_square: :arrow_right: :black_large_square:
:black_large_square: :arrow_right: :black_large_square:
:black_large_square: :arrow_right: :black_large_square:
:black_large_square: :arrow_right: :black_large_square:


What determines the diagram height

You'll notice that the diagram already scales the node heights and link thicknesses to fit them all onto the page (where possible), but it doesn't scale the node spacing, which is set by the developer.

Suggestion 1: Try specifying a lower node spacing to fit more nodes on the page. biHiSankey.nodeSpacing(1);
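To make the effect of the spacing concrete, here is a rough sketch of the arithmetic. It assumes each column stacks its nodes with one fixed `nodeSpacing` gap between neighbours; `spaceLeftForNodes` is a hypothetical helper for illustration, not part of bihisankey.

```javascript
// Hypothetical helper (not part of bihisankey): vertical space left for the
// nodes themselves once the fixed gaps between them have been reserved.
// Assumes a column of nodeCount nodes needs (nodeCount - 1) gaps.
function spaceLeftForNodes(diagramHeight, nodeCount, nodeSpacing) {
  var gaps = Math.max(nodeCount - 1, 0) * nodeSpacing;
  return diagramHeight - gaps;
}

// A 400px tall diagram with 100 nodes in a single column:
console.log(spaceLeftForNodes(400, 100, 10)); // -590: the gaps alone overflow
console.log(spaceLeftForNodes(400, 100, 1));  // 301: room left to scale nodes into
```

A negative result means no amount of node scaling can help, because the unscaled gaps already exceed the diagram height.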

Why the diagram doesn't adjust to fit contents

You will notice that the diagram does not resize its container on the page to fit the contents; instead it scales the contents (where possible) to fit the container's width and height as specified by the developer (biHiSankey.size([300, 400]);).

The reason it works that way is to prevent breaking the layout of the page as a whole.

Suggestion 2: If your page can accommodate any height then you could increase the specified height of the diagram container to allow more nodes to fit on. This may, however, result in large empty spaces while the child nodes remain collapsed.

To calculate the height of the diagram responsively try something like the following (not tested):

// Initialize the sankey diagram in the normal way,
// keeping a reference to the layout so it can be queried afterwards
var biHiSankey = d3.biHiSankey();

biHiSankey
  .nodes(exampleNodes)
  .links(exampleLinks)
  .initializeNodes(function (node) { /* do whatever */ })
  .layout(LAYOUT_INTERATIONS);

// Group the nodes into columns
var nodesInColumns = d3.nest()
  .key(function (node) { return node.x; })
  .sortKeys(d3.ascending)
  .entries(biHiSankey.nodes())
  .map(function (object) { return object.values; });

// Calculate the new diagram height
// by summing the node heights and node spacings in each column
// and use the largest sum
var newDiagramHeight = d3.max(nodesInColumns, function (columnNodes) {
  // node.value stands in for the node's height here
  var sumOfNodeHeights = d3.sum(columnNodes, function (node) {
    return node.value;
  });
  var spaceBetweenNodes = (columnNodes.length - 1) * biHiSankey.nodeSpacing();
  return sumOfNodeHeights + spaceBetweenNodes;
});

//...and then layout with the new dimensions:
biHiSankey
  .nodes(exampleNodes)
  .links(exampleLinks)
  .size([WIDTH, newDiagramHeight])
  .layout(LAYOUT_INTERATIONS);
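
The height calculation itself can be tried out without d3 or a live diagram. The following is a minimal sketch under the same assumptions as the snippet above (each node has an `x` column and a `value` used as its height); `requiredDiagramHeight` and `sampleNodes` are hypothetical names used only for illustration.

```javascript
// Hypothetical stand-alone version of the calculation; nodes are plain
// objects with the `x` (column) and `value` fields used in the snippet above.
function requiredDiagramHeight(nodes, nodeSpacing) {
  // Group the nodes into columns keyed by their x position
  var columns = {};
  nodes.forEach(function (node) {
    (columns[node.x] = columns[node.x] || []).push(node);
  });

  // For each column, sum the node values plus the gaps between nodes
  var heights = Object.keys(columns).map(function (x) {
    var columnNodes = columns[x];
    var sumOfNodeHeights = columnNodes.reduce(function (sum, node) {
      return sum + node.value;
    }, 0);
    return sumOfNodeHeights + (columnNodes.length - 1) * nodeSpacing;
  });

  // The tallest column decides the diagram height (0 when there are no nodes)
  return heights.length ? Math.max.apply(null, heights) : 0;
}

var sampleNodes = [
  { x: 0, value: 30 },
  { x: 1, value: 10 }, { x: 1, value: 10 }, { x: 1, value: 10 }
];
console.log(requiredDiagramHeight(sampleNodes, 20)); // 70
```

In the sample, the second column wins: three nodes of value 10 plus two gaps of 20 give 70, against 30 for the single-node column.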

Generating sankey diagrams for lots of different data sets is necessarily a balancing act with competing priorities. I think the priorities I've chosen are the right ones for the majority of cases (though apparently not in your case).

Hopefully the above suggestions helped and I've explained my thinking and reasons for why things are as they are currently. If you find another solution feel free to submit a pull request and I'll happily take a look.