christophergandrud / networkD3

D3 JavaScript Network Graphs from R
http://christophergandrud.github.io/networkD3

Sankey Diagram Will Not Plot #45

Closed kilimba closed 9 years ago

kilimba commented 9 years ago

Hello, I am relatively new to R and am trying to plot a Sankey diagram using the networkD3 library. However, all I get is a blank screen. The diagram is supposed to show the flow of infections between age groups (by gender). My code is below:

library(curl)  # curl() is provided by the curl package, not RCurl
library(networkD3)

edges <- read.csv(curl("https://raw.githubusercontent.com/kilimba/data/master/infection_flows.csv"),stringsAsFactors = FALSE )

nodes = data.frame(ID = unique(c(edges$Source, edges$Target)))

nodes$indx =0
for (i in 1:nrow(nodes)){
  nodes[i,]["indx"] = i - 1
}

edges2 <- merge(edges,nodes,by.x = "Source",by.y = "ID")
edges2$Source <-NULL
names(edges2) <- c("target","value","source")
edges2 <- merge(edges2,nodes,by.x = "target",by.y = "ID")
edges2$target <- NULL
names(edges2) <- c("value","source","target")

nodes$indx <- NULL
# Plot
sankeyNetwork(Links = edges2, Nodes = nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "ID",
              width = 700, fontsize = 12, nodeWidth = 30)

Any help greatly appreciated. Tumaini

christophergandrud commented 9 years ago

Hm, there must be some issue in your numbering. I've been meaning to create a function that can convert R numbering to js. Just subtracting 1 doesn't do it for reasons that I haven't had time to fully work through.
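For reference, a minimal sketch of one way to derive zero-based link indices directly from node positions with `match()`, instead of a loop plus two merges. The toy data and the `Value` column name are assumptions made for illustration (the third column name of the real CSV is not shown in this thread). Whether this addresses the numbering concern above is not established here; the later comments point to cycles in the data instead.

```r
# Toy edge list (made-up values) with the same three-column shape as the CSV.
edges <- data.frame(
  Source = c("15-19 Male", "15-19 Female", "20-24 Male"),
  Target = c("15-19 Female", "20-24 Male", "20-24 Female"),
  Value  = c(10, 5, 3),
  stringsAsFactors = FALSE
)

# One row per distinct node name, in order of first appearance.
nodes <- data.frame(
  ID = unique(c(edges$Source, edges$Target)),
  stringsAsFactors = FALSE
)

# match() returns the 1-based row position of each name in nodes$ID;
# subtracting 1 gives the 0-based index used on the JavaScript side.
links <- data.frame(
  source = match(edges$Source, nodes$ID) - 1L,
  target = match(edges$Target, nodes$ID) - 1L,
  value  = edges$Value
)

print(links)
```

Because the indices come from the row order of `nodes`, the link table and the node table stay consistent as long as `nodes` is not reordered afterwards.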

timelyportfolio commented 9 years ago

I can't tell for sure but it seems that you are getting stuck in an infinite loop due to cycles (see https://github.com/d3/d3-plugins/issues/1). As a quick check, this is what I did.

library(curl)
library(networkD3)

edges <- read.csv(
  curl("https://raw.githubusercontent.com/kilimba/data/master/infection_flows.csv"),
  stringsAsFactors = FALSE
)

edges[,1] <- paste0("source_", edges[,1])
edges[,2] <- paste0("target_", edges[,2])

nodes = data.frame(ID = unique(c(edges$Source, edges$Target)))

nodes$indx =0
for (i in 1:nrow(nodes)){
  nodes[i,]["indx"] = i - 1
}

edges2 <- merge(edges,nodes,by.x = "Source",by.y = "ID")
edges2$Source <-NULL
names(edges2) <- c("target","value","source")
edges2 <- merge(edges2,nodes,by.x = "target",by.y = "ID")
edges2$target <- NULL
names(edges2) <- c("value","source","target")

nodes$indx <- NULL
# Plot
sankeyNetwork(Links = edges2, Nodes = nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "ID",
              width = 700, fontsize = 12, nodeWidth = 30)

@christophergandrud, perhaps we should look at pulling in this Sankey version which can handle cycles.

timelyportfolio commented 9 years ago

To verify the existence of cycles even further, we can use igraph.

library(curl)
library(networkD3)
library(igraph)

edges <- read.csv(
  curl("https://raw.githubusercontent.com/kilimba/data/master/infection_flows.csv"),
  stringsAsFactors = FALSE
)

g <- graph.data.frame(edges)
is.dag(g)

Perhaps this is another resource that can shed some light on cycles, DAGs, and Sankey diagrams.
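To make the cycle argument concrete, here is a small sketch on made-up toy data (not the real infection data). It uses `graph_from_data_frame()` and `is_dag()`, the newer igraph names for `graph.data.frame()` and `is.dag()`. It is meant only to illustrate why prefixing the source and target labels, as in the earlier quick check, removes the cycles.

```r
library(igraph)

# Made-up toy edges: flows run in both directions between two groups.
edges <- data.frame(
  Source = c("15-19 Male", "15-19 Female"),
  Target = c("15-19 Female", "15-19 Male"),
  Value  = c(10, 7),
  stringsAsFactors = FALSE
)

# The same labels appear as both source and target, so the graph has a cycle.
original_is_dag <- is_dag(graph_from_data_frame(edges))

# Prefixing each side gives sources and targets distinct node names,
# which turns the graph into a two-level (bipartite) layout without cycles.
prefixed <- edges
prefixed$Source <- paste0("source_", prefixed$Source)
prefixed$Target <- paste0("target_", prefixed$Target)
prefixed_is_dag <- is_dag(graph_from_data_frame(prefixed))

print(c(original = original_is_dag, prefixed = prefixed_is_dag))
```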

kilimba commented 9 years ago

Thanks timelyportfolio, most helpful! An aesthetic question: after trying your updated code, the display is quite crowded with spaghetti-like links and flows. How difficult would it be to go from this two-"tier" graph (for lack of a better word; I'm not sure "tier" is the correct term here) to a three-"tier" graph, i.e. from

15-19 Male -------------------------------->> 15-19 Female
15-19 Female ----------------------------->> 15-19 Male
....

to

15-19 Male ----------------------------->> 15-19 Female ------------------------->>15-19 Male
....

Is this possible? It would greatly decongest the chart and make it much clearer.

As always, thanks for the help!

christophergandrud commented 9 years ago

Thanks @timelyportfolio for all of your work on this. I'm wondering if we should wait until the Sankey fork gets merged into the master? In the meantime we could point to this issue?

timelyportfolio commented 9 years ago

I hope I did not mislead. I never intended for the above to be a full solution. Rather, it was an illustration of the problem. Let's say we can assume that the data is in order of levels. We could do something like this if my assumptions are correct (note: manual and not generalizable).

library(curl)
library(networkD3)

edges <- read.csv(
  curl("https://raw.githubusercontent.com/kilimba/data/master/infection_flows.csv"),
  stringsAsFactors = FALSE
)

edges[80:nrow(edges),]$Target = paste0(edges[80:nrow(edges),]$Target,"_")

nodes = data.frame(ID = unique(c(edges$Source, edges$Target)))

nodes$indx =0
for (i in 1:nrow(nodes)){
  nodes[i,]["indx"] = i - 1
}

edges2 <- merge(edges,nodes,by.x = "Source",by.y = "ID")
edges2$Source <-NULL
names(edges2) <- c("target","value","source")
edges2 <- merge(edges2,nodes,by.x = "target",by.y = "ID")
edges2$target <- NULL
names(edges2) <- c("value","source","target")

nodes$indx <- NULL
# Plot
sankeyNetwork(Links = edges2, Nodes = nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "ID",
              width = 700, fontsize = 12, nodeWidth = 30)

The problem, though, is that the data flowing out of level 2 then does not sum to the data flowing in from level 1, so you see the result below. Of course, this can be handled through data manipulation, or it indicates that I do not understand the data (probably more likely).

image

image
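One way to inspect the mismatch described above is to compare, for every middle-level node, the total flowing in with the total flowing out. The sketch below does this in base R on made-up toy data; the `Value` column name and the numbers are assumptions for illustration only.

```r
# Made-up three-level toy data: level 1 -> level 2 -> level 3 (suffix "_").
edges <- data.frame(
  Source = c("15-19 Male", "20-24 Male", "15-19 Female"),
  Target = c("15-19 Female", "15-19 Female", "15-19 Male_"),
  Value  = c(10, 5, 4),
  stringsAsFactors = FALSE
)

# Total value arriving at each target and leaving each source.
inflow  <- tapply(edges$Value, edges$Target, sum)
outflow <- tapply(edges$Value, edges$Source, sum)

# Middle-level nodes are the ones that appear on both sides.
middle <- intersect(names(inflow), names(outflow))

balance <- data.frame(
  node    = middle,
  inflow  = as.numeric(inflow[middle]),
  outflow = as.numeric(outflow[middle])
)
balance$difference <- balance$inflow - balance$outflow

print(balance)
```

A non-zero `difference` marks a node whose incoming and outgoing bands will have different heights in the diagram.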

timelyportfolio commented 9 years ago

@christophergandrud based on watching the d3-plugins repo over the years (see old pull requests), it is not maintained very well and pull requests often languish, so if we want the functionality, I think we should go ahead and use the forked version.

christophergandrud commented 9 years ago

Hm, that is good information to have. I'm just worried about how basing the package on a fork might lead to issues longer-term. Do you think this fork is likely to be easily compatible with future versions of d3-plugins?

jrowen commented 9 years ago

I also ran into this issue, and the suggestion to use is.dag was very helpful. If it's not possible to support data with cycles, I would encourage adding an is.dag check to sankeyNetwork, alerting the user that the data set is not supported and preventing the infinite loop.
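As a rough illustration of such a guard without adding an igraph dependency, the sketch below checks for cycles in base R using Kahn's topological-sort algorithm. `has_cycle()` is a hypothetical helper written for this example; it is not part of networkD3, and the toy inputs are made up.

```r
# Hypothetical helper (not part of networkD3): returns TRUE when the
# directed graph given by parallel `source` and `target` vectors has a cycle.
has_cycle <- function(source, target) {
  nodes <- unique(c(source, target))
  src <- match(source, nodes)
  tgt <- match(target, nodes)

  # Number of incoming edges per node.
  indegree <- tabulate(tgt, nbins = length(nodes))

  # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
  queue <- which(indegree == 0)
  removed <- 0
  while (length(queue) > 0) {
    n <- queue[1]
    queue <- queue[-1]
    removed <- removed + 1
    for (m in tgt[src == n]) {
      indegree[m] <- indegree[m] - 1
      if (indegree[m] == 0) queue <- c(queue, m)
    }
  }

  # Any node that could not be removed lies on, or behind, a cycle.
  removed < length(nodes)
}

cyclic  <- has_cycle(c("A", "B"), c("B", "A"))  # A -> B -> A
acyclic <- has_cycle(c("A", "B"), c("B", "C"))  # A -> B -> C
print(c(cyclic = cyclic, acyclic = acyclic))
```

A plotting function could call a check like this first and `stop()` with an informative message when it returns `TRUE`, rather than handing cyclic data to the layout code.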

christophergandrud commented 9 years ago

That seems like a reasonable suggestion. Thoughts @timelyportfolio?

timelyportfolio commented 9 years ago

@christophergandrud It is up to you how many dependencies you are willing to accept. If we update to the forked Sankey, then the user will be able to see visually what is happening and will not get stuck in a loop. I could try to update the Sankey to the forked version and submit a pull request, if you would like.

christophergandrud commented 9 years ago

@timelyportfolio This sounds like it is worth a shot.

soxofaan commented 9 years ago

Hi all, I'm the author of the fork https://github.com/soxofaan/d3-plugin-captain-sankey, which is mentioned here. I haven't read the complete discussion, but I just wanted to inform:

christophergandrud commented 9 years ago

Thanks for the information

On Mon, Jun 22, 2015 at 2:39 PM Stefaan Lippens notifications@github.com wrote:

Hi all, I'm the author of the fork https://github.com/soxofaan/d3-plugin-captain-sankey, which is mentioned here. I haven't read the complete discussion, but I just wanted to inform:


timelyportfolio commented 9 years ago

With pull #79, I think this is resolved so closing. Happy to reopen though if not resolved. Thanks everyone.

timelyportfolio commented 9 years ago

As I did this post, I realized I still need to do a little more to get the cycle Sankey to be complete.