christophergandrud / networkD3

D3 JavaScript Network Graphs from R
http://christophergandrud.github.io/networkD3

no error when character vector is used for source or target in sankeyNetwork #213

Open DaveFClarke opened 7 years ago

DaveFClarke commented 7 years ago

Hi Chris, nice package in networkD3. I have been working with some data today, trying to show the path of material through a manufacturing process. I am using the sankeyNetwork command and I am not getting any errors in my script, but no output appears. I ran the Energy test sample for networkD3 from the net and that shows up fine, so I doubt it is a JavaScript / HTML problem.

A couple of key differences in what I pass to the function:

  1. I pass 2 data frames directly to the sankeyNetwork function, they are not part of a list - didn't think this would be a problem

  2. The Source and Target fields of the links table are the character names (chr) of the nodes and not the node indexes (num) that appear in the Energy sample. I imagine this may be the problem, but can you confirm? I would like to use node names rather than indexes in the function call, to save me substituting the node indexes for the node names.

There may be something else, but if you could point me in the right direction, that would be great. I have put sessionInfo, str and the sankeyNetwork command below.

Many thanks for your help,

Dave.

sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] lubridate_1.6.0 networkD3_0.3 arulesViz_1.2-1 arules_1.5-2 Matrix_1.2-8 uuid_0.1-2
[7] RODBC_1.3-14 data.table_1.10.4 timeDate_3012.100 dplyr_0.5.0 plyr_1.8.4 RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
[1] mclust_5.2.2 Rcpp_0.12.9 vcd_1.4-3 mvtnorm_1.0-6 lattice_0.20-34 tidyr_0.6.1
[7] class_7.3-14 zoo_1.7-14 gtools_3.5.0 assertthat_0.1 digest_0.6.12 lmtest_0.9-35
[13] foreach_1.4.3 R6_2.2.0 stats4_3.3.3 httr_1.2.1 ggplot2_2.2.1 gplots_3.0.1
[19] curl_2.3 lazyeval_0.2.0 diptest_0.75-7 gdata_2.17.0 whisker_0.3-2 kernlab_0.9-25
[25] DT_0.2 RevoUtils_10.0.3 stringr_1.2.0 htmlwidgets_0.8 igraph_1.0.1 munsell_0.4.3
[31] base64enc_0.1-3 htmltools_0.3.5 nnet_7.3-12 tibble_1.2 gridExtra_2.2.1 seriation_1.2-1
[37] codetools_0.2-15 dendextend_1.4.0 viridisLite_0.1.3 MASS_7.3-45 bitops_1.0-6 jsonlite_1.3
[43] gtable_0.2.0 registry_0.3 DBI_0.6 magrittr_1.5 scales_0.4.1 KernSmooth_2.23-15
[49] stringi_1.1.2 viridis_0.3.4 scatterplot3d_0.3-38 flexmix_2.3-13 robustbase_0.92-7 iterators_1.0.8
[55] tools_3.3.3 fpc_2.1-10 trimcluster_0.1-2 DEoptimR_1.0-8 purrr_0.2.2 gclus_1.3.1
[61] yaml_2.1.14 colorspace_1.3-2 cluster_2.0.5 caTools_1.17.1 prabclus_2.2-6 TSP_1.1-5
[67] plotly_4.5.6 modeltools_0.2-21

str(WorkStation_Nodes)
'data.frame': 26 obs. of 1 variable:
 $ WorkStation: chr "CT-PS Tray Loading-SL10 PS" "CT-Insert Moulding SL10" "CT-Laser Marking-SL10" "CT-Coating Prep-SL10" ...

str(FinalLinksforSankey)
'data.frame': 137 obs. of 3 variables:
 $ workstation_scr : chr "CT-PS Tray Loading-SL10 PS" "CT-Insert Moulding SL10" "CT-Laser Marking-SL10" "CT-Coating Prep-SL10" ...
 $ workstation_dest: chr "CT-Insert Moulding SL10" "CT-Laser Marking-SL10" "CT-Coating Prep-SL10" "CT-Hydrophillic Coating-SL10" ...
 $ lot_number : chr "19804987" "19804987" "19804987" "19804987" ...

sankeyNetwork(Links = FinalLinksforSankey, Nodes = WorkStation_Nodes, Source = "workstation_scr",
              Target = "workstation_dest", Value = "lot_number", NodeID = "WorkStation",
              units = "Lots", fontSize = 12, nodeWidth = 30)

cjyetman commented 7 years ago

Try coercing the “lot_number” variable to a numeric before passing it to sankeyNetwork.
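
A minimal sketch of that coercion on toy data, for reference. The `links` data frame and its lot numbers here are made up for illustration and are not the data from this issue; `as.numeric()` is base R.

```r
# Toy links table (hypothetical values) with the value column stored as text
links <- data.frame(
  lot_number = c("101", "102", "103"),
  stringsAsFactors = FALSE
)

# Convert the character column to numeric in place.
# Strings that do not parse as numbers become NA (with a warning).
links$lot_number <- as.numeric(links$lot_number)

str(links)
```

If the column were a factor rather than a character vector, `as.numeric(as.character(x))` would be needed instead, since `as.numeric()` on a factor returns the internal level codes.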

DaveFClarke commented 7 years ago

Tried it, same result: Nothing shows in the RStudio viewer

str(WorkStation_Nodes)
'data.frame': 26 obs. of 1 variable:
 $ WorkStation: chr "CT-PS Tray Loading-SL10 PS" "CT-Insert Moulding SL10" "CT-Laser Marking-SL10" "CT-Coating Prep-SL10" ...

str(FinalLinksforSankey)
'data.frame': 137 obs. of 3 variables:
 $ workstation_scr : chr "CT-PS Tray Loading-SL10 PS" "CT-Insert Moulding SL10" "CT-Laser Marking-SL10" "CT-Coating Prep-SL10" ...
 $ workstation_dest: chr "CT-Insert Moulding SL10" "CT-Laser Marking-SL10" "CT-Coating Prep-SL10" "CT-Hydrophillic Coating-SL10" ...
 $ lot_number : num 19804987 19804987 19804987 19804987 19804987 ...

sankeyNetwork(Links = FinalLinksforSankey, Nodes = WorkStation_Nodes, Source = "workstation_scr",
              Target = "workstation_dest", Value = "lot_number", NodeID = "WorkStation",
              units = "Lots", fontSize = 12, nodeWidth = 30)

cjyetman commented 7 years ago

Can you make a reproducible example so I can try it on my machine?

DaveFClarke commented 7 years ago

Sure, I can do that. I can send you some files if you give me your email? Might be tomorrow morning.

DaveFClarke commented 7 years ago

I am emailing you an R file and a sample data file. Many thanks for taking a look at it for me. Dave.

cjyetman commented 7 years ago

Here are some tips on how to make a minimal reproducible example

DaveFClarke commented 7 years ago

Thanks for that, I think what I sent you follows this. The session info for my setup is as per this thread. Let me know if there is anything else you need.

cjyetman commented 7 years ago

The workstation_scr and workstation_dest variables in your FinalLinksForSankey data frame need to contain the zero-based numeric indices of the nodes (their row positions in WorkStation_Nodes, minus 1), not character strings of the names.

Try this just before running sankeyNetwork()...

FinalLinksForSankey$workstation_scr <- match(FinalLinksForSankey$workstation_scr, WorkStation_Nodes$WorkStation) - 1
FinalLinksForSankey$workstation_dest <- match(FinalLinksForSankey$workstation_dest, WorkStation_Nodes$WorkStation) - 1
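
For anyone landing here with the same symptom, here is a self-contained sketch of the same conversion on toy data. The `nodes` and `links` data frames and their column names are hypothetical. Only the `match()` call and the zero-based requirement come from the comment above. The extra check is there because `match()` returns NA for any name that is missing from the nodes table.

```r
# Toy data (hypothetical names, not the data from this issue)
nodes <- data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)
links <- data.frame(
  source = c("A", "B", "A"),
  target = c("B", "C", "C"),
  value  = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# match() gives 1-based row positions in nodes; subtract 1 for zero-based indices
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1

# A name that is absent from nodes would show up as NA, so stop early if any exist
stopifnot(!anyNA(links$source_id), !anyNA(links$target_id))

print(links)
```

The new columns would then be passed by name, e.g. `Source = "source_id"` and `Target = "target_id"`.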
DaveFClarke commented 7 years ago

Great, that helps. Unfortunately, the resulting sankey is a jumble! I also used SAS VA to produce a sankey, and it comes up with the attached image. Obviously it is expensive, etc., but you can see the type of look I am after. Do you have any ideas on whether networkD3 could do something like this, or do you know of any other open source software that could? Thanks again.
[image: sas va sankey]

DaveFClarke commented 7 years ago

The following sankey matches better to the data I sent you.
[image: sas va sankey - workstations]

cjyetman commented 7 years ago

Briefly...

  1. The image you posted is clearly using different data. For instance, the first two nodes ("Lam Recipe Select..." and "Setup laser Mic-S...") do not exist in the final result of the data and the code you gave me. So, it doesn't make sense to compare the results.
  2. sankeyNetwork() plots cyclical links, which your data have many of. That's why there are so many striped links, and I assume that's why you think it looks "jumbled". If you don't want that, you should remove the cyclical links from the data before passing it to sankeyNetwork().
  3. Since this is not a bug, you'd probably be better served on StackOverflow or something like that.
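
One possible way to do the removal mentioned in point 2, as a rough sketch only. `drop_cyclic_links()` is a hypothetical helper, not part of networkD3, and the toy `links` table is made up. It walks the links in row order and skips any link whose target can already reach its source, so the links it keeps contain no cycles. Which links get dropped depends on row order, so this is one interpretation of "remove the cyclical links", not the only one.

```r
# Hypothetical helper: keep links in row order, skipping any link that
# would close a cycle among the links kept so far (self-loops included).
drop_cyclic_links <- function(links, source = "source", target = "target") {
  keep <- logical(nrow(links))
  adj <- list()  # adjacency list of kept links, keyed by node index as a string

  # TRUE if `to` can be reached from `from` along kept links (iterative DFS)
  reachable <- function(from, to) {
    stack <- from
    seen <- character(0)
    while (length(stack) > 0) {
      node <- stack[length(stack)]
      stack <- stack[-length(stack)]
      if (node == to) return(TRUE)
      if (node %in% seen) next
      seen <- c(seen, node)
      stack <- c(stack, adj[[node]])
    }
    FALSE
  }

  for (i in seq_len(nrow(links))) {
    src <- as.character(links[[source]][i])
    tgt <- as.character(links[[target]][i])
    # Adding src -> tgt closes a cycle if src is already reachable from tgt
    if (!reachable(tgt, src)) {
      keep[i] <- TRUE
      adj[[src]] <- c(adj[[src]], tgt)
    }
  }
  links[keep, , drop = FALSE]
}

# Toy links (zero-based indices); the third row closes the cycle 0 -> 1 -> 2 -> 0
links <- data.frame(
  source = c(0, 1, 2, 0),
  target = c(1, 2, 0, 2),
  value  = c(1, 1, 1, 1)
)
acyclic <- drop_cyclic_links(links)
print(acyclic)
```

Dropping links changes the totals flowing through each node, so it is worth checking whether the dropped rows matter for what the diagram is meant to show.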

We should consider breaking with an error if a user passes a character vector as the source and/or target vectors.
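
A rough sketch of what such a check might look like. `check_link_indices()` is a hypothetical function written for illustration; it is not existing networkD3 code, and the exact messages and rules are assumptions.

```r
# Hypothetical validation helper, not part of networkD3's API.
# Stops with an error if the source/target columns are not numeric
# zero-based indices into the nodes table.
check_link_indices <- function(links, nodes, source, target) {
  for (col in c(source, target)) {
    x <- links[[col]]
    if (!is.numeric(x)) {
      stop("column '", col, "' must be numeric (zero-based node indices), not ",
           class(x)[1], call. = FALSE)
    }
    if (anyNA(x) || any(x < 0) || any(x > nrow(nodes) - 1)) {
      stop("column '", col, "' must hold indices between 0 and ",
           nrow(nodes) - 1, call. = FALSE)
    }
  }
  invisible(TRUE)
}

# Toy data (made up): names instead of indices should now fail loudly
nodes <- data.frame(name = c("A", "B"), stringsAsFactors = FALSE)
bad_links <- data.frame(source = "A", target = "B", value = 1,
                        stringsAsFactors = FALSE)

result <- tryCatch(
  check_link_indices(bad_links, nodes, "source", "target"),
  error = function(e) conditionMessage(e)
)
print(result)
```

With a check along these lines at the top of the plotting function, the case reported in this issue would produce an error message instead of an empty viewer.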