christophergandrud / networkD3

D3 JavaScript Network Graphs from R
http://christophergandrud.github.io/networkD3

adding names to node connections from data frame in R #266

Closed amjass12 closed 4 years ago

amjass12 commented 4 years ago

Hello,

I am having a frustrating problem making a Sankey network. I have a data frame of 100 columns that represent activation levels for the samples (rows). A minimal reproducible example looks like:

`
organ    Age  0  1         2  3  4         5
Kidneys  p3   0  9.255803  0  0  1.421899  0.828556
Kidneys  p3   0  9.43768   0  0  0         0.87259
Kidneys  p15  0  8.447323  0  0  5.223955  0
Kidneys  p15  0  9.145821  0  0  0         0.913325
brain    p3   0  7.277491  0  0  0.318774  2.85587
brain    p3   0  7.283442  0  0  0         3.396625
brain    p15  0  8.866594  0  0  0         0
brain    p15  0  8.498539  0  0  4.863481  0.208462
`

I would like to make a sankeyNetwork that shows the connection between the organ and the values in columns 1-100 (1-5 shown in the example above). The reason for using this is that I can see which organs go to which column (for example, kidneys might go to 1, 2 and 3 and not to 4 and 5). This is a clear way of visualizing it.

However I am having an issue when carrying out the following code:

`
links <- mat[,3:100] %>%
  as.data.frame() %>%
  rownames_to_column(var="source") %>%
  gather(key="target", value="value", -1) %>%
  filter(value != 0)

# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name=c(as.character(links$source), as.character(links$target)))

# With networkD3, connections must be provided using ids, not real names as in the links data frame, so we need to reformat it.
links$IDsource <- match(links$source, nodes$name)-1
links$IDtarget <- match(links$target, nodes$name)-1

# Make the network
p <- sankeyNetwork(Links = links, Nodes = nodes, Source = "IDsource", Target = "IDtarget", Value = "value", sinksRight=FALSE)
`

This works fine; however, I am unsure where to specify the organ and age labels. The plot that is produced is attached. The problem is that it takes all 140 samples individually as opposed to grouping them by organ or age (and I would also like the nodes labelled with the age or organ and not with numbers).

[attached image: sankey]

As you can see, it looks like a big mess because it shows the individual file names as opposed to the groups.

I have tried `name=c(as.character(mat$organ), as.character(links$target))` etc., but to no avail. Could you please point me in the right direction on how to correct this problem?

Further questions: is there a way of labelling both organ and age?

The plot above is of the data frame itself. However, I would like to use the dist and hclust functions to find the columns that are closest to each other. Is there a way of plotting the Sankey diagram so that the samples that are closest to each other sit next to each other? A dendrogram is attached to show how I would like the connections to be placed next to each other: circle plot.pdf

I appreciate any help and am happy to provide any further data, code, etc.

thank you in advance!!

cjyetman commented 4 years ago
  1. You probably need to use unique() to build the list of unique node names.
  2. You must set the NodeID parameter for it to work properly (there is no default value).
  3. If you want organ and age to define the nodes, you will have to combine those two into one.
library(tidyverse)
library(networkD3)

mat <- tibble::tribble(
   ~"organ", ~"Age", ~"0",     ~"1", ~"2",   ~"3",     ~"4",     ~"5",
  "Kidneys",   "p3",    0, 9.255803,    0,      0, 1.421899, 0.828556,
  "Kidneys",   "p3",    0,  9.43768,    0,      0,        0,  0.87259,
  "Kidneys",  "p15",    0, 8.447323,    0,      0, 5.223955,        0,
  "Kidneys",  "p15",    0, 9.145821,    0,      0,        0, 0.913325,
    "brain",   "p3",    0, 7.277491,    0,      0, 0.318774,  2.85587,
    "brain",   "p3",    0, 7.283442,    0,      0,        0, 3.396625,
    "brain",  "p15",    0, 8.866594,    0,      0,        0,        0,
    "brain",  "p15",    0, 8.498539,    0,      0, 4.863481, 0.208462
  )

links <- 
  mat %>%
  mutate(organ_age = paste(organ, Age)) %>% 
  gather(key="target", value="value", -c(organ, Age, organ_age)) %>%
  filter(value != 0)

# organ as the source nodes
nodes <- data.frame(
  name = unique(c(as.character(links$organ), as.character(links$target))))

links$IDsource <- match(links$organ, nodes$name)-1
links$IDtarget <- match(links$target, nodes$name)-1

sankeyNetwork(Links = links, Nodes = nodes,
              Source = "IDsource", Target = "IDtarget",
              Value = "value", NodeID = "name", 
              sinksRight=FALSE)

# age as the source nodes
nodes <- data.frame(
  name = unique(c(as.character(links$Age), as.character(links$target))))

links$IDsource <- match(links$Age, nodes$name)-1
links$IDtarget <- match(links$target, nodes$name)-1

sankeyNetwork(Links = links, Nodes = nodes,
              Source = "IDsource", Target = "IDtarget",
              Value = "value", NodeID = "name", 
              sinksRight=FALSE)

# organ and age combined (organ_age) as the source nodes
nodes <- data.frame(
  name = unique(c(as.character(links$organ_age), as.character(links$target))))

links$IDsource <- match(links$organ_age, nodes$name)-1
links$IDtarget <- match(links$target, nodes$name)-1

sankeyNetwork(Links = links, Nodes = nodes,
              Source = "IDsource", Target = "IDtarget",
              Value = "value", NodeID = "name", 
              sinksRight=FALSE)
[attached images: organ, Age, organ_age]
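
Point 1 above can be illustrated with a minimal base R sketch. The toy `links` data frame below is hypothetical and not taken from the thread's data; it only shows that, without `unique()`, the node table contains repeated names, and because `match()` returns the first matching position, some node rows are never referenced by any link.

```r
# Hypothetical toy links, not the data from the thread
links <- data.frame(
  source = c("Kidneys", "Kidneys", "brain"),
  target = c("1", "4", "1"),
  value  = c(9.26, 1.42, 7.28),
  stringsAsFactors = FALSE
)

# Without unique(): one node row per link end, so names repeat
nodes_dup <- data.frame(name = c(links$source, links$target),
                        stringsAsFactors = FALSE)

# With unique(): exactly one row per distinct node name
nodes <- data.frame(name = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)

# match() returns the first matching position; subtract 1 for zero-based ids
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

# Rows of nodes_dup that no link ever points to
unused <- setdiff(seq_len(nrow(nodes_dup)),
                  unique(c(match(links$source, nodes_dup$name),
                           match(links$target, nodes_dup$name))))

print(nodes)
print(unused)
```

With the de-duplicated `nodes` table every row is referenced by at least one link, which is the situation the code above relies on.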
amjass12 commented 4 years ago

Thank you so much for your help! This works perfectly. The other alternative I see is to have organ go to age, and then age go to the respective nodes at the end. Is there a way to achieve this?

cjyetman commented 4 years ago
library(tidyverse)
library(networkD3)

mat <- tibble::tribble(
  ~"organ", ~"Age", ~"0",     ~"1", ~"2",   ~"3",     ~"4",     ~"5",
  "Kidneys",   "p3",    0, 9.255803,    0,      0, 1.421899, 0.828556,
  "Kidneys",   "p3",    0,  9.43768,    0,      0,        0,  0.87259,
  "Kidneys",  "p15",    0, 8.447323,    0,      0, 5.223955,        0,
  "Kidneys",  "p15",    0, 9.145821,    0,      0,        0, 0.913325,
  "brain",   "p3",    0, 7.277491,    0,      0, 0.318774,  2.85587,
  "brain",   "p3",    0, 7.283442,    0,      0,        0, 3.396625,
  "brain",  "p15",    0, 8.866594,    0,      0,        0,        0,
  "brain",  "p15",    0, 8.498539,    0,      0, 4.863481, 0.208462
)

links <- 
  mat %>%
  gather(key="target", value="value", -c(organ, Age)) %>%
  filter(value != 0)

# stack two sets of links: organ -> Age, then Age -> target column
links <- data.frame(source = c(links$organ, links$Age),
                    target = c(links$Age, links$target),
                    value = c(links$value, links$value))

nodes <- data.frame(
  name = unique(c(as.character(links$source), as.character(links$target))))

links$IDsource <- match(links$source, nodes$name)-1
links$IDtarget <- match(links$target, nodes$name)-1

sankeyNetwork(Links = links, Nodes = nodes,
              Source = "IDsource", Target = "IDtarget",
              Value = "value", NodeID = "name", 
              sinksRight=FALSE)
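
In the code above, every sample row contributes its own link, so the same source/target pair can appear several times. One possible refinement, sketched here under the assumption that summing the values per pair is a meaningful aggregate for this data (a mean might suit better), is to collapse the duplicates with base R's `aggregate()` before building the nodes. The toy `links` data frame is hypothetical.

```r
# Hypothetical links with repeated source/target pairs (one row per sample)
links <- data.frame(
  source = c("Kidneys", "Kidneys", "p3", "p3"),
  target = c("p3", "p3", "1", "1"),
  value  = c(9.255803, 9.43768, 9.255803, 9.43768),
  stringsAsFactors = FALSE
)

# Sum the values of all rows that share the same source and target
links_agg <- aggregate(value ~ source + target, data = links, FUN = sum)

# Build the node table and zero-based ids from the aggregated links
nodes <- data.frame(
  name = unique(c(as.character(links_agg$source),
                  as.character(links_agg$target))),
  stringsAsFactors = FALSE
)
links_agg$IDsource <- match(links_agg$source, nodes$name) - 1
links_agg$IDtarget <- match(links_agg$target, nodes$name) - 1

print(links_agg)
```

The aggregated `links_agg` and `nodes` can then be passed to `sankeyNetwork()` in the same way as above.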
amjass12 commented 4 years ago

This is perfect!! Thank you so much for taking the time to help me with this! I truly appreciate it :)

thanks!
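
The earlier question about using dist and hclust was not answered in the thread. A minimal sketch of one possible approach, assuming the goal is to order the target columns so that similar columns sit next to each other, is to cluster the transposed activation matrix and read the column order from the dendrogram. The matrix `act` reuses the example values from the thread; the names `act`, `hc` and `target_order` are introduced here for illustration only.

```r
# Activation columns from the thread's example (columns "0" to "5")
act <- rbind(
  c(0, 9.255803, 0, 0, 1.421899, 0.828556),
  c(0, 9.43768,  0, 0, 0,        0.87259),
  c(0, 8.447323, 0, 0, 5.223955, 0),
  c(0, 9.145821, 0, 0, 0,        0.913325),
  c(0, 7.277491, 0, 0, 0.318774, 2.85587),
  c(0, 7.283442, 0, 0, 0,        3.396625),
  c(0, 8.866594, 0, 0, 0,        0),
  c(0, 8.498539, 0, 0, 4.863481, 0.208462)
)
colnames(act) <- as.character(0:5)

# Cluster the columns: transpose so that each column becomes an observation
hc <- hclust(dist(t(act)))

# Column names in dendrogram order (similar columns end up adjacent)
target_order <- colnames(act)[hc$order]
print(target_order)
```

The target nodes could then be listed in `target_order` when building the `nodes` data frame. `sankeyNetwork()` has an `iterations` argument, and setting `iterations = 0` is commonly suggested to keep nodes in the order of the `Nodes` data frame; whether that gives the desired layout for this data is an assumption to verify.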