YuLab-SMU / ProjectYulab

:next_track_button: Small coding tasks that enable you to participate in our development
33 stars 3 forks source link

finding a tree on a network #10

Closed GuangchuangYu closed 11 months ago

GuangchuangYu commented 1 year ago

We can find a tree (shortest path tree (SPT) or minimum spanning tree (MST)) on a network.

This will help us to focus on an interested node (as root) and its impact on the network.

image

https://www.science.org/doi/10.1126/science.1245200

xiangpin commented 1 year ago

I think we can find the min distances among the nodes of a network, then we can obtain the clusters of the nodes.

library(igraph)                                                                                                                                                                                                    library(tidygraph)
library(ggtree)
library(ggraph)
library(treeio)

set.seed(1234)
g3 <- play_erdos_renyi(10, 0.5) %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(), name=seq_len(10)) %>%
  activate(edges) %>%
  mutate(centrality = centrality_edge_betweenness()) %>%
  arrange(centrality)

find.spt <- function(
  x,
  mst = FALSE,
  weights = NULL,
  hclust.method = 'average',
  ...
  ){
  weights <- rlang::enquo(weights)
  wg.nm <- NULL
  if (!rlang::quo_is_null(weights)){
     edge.attr.names <- igraph::edge_attr_names(x)
     if (rlang::as_name(weights) %in% edge.attr.names){
         wg.nm <- rlang::as_name(weights)
         weights <- igraph::edge_attr(x)[[wg.nm]]
     }else{
         weights <- NULL
     }
  }else{
     weights <- NULL
  }
  if (mst){
     x <- igraph::mst(x, weights = weights)
     if (!is.null(weights)){
         weights <- igraph::edge_attr(x)[[wg.nm]]
     }
  }

  d <- igraph::distances(x, weights = weights, ...)

  tr <- ggtree:::as.phylo.hclust2(
          hclust(as.dist(d), method = hclust.method),
          hang = .1
        )
  return(tr)
}

p1 <- ggraph(g3, layout='kk') +
     geom_edge_link() +
     geom_node_point(color='orange', size=4) +
     geom_node_text(aes(label=name))

p2 <- ggraph(mst(g3), layout = 'kk') +
      geom_edge_link() +
      geom_node_point(color = 'orange', size = 4) +
      geom_node_text(aes(label = name))

tr1 <- find.spt(g3, hclust.method = 'ward.D')
tr2 <- find.spt(g3, mst = T, hclust.method = 'ward.D')

f1 <- ggtree(tr1) +
      geom_tippoint(color='orange', size = 4) +
      geom_tiplab()

f2 <- ggtree(tr2) +
      geom_tippoint(color='orange', size = 4) +
      geom_tiplab()

f3 <- ggtree(as.phylo(mst(g3)), layout='rad') +
      geom_nodepoint(color='orange', size = 4) +
      geom_tippoint(color='orange', size = 4) +
      geom_nodelab(node = 'all')
aplot::plot_list(p1, p2, f1, f2, f3, ncol=2, nrow=3, widths=c(1, 1))

捕获 Figure A is the original graph. Figure B is the minimum spanning tree graph (has the smallest sum of edge weights) of Figure A. Figure C is the clustering tree based on the length of all the shortest paths of Figure A graph. Figure D is the clustering tree based on the length of all the shortest paths of Figure B graph. Figure E is the same to Figure B (the minimum spanning tree graph of Figure A), but using the as.phylo of treeio to convert it to phylo, then visualize it using ggtree.

GuangchuangYu commented 1 year ago
spt <- function(g, from, to, ...) {
    sp <- shortest_paths(g, from, to=to, ...)
    edges <- lapply(sp$vpath, function(x) {
        p <- as.numeric(x)
        if (length(p) == 1) return(NULL)
        lapply(1:(length(p)-1), function(i) {
            data.frame(from=p[i], to=p[i+1])
        }) |> do.call(rbind, args = _)
    }) |> 
    do.call(rbind, args = _) |>
    unique()
    as.phylo(edges)
}

x <- spt(g3, 6, V(g3)) 
ggtree(x, layout='rad') +
    geom_nodepoint(color='orange', size = 4) +
    geom_tippoint(color='orange', size = 4) +
    geom_nodelab(node = 'all')

image