briatte / ggnet

Network visualization with ggplot2
https://briatte.github.io/ggnet/
194 stars 33 forks source link

bipartite #3

Closed pedroj closed 10 years ago

pedroj commented 11 years ago

thanks François! It would be great to have specific functions to handle bipartite networks. With the exception of the bipartite package most existing R packages do not include simple ways to plot bipartite structures. I'd suggest a quick way to code the two modes of the bipartite network, with distinct shape/colour combinations for instance.

briatte commented 11 years ago

I have no experience at all with bipartite networks, but what you are suggesting seems perfectly doable. If the ggnet function gets integrated, as I hope it will, into the GGally package, my next release will offer what you suggest. For now, I unfortunately do not have the time to work on any major new feature.

May I also suggest that you take a look at the gggraph function mentioned in issue #2? It seems to work with bipartite networks :)

briatte commented 11 years ago

Hi Pedro,

I have added some bipartite scripts to a network example to show how the function can work with some bipartite networks, like the one below. The main trick for me was to produce the network through matrix algebra. Red points are bills and blue points are MPs who collaborated on the bills.

This graph, however, uses only a small sample (< 5%) of the data that I have produced: I still have to find a way to "flatten" larger graphs with multiple edges into an unweighted adjacency matrix. Dabbling with intergraph and network has failed to show me how to do that so far.

pedroj commented 11 years ago

I see François. That's great. And yes, plotting bipartite structures yields a mess for very large graphs. One alternative is to use a matrix representation, i.e., a 'heatmap' of the adjacency matrix- this is useful for weighted incidence matrices. Even using the heat map on a version of the adjacency matrix where rows and cols have been rearranged according to a modularity algorithm could also highlight some patterns. This would yield a fattened view of the plot above. You lose the 'distance' sense that the FM ordination yields but you gain clarity in visualizing where the strong and weak links occur (in a weighted matrix). Another way is to use plot.web in the bipartite package. I find this useful as a first snapshot and as a useful way to represent multipartite structures.

pedroj commented 11 years ago

BTW, the use of ggnet function is as I suggested, plotting the two modes in different colors (even with a different shape). That's great.

pedroj commented 11 years ago

One thing... when you say "to find a way to "flatten" larger graphs with multiple edges into an unweighted adjacency matrix", do you mean to 'binarize' the weighted matrix? That's seems simple. I use a modification of a function by Diego Vazquez:

quant2bin<-function(matr)
{
  cn=colnames(matr)
  rn=rownames(matr)
  ij<-which(matr!=0,arr.ind=T)
  matrb<-matrix(0,dim(matr)[1],dim(matr)[2])
  matrb[ij]<-1
  colnames(matrb)=cn
  rownames(matrb)=rn
  return(matrb)
}

You can set any threshold value in ij<-which(matr!=0,arr.ind=T).

pedroj commented 11 years ago

I forgot to mention that package NetPreProc can be useful to you for flattening: Network Pre-Processing and normalization. http://www.rdocumentation.org/packages/NetPreProc/functions/Sparsify.matrix-methods

briatte commented 11 years ago

Thanks a lot Pedro, lots of helpful ideas. This script generates the sparse matrix that I use to get the plot above:

> head(A)
6 x 1500 sparse Matrix of class "dgTMatrix"
   [[ suppressing 1500 column names ‘1’, ‘2’, ‘3’ ... ]]

Boyer        1 . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 . . . . . . . . . . . . . 1 .
Cherpion     1 . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . 1 .
Cinieri      1 . . 1 . . . . . . . . . 1 . . . 1 . . . . . . . . . 1 . . . . . . . . . . . . . 1 .
Fromion      1 . . . . . . . . . . . . . . . . 1 . 1 . . . . . . 1 1 1 . . . . . . . . . . . . 1 .
Lazaro       1 . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 . . . . . . . . . . . . . 1 .
Le Callennec 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . 1 .

So the matrix is fine as far as I can tell. Things are also okay after the matrix is converted to an undirected bipartite graph in this script (the multiple argument is actually FALSE by default but is typed for clarity):

> iA <- graph.incidence(A, mode = "all", multiple = FALSE)
> iA
IGRAPH UN-B 2082 22659 -- 
+ attr: type (v/x), name (v/c)

The issue arises when I convert that structure to a network object and then try to ask for its adjacency matrix, because network does not (yet?) support converting multiple edges to a sparser representation:

Error in as.matrix.network.adjacency(net) : 
  Multigraphs not currently supported in as.matrix.network.adjacency.  Exiting.

I have given a try to Sparsify.matrix, but the NetPreProc package has fatal dependency issues.

pedroj commented 11 years ago

I've added some code I had to plot bipartite networks. It is in my bipartite_plots repo. This is not with ggoplot2 actually, but there is a prototype for using ggplot2 with bipartite networks. I've added more explanations in the pages.

pedroj commented 11 years ago

Cher François,

have you tried just using write.graph from igraph to export the network to an external file (e.g., as an edgelist) and then re-read it in network, getting the adjacency matrix? It's a bit cumbersome and not elegant, but can be useful.

write.graph(graph, file, format=c("edgelist", "pajek", "ncol", "lgl", "graphml", "dimacs", "gml", "dot", "leda"), ...)

Previously I also had difficulties with as.matrix.network.adjacency.

All the best, -- Pedro

El 08/07/2013, a las 00:34, François notifications@github.com escribió:

Thanks a lot Pedro, lots of helpful ideas. This script generates the sparse matrix that I use to get the plot above:

head(A) 6 x 1500 sparse Matrix of class "dgTMatrix" [[ suppressing 1500 column names ‘1’, ‘2’, ‘3’ ... ]]

Boyer 1 . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 . . . . . . . . . . . . . 1 . Cherpion 1 . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . 1 . Cinieri 1 . . 1 . . . . . . . . . 1 . . . 1 . . . . . . . . . 1 . . . . . . . . . . . . . 1 . Fromion 1 . . . . . . . . . . . . . . . . 1 . 1 . . . . . . 1 1 1 . . . . . . . . . . . . 1 . Lazaro 1 . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 . . . . . . . . . . . . . 1 . Le Callennec 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . 1 .

So the matrix is fine as far as I can tell. Things are also okay after the matrix is converted to an undirected bipartite graph in this script:

iA <- graph.incidence(A, mode = "all", multiple = FALSE) iA IGRAPH UN-B 2082 22659 --

  • attr: type (v/x), name (v/c)

The issue arises when I convert that structure to a network object and then try to ask for its adjacency matrix, because network does not (yet?) support converting multiple edges to a sparser representation:

Error in as.matrix.network.adjacency(net) : Multigraphs not currently supported in as.matrix.network.adjacency. Exiting.

I have given a try to Sparsify.matrix, but the NetPreProc package has fatal dependency issues.

— Reply to this email directly or view it on GitHub.

briatte commented 11 years ago

Hi Pedro,

Fantastic stuff! I believe you have succeeded at doing two different things that I failed at:

  1. Weight the edges. Moritz Marbach's original code and yours are both more advanced than mine on that point. This should be easy to add to ggnet.
  2. Work with bipartite/multiple graphs. I'll take a careful look at your code and igraph workaround. I have used igraph in a similar way in this script.

Give me a bit of time to parse your work and I should hopefully be back with improvements to ggnet based on your code.

pedroj commented 11 years ago

Excellent! Please feel free to use any material- of course. Let's keep in touch. I'm really very interested in developing analyses with bipartite networks.

À bientôt, -- Pedro

El 10/07/2013, a las 14:32, François notifications@github.com escribió:

Hi Pedro,

Fantastic stuff! I believe you have succeeded at doing two different things that I failed at:

• Weight the edges. Moritz Marbach's original code and yours are both more advanced than mine on that point. This should be easy to add to ggnet. • Work with bipartite/multiple graphs. I'll take a careful look at your code and igraph workaround. I have used igraph in a similar way in this script. Give me a bit of time to parse your work and I should hopefully be back with improvements to ggnet based on your code.

— Reply to this email directly or view it on GitHub.

briatte commented 11 years ago

I have started looking at your code. Here's a Gist that plugs your bip_init_network and vectorize functions into ggnet to plot the bipartite graph out of your NCH example:

pj replication

This plot did not even require to update the ggnet function and should work straight out of the box if you have the downloader, ggplot2, network etc. packages installed. I have checked whether this works with all other ggnet options: it does, except for subset.threshold (which is expected, and can surely be fixed with little effort).

Do you believe that this method should be integrated into ggnet, and if so, how? For example, the function could auto-detect bipartite networks and automatically do what the Gist above does, provided a bipartite argument to set the mode names. Not sure what to do about edge weights, it seems like a different game to me.

I'll get back to you if I manage to make your method work with my neta examples of amendment co-sponsorship networks. It's a much, much larger network, and it uses a sparse matrix to store the adjacencies, so that will also be an interesting test to run.

Update: it works, but only on small samples. Here's a demo with 5,500 nodes:

am bipartite 5500

My full two-mode network has 500+ Members of Parliament (actors) and 11,500+ amendments (events). I have managed to plot up to 7,500 amendments, but network does not deal very well with large networks and creates a memory leak that crashes R if you try creating a network out of larger samples.

Incidentally, the last issue of Social Networks is on two-mode social networks.

pedroj commented 11 years ago

Thanks François! Yes, I think the bip coding can be fully implemented in ggnet. However, I don't think it would be useful that the function detects the bipartite structure. Any user can input a bipartite network but have the goal of analyzing the one-mode projections. That option can be useful in packages like bipartite where you obviously have objectives of analyzing bipartite structures- in this circumstance having an automatic detection can help.

It is very interesting that network hangs when dealing with large bipartite networks. Maybe you can open an issue with the authors of the package, to keep them informed.

Big thanks for pointing out the special issue of Social Networks! I think bipartite networks are under-studied in various aspects. But social scientists are certainly ahead in the study and analysis of this type of network.

briatte commented 11 years ago

Thanks for that last piece of feedback. I have performed further tests with network and have not yet managed to reproduce the issue, but I'll keep an eye at it.

So far, your vectorize function works perfectly with ggnet, and I agree that there is no need to make things more complicated at that stage. The function is good to go as such.

I'll work further on bipartite networks when I find the time to – thanks for a very interesting conversation over the code and the challenges of network analysis!

briatte commented 10 years ago

Hi Pedro,

I'm working again on ggnet while updating this repo, and have started digging your code again.

Regarding bipartite plots, this little snippet creates a mode vector for a network object. It should be easy enough to add a bipartite argument to ggnet that would accept TRUE or k to detect the mode or set it manually, or FALSE or 0 to ignore the bipartite attribute if present. Let me know if you have other ideas.

g = network::network.initialize(15, bipartite = 5)    #Initialize the network
network::is.bipartite(g)

if (g$gal$bipartite > 0) {
    node.group = rep("Event", network.size(g))
    node.group[ 1:g$gal$bipartite ] = "Actor"
}

node.group # mode vector
pedroj commented 10 years ago

Looks great François! I'll have a look in detail asap... If the bipartite graph is built (input) from an adjacency matrix, then the sizes of the two modes correspond to the number of rows and columns. If it is input as an edge list it should also be straightforward to get the two modes.

briatte commented 10 years ago

The code I posted has a glitch: I cannot get either network or igraph to plot the edges, because I don't seem to have created them with network.initialize. I'm not sure how to get a random bipartite network from network to implement the idea further (I'd like to test it on a 'natural' network object).

pedroj commented 10 years ago

You might want to have a try with function swap.web in package bipartite. i've used these functions too:

# Random matrix
randommat <- function (m, n, prob=c(0.92, 0.08)) {
    size=m*n
    matrix(sample(c(0,1), size, replace=T, prob), 
           nrow=m, ncol=n, byrow = TRUE)
}

# Fully nested matrix
fullynest <-  function (m, n) {
   x<- as.vector(rep(1, n))
    for (i in 1:m) {
        j=1
         x <- rbind(x,c(rep(1, n-j), rep(0, j)))}
   x<-data.frame(x,row.names = NULL)
#   x[-(m+1), ]
}
briatte commented 10 years ago

Thanks for the functions. I just pushed the cleaned up version of the code, with better centrality measures for one-mode networks but nothing yet for two-mode graphs. I'm dealing with very large networks, so it takes a bit of time to test everything out. The code won't even run on 32-bit R because it creates a vector that exceeds 2.4GB of memory…