briatte / ggnetwork

Geoms to plot networks with ggplot2
https://briatte.github.io/ggnetwork/

`fortify.igraph` messes with vertex/edge attribute types #54

Closed briatte closed 5 years ago

briatte commented 5 years ago

See reprex at end of issue.

The difference comes from how attributes are imported. fortify.network uses dumb for loops to import vertex/edge attributes (only the vertex attribute loop is shown below):

https://github.com/briatte/ggnetwork/blob/97e6b56a33328bef30f4c2369858a27f86da9e57/R/fortify-network.R#L143-L147

This preserves attribute (column) types. By contrast, the (smarter) code in fortify.igraph uses sapply, which simplifies its result to a matrix. A matrix holds a single type, so all vertex/edge attributes are converted to character if a single attribute is of that class, and numeric attributes end up as characters (and then as factors, unless stringsAsFactors has been set to FALSE: see #53):

https://github.com/briatte/ggnetwork/blob/97e6b56a33328bef30f4c2369858a27f86da9e57/R/fortify-igraph.R#L55-L67
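
To make the coercion concrete, here is a minimal base R sketch. It is not the package code: `attrs` is a hypothetical list standing in for the vertex attributes of a graph, and the two approaches are reduced to their simplest form.

```r
# Hypothetical vertex attributes (not taken from ggnetwork or emon)
attrs <- list(
  score   = c(0, 10, 3),
  sponsor = c("City", "County", "State")
)

# sapply() simplifies equal-length results into a matrix; since a matrix
# holds a single type, the numeric attribute is coerced to character
m <- sapply(attrs, function(a) a)
typeof(m)

df_sapply <- data.frame(m, stringsAsFactors = FALSE)
sapply(df_sapply, class)

# a for loop adds each attribute as its own column, so types survive
df_loop <- data.frame(id = seq_along(attrs[[1]]))
for (n in names(attrs)) {
  df_loop[[n]] <- attrs[[n]]
}
sapply(df_loop, class)
```

With `stringsAsFactors = TRUE`, the character columns produced by the `sapply` route would additionally become factors, which matches what the reprex shows.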

Solution (1) would be to use purrr::map_dfc to get the sapply method to preserve column types, but that would end up being more complex than a for loop, at the cost of additional dependencies (including dplyr, since purrr::map_dfc relies on it).
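
For reference, a rough sketch of what solution (1) could look like, assuming purrr and dplyr are installed. `attrs` is again a hypothetical list of attributes, not something pulled from an igraph object.

```r
library(purrr)

# Hypothetical vertex attributes (not taken from ggnetwork or emon)
attrs <- list(
  score   = c(0, 10, 3),
  sponsor = c("City", "County", "State")
)

# map_dfc() binds the results column-wise into a data frame,
# so each attribute keeps its own type
out <- purrr::map_dfc(attrs, function(a) a)
sapply(out, class)
```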

Solution (2), which is dumber but better in this context, is to use for loops in fortify.igraph, as in fortify.network.

Solution (3), the best one in my view, is to use igraph::as_data_frame(x, what = "vertices") and igraph::as_data_frame(x, what = "edges") to import vertex and edge attributes, which is what I'll do, unless @jcfisher has a better fix.
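
A minimal sketch of solution (3) on a toy graph, to show that the types come through untouched. The graph and its attribute names (`score`, `sponsor`, `frequency`) are made up for illustration, and this is not the final fortify.igraph code.

```r
library(igraph)

# Toy graph with hypothetical numeric and character attributes
g <- make_ring(3)
V(g)$score     <- c(0, 10, 3)
V(g)$sponsor   <- c("City", "County", "State")
E(g)$frequency <- c(1, 2, 3)

# one data frame per attribute set, each column keeping its own type
v <- igraph::as_data_frame(g, what = "vertices")
e <- igraph::as_data_frame(g, what = "edges")

str(v)
str(e)
```

The edge data frame also carries `from` and `to` columns, which would still need to be merged with the layout coordinates.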

library(ggnetwork)
#> Loading required package: ggplot2
library(network)
#> network: Classes for Relational Data
#> Version 1.15 created on 2019-04-01.
#> copyright (c) 2005, Carter T. Butts, University of California-Irvine
#>                     Mark S. Handcock, University of California -- Los Angeles
#>                     David R. Hunter, Penn State University
#>                     Martina Morris, University of Washington
#>                     Skye Bender-deMoll, University of Washington
#>  For citation information, type citation("network").
#>  Type help("network-package") to get started.
library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:network':
#> 
#>     %c%, %s%, add.edges, add.vertices, delete.edges,
#>     delete.vertices, get.edge.attribute, get.edges,
#>     get.vertex.attribute, is.bipartite, is.directed,
#>     list.edge.attributes, list.vertex.attributes,
#>     set.edge.attribute, set.vertex.attribute
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
library(intergraph)

# network with numeric and character vertex attributes
data(emon, package = "network")
emon[[1]]
#>  Network attributes:
#>   vertices = 14 
#>   directed = TRUE 
#>   hyper = FALSE 
#>   loops = FALSE 
#>   multiple = FALSE 
#>   total edges= 83 
#>     missing edges= 0 
#>     non-missing edges= 83 
#> 
#>  Vertex attribute names: 
#>     Command.Rank.Score Decision.Rank.Score Formalization Location Paid.Staff Sponsorship vertex.names Volunteer.Staff 
#> 
#>  Edge attribute names: 
#>     Frequency
intergraph::asIgraph(emon[[1]])
#> IGRAPH 031e6f4 D--- 14 83 -- 
#> + attr: Command.Rank.Score (v/n), Decision.Rank.Score (v/n),
#> | Formalization (v/n), Location (v/c), na (v/l), Paid.Staff (v/n),
#> | Sponsorship (v/c), vertex.names (v/c), Volunteer.Staff (v/n),
#> | Frequency (e/n), na (e/l)
#> + edges from 031e6f4:
#>  [1]  2->1  3->1  8->1  9->1 14->1  1->2  3->2  4->2  8->2  1->3  2->3
#> [12]  4->3  7->3 12->3 13->3  1->4  3->4  8->4  1->5  3->5  8->5 14->5
#> [23]  3->6  8->6  9->6  1->7  2->7  3->7  4->7  5->7  8->7  9->7 10->7
#> [34] 11->7 12->7 13->7  1->8  2->8  3->8  5->8  7->8  9->8 12->8 13->8
#> [45] 14->8  1->9  2->9  3->9  4->9  8->9 10->9 11->9 12->9 13->9
#> + ... omitted several edges

# all goes well with fortify.network, characters as factors (as per #53)
str(ggnetwork(emon[[1]]))
#> 'data.frame':    97 obs. of  15 variables:
#>  $ x                  : num  0.134 0 0.628 0.111 0.561 ...
#>  $ y                  : num  0.68 0.485 0.447 0.246 1 ...
#>  $ Command.Rank.Score : num  0 10 3 5 0 0 20 40 10 30 ...
#>  $ Decision.Rank.Score: num  20 7 0 5 0 0 20 50 10 20 ...
#>  $ Formalization      : num  2 1 1 1 1 1 1 2 1 3 ...
#>  $ Location           : Factor w/ 1 level "L": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ na.x               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  $ Paid.Staff         : num  10 400 200 60 1 7 60 7 70 100 ...
#>  $ Sponsorship        : Factor w/ 6 levels "City","County",..: 6 6 6 4 5 5 2 3 1 1 ...
#>  $ vertex.names       : Factor w/ 14 levels "A.1.Ambulance.Service",..: 12 14 13 4 9 10 8 5 2 3 ...
#>  $ Volunteer.Staff    : num  50 2000 0 0 20 80 20 100 0 0 ...
#>  $ xend               : num  0.134 0 0.628 0.111 0.561 ...
#>  $ yend               : num  0.68 0.485 0.447 0.246 1 ...
#>  $ Frequency          : num  NA NA NA NA NA NA NA NA NA NA ...
#>  $ na.y               : logi  NA NA NA NA NA NA ...

# however, fortify.igraph messes with the (vertex) attribute types
str(ggnetwork(intergraph::asIgraph(emon[[1]])))
#> 'data.frame':    97 obs. of  15 variables:
#>  $ x                  : num  0.662 0.724 0.392 0.537 0.424 ...
#>  $ y                  : num  0.414 0.69 0.664 1 0 ...
#>  $ Command.Rank.Score : Factor w/ 8 levels "0","10","2","20",..: 1 2 5 8 1 1 4 7 2 6 ...
#>  $ Decision.Rank.Score: Factor w/ 8 levels "0","10","2","20",..: 4 8 1 6 1 1 4 7 2 4 ...
#>  $ Formalization      : Factor w/ 3 levels "1","2","3": 2 1 1 1 1 1 1 2 1 3 ...
#>  $ Location           : Factor w/ 1 level "L": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ na.x               : Factor w/ 1 level "FALSE": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ Paid.Staff         : Factor w/ 9 levels "0","1","10","100",..: 3 6 5 7 2 8 7 8 9 4 ...
#>  $ Sponsorship        : Factor w/ 6 levels "City","County",..: 6 6 6 4 5 5 2 3 1 1 ...
#>  $ vertex.names       : Factor w/ 14 levels "A.1.Ambulance.Service",..: 12 14 13 4 9 10 8 5 2 3 ...
#>  $ Volunteer.Staff    : Factor w/ 7 levels "0","100","20",..: 5 4 1 1 3 7 3 2 1 1 ...
#>  $ xend               : num  0.662 0.724 0.392 0.537 0.424 ...
#>  $ yend               : num  0.414 0.69 0.664 1 0 ...
#>  $ Frequency          : num  NA NA NA NA NA NA NA NA NA NA ...
#>  $ na.y               : num  NA NA NA NA NA NA NA NA NA NA ...

Created on 2019-07-30 by the reprex package (v0.3.0)