christophergandrud / networkD3

D3 JavaScript Network Graphs from R
http://christophergandrud.github.io/networkD3
649 stars 270 forks source link

`check_zero()` causes `sankeyNetwork()` and `forceNetwork()` to fail silently on specific inputs #281

Open lewinfox opened 3 years ago

lewinfox commented 3 years ago

Hi, thanks for an awesome package!

I've found an issue where assumptions made by check_zero() cause some graph plotting operations to fail silently if you have a 1-indexed rather than 0-indexed dataset for Links . I have a fix ready and will raise a PR for review in a sec.

To reproduce

It's a bit involved so bear with me. I initially found this when using sankeyNetwork() but it also affects forceNetwork().

First we pretend we have some 1-indexed link data:

URL <- paste0('https://cdn.rawgit.com/christophergandrud/networkD3/',
              'master/JSONdata/energy.json')

energy <- jsonlite::fromJSON(URL)

# Simulate 1-indexed data
bad_energy <- energy
bad_energy$links$source <- energy$links$source + 1
bad_energy$links$source <- energy$links$source + 1

Normally we get a warning for this

sankeyNetwork(
    Links = bad_energy$links, 
    Nodes = bad_energy$nodes,
    Source = 'source',
    Target = 'target',
    Value = 'value'
)
#> Warning message:
#> It looks like Source/Target is not zero-indexed. This is required in JavaScript and so your plot may not render. 

However, if we pass in a tbl_df we get no warning and no plot. We do see a message about data frame conversion but nothing about the 1-indexing error, and no plot is rendered.

sankeyNetwork(
    Links = tibble::as_tibble(bad_energy$links), 
    Nodes = bad_energy$nodes,
    Source = 'source',
    Target = 'target',
    Value = 'value'
)
#> Links is a tbl_df. Converting to a plain data frame.

This is caused by the way Links is subset in the call to check_zero():

# R/sankeyNetwork.R:82
check_zero(Links[, Source], Links[, Target])

The problem arises when Links is a tbl_df because [.tbl_df returns a tbl_df even if only one column is selected, unlike [.data.frame. This breaks check_zero() because it assumes its inputs are vectors.

We can get the same effect by omitting Source or Target - when these are missing the subsetting operation returns the entire data frame, which again breaks check_zero().

# Both of these calls produce no plot and no warning

sankeyNetwork(
    Links = bad_energy$links, 
    Nodes = bad_energy$nodes,
    # Source = 'source',
    Target = 'target',
    Value = 'value'
)

sankeyNetwork(
    Links = bad_energy$links, 
    Nodes = bad_energy$nodes,
    Source = 'source',
    # Target = 'target',
    Value = 'value'
)

Fix

I've created a PR to fix this (basically just move check_zero() after input checking and tbl_df_strip()) which I will link in a sec.

Hope this is helpful, thanks again!