ecohealthalliance / yenpathy

Yen's K Shortest Paths in R, Quickly
Other
7 stars 3 forks source link

handling of factor columns #6

Open corybrunson opened 5 years ago

corybrunson commented 5 years ago

A frequent (in my experience) source of confusion for new R users is anticipating that data.frame() will convert character vectors into factor columns. This is not an issue for yenpathy per se, but it results in an error message that requires the user either to recognize the issue or to examine the source code:

library(yenpathy)
g <- data.frame(
  start = letters[c(1, 2, 4)],
  end = letters[c(2, 3, 5)]
)
k_shortest_paths(g, from = "a", to = "c")
#> Error in k_shortest_paths(g, from = "a", to = "c"): Please use character or integer vectors for your node columns
g <- data.frame(
  start = I(letters[c(1, 2, 4)]),
  end = I(letters[c(2, 3, 5)])
)
k_shortest_paths(g, from = "a", to = "c")
#> [[1]]
#> [1] "a" "b" "c"

Created on 2019-10-01 by the reprex package (v0.2.1)

It might be worth the extra bookkeeping to convert factors to integers and return paths as integer, character, or factor vectors (in the last case, with the same levels); otherwise, since the error is only produced when the node columns are factors, i would suggest a more specific message, e.g. "Some node columns are factors; please use only characters or integers."

Part of this JOSS review.

toph-allen commented 5 years ago

Yeah, I had opted not to handle factors because I thought that — because of the trickiness of factors' dual integer and character representation — it was less likely that they'd be used to represent a graph.

But to your point, I can imagine a scenario where someone would read in a graph from a CSV file with read.csv(), which would use factors by default if there were characters in the node names, and be unable to use yenpathy's functions.

I'll add some bookkeeping code to convert factors on the way in and out of the function.