ecohealthalliance / yenpathy

Yen's K Shortest Paths in R, Quickly
Other
7 stars 3 forks source link

What kinds of graphs does it work with? #5

Open szhorvat opened 5 years ago

szhorvat commented 5 years ago

This is part of the JOSS review.

An important requirement for any package dealing with graphs is to be clear about what types of graphs it works with, and ensure that it behaves reasonably (return a non-surprising result or issue an error) for any type of graph that it is given.

My usual checklist:

Undirected vs directed

It appears that this package considers any input to represent a directed graph, but this is not documented anywhere. It is especially disconcerting to pass the function an undirected igraph object and see it handled as directed (with arbitrary edge direction assignments).

This must be corrected.

Graphs with multi-edges

Consider this graph:

image

How many paths from 1 to 3? One or two?

Again, this must be at least documented.

Graphs with self-loops

k_shortest_paths(g, 100, 100) returns the result 100 100 regardless of whether the vertex 100 is even present in the graph. When the vertex is present, it does not take into account whether the vertex has a self-loop.

Again, in the very least both behaviours (self-loop handling and handling of non-present vertices) should be documented.

I understand that a dataframe is unable to encode isolated vertices, which might be grounds for considering any vertex to be present. But this is surprising behaviour that should be clearly documented with examples.


Generally, if the function takes a certain datatype as input (e.g. an igraph graph) then it must reasonably handle all possible forms this datatype can take (directed/undirected, multigraphs, self-loops, empty graphs, weighted/unweighted, etc.)

toph-allen commented 5 years ago

@szhorvat Thanks for the detailed comment — I'm fairly new to working with graphs, and your explanation makes it all very clear.

The C++ code we use as the core of the package assumes that all graphs are directed, and the work we're using this for deals only with directed graphs, so those other cases you describe weren't on our radar.

I'll update the package's code and documentation to deal with those cases. Where it's possible to handle them seamlessly, I'll do that. If there are instances where handling a case would require a rewrite of the C++ code that's beyond what I'm able to do at this point, I'll explicitly check for that case and handle it as gracefully as possible.

For undirected graphs: The C++ code takes a data frame as its input, and assumes that each edge is directed. Would representing an undirected graph using a data frame of mutual edges cause any issues? If so, it's probably better to explicitly state that the current version of the package handles directed graphs only, even though as you state this is not ideal.

I'll reply with any other questions as they occur to me. Thanks again!

szhorvat commented 5 years ago

representing an undirected graph using a data frame of mutual edges

I think that's the correct solution here.

toph-allen commented 5 years ago

Thanks. I'll comment again here when I've made changes.