JuliaGraphs / Graphs.jl

An optimized graphs package for the Julia programming language
http://juliagraphs.org/Graphs.jl/
Other
455 stars 88 forks source link

Common interface for graphs with metadata #35

Open gdalle opened 2 years ago

gdalle commented 2 years ago

Hey!

There has been some talk recently about adding better support for graph metadata, so here's an issue to centralize our discussions. First of all, here are the packages that I'm aware of that deal with vertex- & edge-level attributes:

One idea would be to design a common interface extending AbstractGraph to work with metadata. I'm aware that the four packages above are very different, and that the case of edge weights probably deserves special treatment. On the other hand, agreeing on a common set of names for functions would bring clear benefits:

As food for thought, here are a few past discussions on this topic:

What is your take on this?

CameronBieganek commented 2 years ago

Thanks for opening this! I'll definitely be chiming in with some thoughts, but I might not have time until this weekend.

I actually have my own metadata graph package that I developed. But it is pretty minimal---I developed it for my own personal use.

https://github.com/CameronBieganek/PropertyGraphs.jl

bramtayl commented 2 years ago

It would certainly be convenient to be able to pass vertex labels instead of vertex indices wherever possible. The issues that I can think of though are

1) there is a performance cost to translating indices to labels 2) if vertex labels are Ints, then dispatch won't be able to tell the difference between vertex labels and indices, so we couldn't support both 3) we might have to add a whole bunch of extra trivial methods that translate labels to indices first

Maybe we could wrap labels in a type, like

Label(:a), so that dispatch could recognize and translate them when users want

CameronBieganek commented 2 years ago

Another point of reference is this package:

https://github.com/scheinerman/SimpleGraphs.jl

SimpleGraphs.jl lets you use any type for the vertices, so you can create your own custom vertex type and use it in a SimpleGraph.

Example:

using SimpleGraphs

struct Vertex{T}
    label::T
    weight::Float64
end

g = SimpleGraph{Vertex{String}}()

v1 = Vertex("a", 2.0)
v2 = Vertex("b", 3.0)

add!(g, v1)
add!(g, v2)
add!(g, v1, v2)
julia> g
SimpleGraph{Vertex{String}} (n=2, m=1)

julia> vlist(g)
2-element Vector{Vertex{String}}:
 Vertex{String}("a", 2.0)
 Vertex{String}("b", 3.0)

julia> elist(g)
1-element Vector{Tuple{Vertex{String}, Vertex{String}}}:
 (Vertex{String}("a", 2.0), Vertex{String}("b", 3.0))

I'll be back with more to say later---I just wanted to post this SimpleGraphs.jl example as food for thought.

CameronBieganek commented 2 years ago

Motivation

Suppose a developer wants to create a package with a metadata graph type with the following two properties:

  1. The metadata graph can use any other AbstractGraph type for the underlying graph data structure.
  2. The vertex labels are all of the same type T, but this type is not restricted to be an Integer. In other words, T can be any type.

The alternative to Property 1 is to create a custom graph data structure for the metadata graph. But this creates needless code duplication and restricts the user to only one type of underlying data structure for their metagraph.

Property 1 appears to be a popular choice among developers of metadata graph packages. It is adopted in MetaGraphsNext.jl, SimpleValueGraphs.jl, and PropertyGraphs.jl. However, it runs into difficulties when the wrapped graph is a mutable graph, because the [add|rem]_[vertex|edge]! methods are not included in the AbstractGraph interface. What this means is that it is impossible to keep the metadata in sync with the underlying graph data structure, unless you assume that you are wrapping a specific graph type (e.g. SimpleGraph), which contradicts the assumption in Property 1 that any AbstractGraph can be wrapped.

In my opinion, Property 2 is a very desirable feature for AbstractGraphs in general, and it is adopted for the specific metadata graph types implemented in MetaGraphsNext.jl and PropertyGraphs.jl. However, Property 2 also runs into difficulties. The definition of AbstractGraph is the following:

abstract type AbstractGraph{T} end

So, the type T is not restricted to be an Integer. However, there are core functions and algorithms in Graphs.jl, such as degree, all_neighbors, bfs_tree, dijkstra_shortest_paths, and a_star that only allow vertices to be specified as Integers. So any metadata graph package that aspires to Property 2 must provide functions to translate back and forth between vertex labels and the underlying integer vertex codes. This is a clumsy solution that leads to less readable code.

Proposals for an AbstractGraph interface in Graphs.jl version 2.0

I propose that the following changes be made to the existing AbstractGraphs interface:

The signature for add_vertex! would change to the following:

add_vertex!(g, v)   # where g is a graph and v is a vertex label

Comments

Maybe this is only somewhat related, but, since this proposal would break the SimpleGraph type, it might be a good place to comment that I think concrete graph types whose names describe the data structure they use would make more sense, e.g. AdjacencyListGraph, DenseAdjacencyMatrixGraph, or SparseAdjacencyMatrixGraph.

gdalle commented 2 years ago

I made a small comparison of the implementation choices of various packages, hopefully this fuels the discussion:

simonschoelly commented 2 years ago

Better support for metadata should definitely be a long term goal.

@gdalle SVG supports the weights function - but if a graph has more than one edge value, one needs to specify which value should be used.

CameronBieganek commented 2 years ago

The fact that rem_edge! can reorder edge indices has some advantages though, as it allows one to always be sure that the vertex identifiers are in the range 1:nv(g) which makes analytics faster.

If rem_vertex! is allowed to change vertex labels, then I don't see how we could provide a generic interface for rem_vertex!. Any ideas?

I think it is not a good idea to replace the int vertex identifiers by labels by default, this would disallow a lot of optimizations.

There are many libraries in other languages, including retworkx, that allow non-integer vertex labels, and somehow they still manage to be fast. I think Julia is a sufficiently flexible language that we can implement performant graph types with arbitrary vertex labels.

CameronBieganek commented 2 years ago

Here's another argument for guaranteeing that rem_vertex! does not change vertex labels:

I think it makes sense to have the API for AbstractGraph conform as closely as possible to the mathematical definition of a graph. In other words, a graph is a set of vertices and a set of edges. Of course the implementation under the hood does not have to use sets, but I think that's the interface that we should present to the users.

With that in mind, I think this is a sensible definition for the interface for rem_vertex!:

So if the vertex set is {1, 2, 3} and you remove vertex 2, then the new vertex set is {1, 3}.

EDIT:

This might just be fanciful thinking on my part, but it would be kind of cool to have an interface like this:

# Add a vertex:
push!(vertices(g), v)

# Add an edge:
push!(edges(g), e)

# Remove a vertex:
delete!(vertices(g), v)

# Remove an edge:
delete!(edges(g), e)

As long as vertices and edges return lazy iterators (which they currently do), these push! and delete! methods could be implemented efficiently.

etiennedeg commented 2 years ago

One of the main points of a graphs with MetaData is to be able to label vertices with something different than a range of consecutive integers. The problem with such labeling is that multiple algorithms require indexing on vertices. The major workaround is to have an efficient function that maps integers from 1:nv(g) to vertices(g) (lets call it map - it can be easily computed by calling collect(vertices(g))[i], with collect(vertices(g)) stored beforehand), and an efficient function mapping vertices to 1:nv(g)(lets call it reverse_map). This one is more problematic, but can easily be built in linear time with a dictionary. I see three ways to deal with this :

  1. Functions using vertex indexing should first compute the map and reverse_map functions. Inconvenients :
    • each call to such functions will allocate new map and reverse_map
    • this need some rewrite in a lot of the functions of the codebase.
      1. map and reverse_map must be provided by the graph. This eliminate the allocations on each function call. These functions could also be lazily implemented (example, we could do mutations efficiently (with constant complexity), and on the next call of reverse_map, compute the map in linear time and cache it. Inconvenients :
    • this need some rewrite in a lot of the functions of the codebase.
      1. Keep the existing requirement that vertices should belong to the range 1:nv(g) and let the graph type deal with it's own labeling outside (a bit like what MetaGraph is doing) Inconvenients :
    • It's not possible to design a graph type with constant mutation complexity.

Personally, I'm tempted by the second option, even though this will require a lot a rewrite work.