Common interface for graphs with metadata

gdalle commented 3 years ago

Hey!

There has been some talk recently about adding better support for graph metadata, so here's an issue to centralize our discussions. First of all, here are the packages that I'm aware of that deal with vertex- & edge-level attributes:

One idea would be to design a common interface extending AbstractGraph to work with metadata. I'm aware that the four packages above are very different, and that the case of edge weights probably deserves special treatment. On the other hand, agreeing on a common set of names for functions would bring clear benefits:

User-friendliness would increase significantly, as well as ease of documentation
Switching between meta-graph implementations would be instantaneous. This is sometimes necessary to improve performance
Compatibility with downstream packages such as https://github.com/JuliaGraphs/GraphDataFrameBridge.jl would be a non-issue
The same goes with upstream packages such as https://github.com/JuliaGraphs/GraphIO.jl or https://github.com/JuliaGraphs/SNAPDatasets.jl

As food for thought, here are a few past discussions on this topic:

What is your take on this?

CameronBieganek commented 3 years ago

Thanks for opening this! I'll definitely be chiming in with some thoughts, but I might not have time until this weekend.

I actually have my own metadata graph package that I developed. But it is pretty minimal---I developed it for my own personal use.

https://github.com/CameronBieganek/PropertyGraphs.jl

bramtayl commented 3 years ago

It would certainly be convenient to be able to pass vertex labels instead of vertex indices wherever possible. The issues that I can think of though are

1) there is a performance cost to translating indices to labels 2) if vertex labels are Ints, then dispatch won't be able to tell the difference between vertex labels and indices, so we couldn't support both 3) we might have to add a whole bunch of extra trivial methods that translate labels to indices first

Maybe we could wrap labels in a type, like

Label(:a), so that dispatch could recognize and translate them when users want

CameronBieganek commented 3 years ago

Another point of reference is this package:

https://github.com/scheinerman/SimpleGraphs.jl

SimpleGraphs.jl lets you use any type for the vertices, so you can create your own custom vertex type and use it in a SimpleGraph.

Example:

using SimpleGraphs

struct Vertex{T}
    label::T
    weight::Float64
end

g = SimpleGraph{Vertex{String}}()

v1 = Vertex("a", 2.0)
v2 = Vertex("b", 3.0)

add!(g, v1)
add!(g, v2)
add!(g, v1, v2)

julia> g
SimpleGraph{Vertex{String}} (n=2, m=1)

julia> vlist(g)
2-element Vector{Vertex{String}}:
 Vertex{String}("a", 2.0)
 Vertex{String}("b", 3.0)

julia> elist(g)
1-element Vector{Tuple{Vertex{String}, Vertex{String}}}:
 (Vertex{String}("a", 2.0), Vertex{String}("b", 3.0))

I'll be back with more to say later---I just wanted to post this SimpleGraphs.jl example as food for thought.

CameronBieganek commented 3 years ago

Motivation

Suppose a developer wants to create a package with a metadata graph type with the following two properties:

The metadata graph can use any other AbstractGraph type for the underlying graph data structure.
The vertex labels are all of the same type T, but this type is not restricted to be an Integer. In other words, T can be any type.

The alternative to Property 1 is to create a custom graph data structure for the metadata graph. But this creates needless code duplication and restricts the user to only one type of underlying data structure for their metagraph.

Property 1 appears to be a popular choice among developers of metadata graph packages. It is adopted in MetaGraphsNext.jl, SimpleValueGraphs.jl, and PropertyGraphs.jl. However, it runs into difficulties when the wrapped graph is a mutable graph, because the [add|rem]_[vertex|edge]! methods are not included in the AbstractGraph interface. What this means is that it is impossible to keep the metadata in sync with the underlying graph data structure, unless you assume that you are wrapping a specific graph type (e.g. SimpleGraph), which contradicts the assumption in Property 1 that any AbstractGraph can be wrapped.

In my opinion, Property 2 is a very desirable feature for AbstractGraphs in general, and it is adopted for the specific metadata graph types implemented in MetaGraphsNext.jl and PropertyGraphs.jl. However, Property 2 also runs into difficulties. The definition of AbstractGraph is the following:

abstract type AbstractGraph{T} end

So, the type T is not restricted to be an Integer. However, there are core functions and algorithms in Graphs.jl, such as degree, all_neighbors, bfs_tree, dijkstra_shortest_paths, and a_star that only allow vertices to be specified as Integers. So any metadata graph package that aspires to Property 2 must provide functions to translate back and forth between vertex labels and the underlying integer vertex codes. This is a clumsy solution that leads to less readable code.

Proposals for an `AbstractGraph` interface in Graphs.jl version 2.0

I propose that the following changes be made to the existing AbstractGraphs interface:

Declare that the vertex label type T is unrestricted.
Add [add|rem]_[vertex|edge]! as a mandatory part of the interface for mutable graphs.
rem_vertex! guarantees that vertex labels will not be changed.
- (This is currently violated for Simple[Di]Graph.)

The signature for add_vertex! would change to the following:

add_vertex!(g, v)   # where g is a graph and v is a vertex label

Comments

Maybe this is only somewhat related, but, since this proposal would break the SimpleGraph type, it might be a good place to comment that I think concrete graph types whose names describe the data structure they use would make more sense, e.g. AdjacencyListGraph, DenseAdjacencyMatrixGraph, or SparseAdjacencyMatrixGraph.

gdalle commented 3 years ago

I made a small comparison of the implementation choices of various packages, hopefully this fuels the discussion:

SWG: SimpleWeightedGraphs.jl
MG: MetaGraphs.jl
MGN: MetaGraphsNext.jl
SVG: SimpleValueGraphs.jl

SG: SimpleGraphs.jl

	SWG	MG	MGN	SVG	SG
Graph storage	Weighted adjacency matrix	`SimpleGraph{T}`	`SimpleGraph{T}`	Adjacency list	Adjacency set
Vertex label type `VL`	-	`Any`	Not `<:Integer`	-	-
Vertex data type `VD`	-	`Dict{Symbol,Any}`	`Any`	`NamedTuple`	`Any`
Edge data type `ED`	`Real`	`Dict{Symbol,Any}`	`Any`	`NamedTuple`	-
Vertex label storage	-	`Dict{Any,T}`	`Dict{T,VL}`	-	-
Vertex data storage	-	`Dict{T,VD}`	`Dict{VL,VD}`	`Tuple{Vector}`	`Set{VD}`
Edge data storage	`SparseMatrixCSC{ED}`	`Dict{SimpleEdge{T},ED}`	`Dict{Tuple{VL,VL},ED}`	`Tuple{Vector{Vector}}`	-
Provides weights	Yes	Yes	Yes	No	No
Compatible with Graphs.jl	Yes	Yes	Yes	Yes	No

simonschoelly commented 3 years ago

Better support for metadata should definitely be a long term goal.

About creating a subtype of AbstractGraph:
- I am already doing some experiments with some abstract interface in SVG - this one would probably not work though for graphs with heterogeneous metadata (e.g. not each edge has the same metadata fields). This also shows how difficult it is to create some abstract interface from the beginning.
- One could also consider experimenting with some traits (maybe in a separate package) that would allow one to handle metadata. We could definitely take some inspirations from packages such as Tables.jl and AbstractTrees.jl there.
About vertex labels:
- I think it is not a good idea to replace the int vertex identifiers by labels by default, this would disallow a lot of optimizations.
- Vertex labels should therefore be a separate property, different from identifiers.
- It might be desirable to have multiple types of vertex labels, and also different ways to look them up (e.g. a dict, btree,...)
- One could also imagine a wrapper type here, that wraps a graph and adds labels to the vertices.
- I like the idea of using something like Label(:a) for specifying vertex labels, this way we won't have to change the interface.
- Vertex labels should not be the default for every graph, as it is always a tradeoff when one uses them.
On changing the interface for mutable graphs:
- There are definitely some things that can be done here, so that functions such as transitive_closure also work with other types of graphs.
- One could easily imagine a graph, where edges can be added/removed, but vertices are fixed, so one should be careful here when saying that all off add/remove should be supported for mutable graphs.
- The fact that rem_edge! can reorder edge indices has some advantages though, as it allows one to always be sure that the vertex identifiers are in the range 1:nv(g) which makes analytics faster.

@gdalle SVG supports the weights function - but if a graph has more than one edge value, one needs to specify which value should be used.

CameronBieganek commented 3 years ago

The fact that rem_edge! can reorder edge indices has some advantages though, as it allows one to always be sure that the vertex identifiers are in the range 1:nv(g) which makes analytics faster.

If rem_vertex! is allowed to change vertex labels, then I don't see how we could provide a generic interface for rem_vertex!. Any ideas?

I think it is not a good idea to replace the int vertex identifiers by labels by default, this would disallow a lot of optimizations.

There are many libraries in other languages, including retworkx, that allow non-integer vertex labels, and somehow they still manage to be fast. I think Julia is a sufficiently flexible language that we can implement performant graph types with arbitrary vertex labels.

CameronBieganek commented 3 years ago

Here's another argument for guaranteeing that rem_vertex! does not change vertex labels:

I think it makes sense to have the API for AbstractGraph conform as closely as possible to the mathematical definition of a graph. In other words, a graph is a set of vertices and a set of edges. Of course the implementation under the hood does not have to use sets, but I think that's the interface that we should present to the users.

With that in mind, I think this is a sensible definition for the interface for rem_vertex!:

After rem_vertex!(g, v) is called on a graph with vertices V, the new vertex set is V\{v}.

So if the vertex set is {1, 2, 3} and you remove vertex 2, then the new vertex set is {1, 3}.

EDIT:

This might just be fanciful thinking on my part, but it would be kind of cool to have an interface like this:

# Add a vertex:
push!(vertices(g), v)

# Add an edge:
push!(edges(g), e)

# Remove a vertex:
delete!(vertices(g), v)

# Remove an edge:
delete!(edges(g), e)

As long as vertices and edges return lazy iterators (which they currently do), these push! and delete! methods could be implemented efficiently.

etiennedeg commented 2 years ago

One of the main points of a graphs with MetaData is to be able to label vertices with something different than a range of consecutive integers. The problem with such labeling is that multiple algorithms require indexing on vertices. The major workaround is to have an efficient function that maps integers from 1:nv(g) to vertices(g) (lets call it map - it can be easily computed by calling collect(vertices(g))[i], with collect(vertices(g)) stored beforehand), and an efficient function mapping vertices to 1:nv(g)(lets call it reverse_map). This one is more problematic, but can easily be built in linear time with a dictionary. I see three ways to deal with this :

Functions using vertex indexing should first compute the map and reverse_map functions. Inconvenients :
- each call to such functions will allocate new map and reverse_map
- this need some rewrite in a lot of the functions of the codebase.
  1. map and reverse_map must be provided by the graph. This eliminate the allocations on each function call. These functions could also be lazily implemented (example, we could do mutations efficiently (with constant complexity), and on the next call of reverse_map, compute the map in linear time and cache it. Inconvenients :
- this need some rewrite in a lot of the functions of the codebase.
  1. Keep the existing requirement that vertices should belong to the range 1:nv(g) and let the graph type deal with it's own labeling outside (a bit like what MetaGraph is doing) Inconvenients :
- It's not possible to design a graph type with constant mutation complexity.

Personally, I'm tempted by the second option, even though this will require a lot a rewrite work.

JuliaGraphs / Graphs.jl