Open gdalle opened 3 years ago
Thanks for opening this! I'll definitely be chiming in with some thoughts, but I might not have time until this weekend.
I actually have my own metadata graph package that I developed. But it is pretty minimal---I developed it for my own personal use.
It would certainly be convenient to be able to pass vertex labels instead of vertex indices wherever possible. The issues that I can think of though are
1) there is a performance cost to translating indices to labels
2) if vertex labels are Int
s, then dispatch won't be able to tell the difference between vertex labels and indices, so we couldn't support both
3) we might have to add a whole bunch of extra trivial methods that translate labels to indices first
Maybe we could wrap labels in a type, like
Label(:a)
, so that dispatch could recognize and translate them when users want
Another point of reference is this package:
https://github.com/scheinerman/SimpleGraphs.jl
SimpleGraphs.jl lets you use any type for the vertices, so you can create your own custom vertex type and use it in a SimpleGraph
.
Example:
using SimpleGraphs
struct Vertex{T}
label::T
weight::Float64
end
g = SimpleGraph{Vertex{String}}()
v1 = Vertex("a", 2.0)
v2 = Vertex("b", 3.0)
add!(g, v1)
add!(g, v2)
add!(g, v1, v2)
julia> g
SimpleGraph{Vertex{String}} (n=2, m=1)
julia> vlist(g)
2-element Vector{Vertex{String}}:
Vertex{String}("a", 2.0)
Vertex{String}("b", 3.0)
julia> elist(g)
1-element Vector{Tuple{Vertex{String}, Vertex{String}}}:
(Vertex{String}("a", 2.0), Vertex{String}("b", 3.0))
I'll be back with more to say later---I just wanted to post this SimpleGraphs.jl example as food for thought.
Suppose a developer wants to create a package with a metadata graph type with the following two properties:
AbstractGraph
type for the underlying graph data structure.T
, but this type is not restricted to be an Integer
. In
other words, T
can be any type.The alternative to Property 1 is to create a custom graph data structure for the metadata graph. But this creates needless code duplication and restricts the user to only one type of underlying data structure for their metagraph.
Property 1 appears to be a popular choice among developers of metadata graph packages. It is adopted in MetaGraphsNext.jl, SimpleValueGraphs.jl, and PropertyGraphs.jl. However, it runs into difficulties when the wrapped graph is a mutable graph, because the [add|rem]_[vertex|edge]!
methods are not included in the AbstractGraph
interface. What this means is that it is impossible to keep the metadata in sync with the underlying graph data structure, unless you assume that you are wrapping a specific graph type (e.g. SimpleGraph
), which contradicts the assumption in Property 1 that any AbstractGraph
can be wrapped.
In my opinion, Property 2 is a very desirable feature for AbstractGraph
s in general, and it is adopted for the specific metadata graph types implemented in MetaGraphsNext.jl and PropertyGraphs.jl. However, Property 2 also runs into difficulties. The definition of AbstractGraph
is the following:
abstract type AbstractGraph{T} end
So, the type T
is not restricted to be an Integer
. However, there are core functions and algorithms in Graphs.jl, such as degree
, all_neighbors
, bfs_tree
, dijkstra_shortest_paths
, and a_star
that only allow vertices to be specified as Integer
s. So any metadata graph package that aspires to Property 2 must provide functions to translate back and forth between vertex labels and the underlying integer vertex codes. This is a clumsy solution that leads to less readable code.
AbstractGraph
interface in Graphs.jl version 2.0I propose that the following changes be made to the existing AbstractGraphs
interface:
T
is unrestricted.[add|rem]_[vertex|edge]!
as a mandatory part of the interface for mutable graphs.rem_vertex!
guarantees that vertex labels will not be changed.
Simple[Di]Graph
.)The signature for add_vertex!
would change to the following:
add_vertex!(g, v) # where g is a graph and v is a vertex label
Maybe this is only somewhat related, but, since this proposal would break the SimpleGraph
type, it might be a good place to comment that I think concrete graph types whose names describe the data structure they use would make more sense, e.g. AdjacencyListGraph
, DenseAdjacencyMatrixGraph
, or SparseAdjacencyMatrixGraph
.
I made a small comparison of the implementation choices of various packages, hopefully this fuels the discussion:
SG: SimpleGraphs.jl
SWG | MG | MGN | SVG | SG | |
---|---|---|---|---|---|
Graph storage | Weighted adjacency matrix | SimpleGraph{T} |
SimpleGraph{T} |
Adjacency list | Adjacency set |
Vertex label type VL |
- | Any |
Not <:Integer |
- | - |
Vertex data type VD |
- | Dict{Symbol,Any} |
Any |
NamedTuple |
Any |
Edge data type ED |
Real |
Dict{Symbol,Any} |
Any |
NamedTuple |
- |
Vertex label storage | - | Dict{Any,T} |
Dict{T,VL} |
- | - |
Vertex data storage | - | Dict{T,VD} |
Dict{VL,VD} |
Tuple{Vector} |
Set{VD} |
Edge data storage | SparseMatrixCSC{ED} |
Dict{SimpleEdge{T},ED} |
Dict{Tuple{VL,VL},ED} |
Tuple{Vector{Vector}} |
- |
Provides weights | Yes | Yes | Yes | No | No |
Compatible with Graphs.jl | Yes | Yes | Yes | Yes | No |
Better support for metadata should definitely be a long term goal.
About creating a subtype of AbstractGraph
:
About vertex labels:
Label(:a)
for specifying vertex labels, this way we won't have to change the interface.On changing the interface for mutable graphs:
transitive_closure
also work with other types of graphs.rem_edge!
can reorder edge indices has some advantages though, as it allows one to always be sure that the vertex identifiers are in the range 1:nv(g)
which makes analytics faster.@gdalle SVG supports the weights
function - but if a graph has more than one edge value, one needs to specify which value should be used.
The fact that
rem_edge!
can reorder edge indices has some advantages though, as it allows one to always be sure that the vertex identifiers are in the range1:nv(g)
which makes analytics faster.
If rem_vertex!
is allowed to change vertex labels, then I don't see how we could provide a generic interface for rem_vertex!
. Any ideas?
I think it is not a good idea to replace the int vertex identifiers by labels by default, this would disallow a lot of optimizations.
There are many libraries in other languages, including retworkx, that allow non-integer vertex labels, and somehow they still manage to be fast. I think Julia is a sufficiently flexible language that we can implement performant graph types with arbitrary vertex labels.
Here's another argument for guaranteeing that rem_vertex!
does not change vertex labels:
I think it makes sense to have the API for AbstractGraph
conform as closely as possible to the mathematical definition of a graph. In other words, a graph is a set of vertices and a set of edges. Of course the implementation under the hood does not have to use sets, but I think that's the interface that we should present to the users.
With that in mind, I think this is a sensible definition for the interface for rem_vertex!
:
rem_vertex!(g, v)
is called on a graph with vertices V
, the new vertex set is V\{v}
.So if the vertex set is {1, 2, 3}
and you remove vertex 2
, then the new vertex set is {1, 3}
.
EDIT:
This might just be fanciful thinking on my part, but it would be kind of cool to have an interface like this:
# Add a vertex:
push!(vertices(g), v)
# Add an edge:
push!(edges(g), e)
# Remove a vertex:
delete!(vertices(g), v)
# Remove an edge:
delete!(edges(g), e)
As long as vertices
and edges
return lazy iterators (which they currently do), these push!
and delete!
methods could be implemented efficiently.
One of the main points of a graphs with MetaData is to be able to label vertices with something different than a range of consecutive integers. The problem with such labeling is that multiple algorithms require indexing on vertices. The major workaround is to have an efficient function that maps integers from 1:nv(g)
to vertices(g)
(lets call it map
- it can be easily computed by calling collect(vertices(g))[i]
, with collect(vertices(g))
stored beforehand), and an efficient function mapping vertices to 1:nv(g)
(lets call it reverse_map
). This one is more problematic, but can easily be built in linear time with a dictionary.
I see three ways to deal with this :
map
and reverse_map
functions.
Inconvenients :
map
and reverse_map
map
and reverse_map
must be provided by the graph. This eliminate the allocations on each function call. These functions could also be lazily implemented (example, we could do mutations efficiently (with constant complexity), and on the next call of reverse_map
, compute the map in linear time and cache it.
Inconvenients :1:nv(g)
and let the graph type deal with it's own labeling outside (a bit like what MetaGraph
is doing)
Inconvenients : Personally, I'm tempted by the second option, even though this will require a lot a rewrite work.
Hey!
There has been some talk recently about adding better support for graph metadata, so here's an issue to centralize our discussions. First of all, here are the packages that I'm aware of that deal with vertex- & edge-level attributes:
One idea would be to design a common interface extending
AbstractGraph
to work with metadata. I'm aware that the four packages above are very different, and that the case of edge weights probably deserves special treatment. On the other hand, agreeing on a common set of names for functions would bring clear benefits:As food for thought, here are a few past discussions on this topic:
What is your take on this?