Discussion : API for Graphs.jl 2.0

JuliaGraphs / Graphs.jl

An optimized graphs package for the Julia programming language

http://juliagraphs.org/Graphs.jl/

Other

453 stars 88 forks source link

Discussion : API for Graphs.jl 2.0 #146

Open etiennedeg opened 2 years ago

etiennedeg commented 2 years ago

This is for discussing the future API of Graphs.jl

Here is a first shot to open the discussions: In the light of Why I no longer recommend Julia, I tried to make the API as complete as possible, and to detail all the assumptions that go with it. I took a lot of inspiration from boost graph, and try to make it the less breaking possible, there may be many breaking changes in my proposition that can be avoided simply. I also left many question marks and void in it.

Vertices

Vertex (Trait)

required :

isless(v1::Vertex , v2::Vertex)::Bool assumptions:
- isless forms a total order
hash(v::Vertex)::UInt

IntegerVertex < : Vertex

IntegerVertex will be associated with containers indexed by a UnitRange required :

getindex(V::IntegerVertex)::Uint (or vindex?)

Edges

AbstractEdge{T<:Vertex} (Trait ? See #132)

required :

src(e::AbstractEdge)::T
dst(e::AbstractEdge)::T assumptions:
- no guarantee that src(e) <= dst(e) ?
isless(e1::AbstractEdge, e2::AbstractEdge)::Bool assumptions:
- isless forms a total order
hash(e::AbstractEdge)::UInt

WeightedEdge{T} < : AbstractEdge{T}

weight(e::WeightedEdge)::Float

DataEdge

todo

Graphs

AbstractGraph{V<:Vertex, E<:AbstractEdge}:

required:

vertices(g::AbstractGraph)::{Iterator over Vertex}
get_edges(g::AbstractGraph, u::Vertex, v::Vertex)::{Iterator over AbstractEdge}
edges(g::AbstractGraph)::{Iterator over AbstractEdge} must be coherent with get_edges
outedges(g::AbstractGraph, v::Vertex)::{Iterator over AbstractEdge} assumptions for each iterator output:
- no repetition (of vertices or edges)
- no need to be sorted
get_weight(g::AbstractGraph, v::Vertex) default to 1 ? Or only for WeightedGraphs ?
get_weight(g::AbstractGraph, e::AbstractEdge) default to 1 if edge is resent else 0 ? or only for WeightedGraphs ?

not required:

nv(g) = length(vertices(g))
ne(g) = length(edges(g))
outneighbors(g::AbstractGraph, v::Vertex) = union([src(e) == v ? dst(e) : src(e) for e in outedges(g, v)])
edges(g, v) = outedges(g, v)
outdegree(g, v) = length(outneighbors(g, v))
outdegree(g, u) = length(outedges(g, u))
has_edge(g, e)::Bool
has_vertex(g, v)::Bool
has_self_loops(g::AbstractGraph) = any(src(e) == dst(e) for e in edges(g))
get_vertex_container(g::AbstractGraph{V, E}, K::Type)::{Vertex container over type} = Dict{V, K}() The container is indexed by vertices of g
get_edge_container(g::AbstractGraph{V, E}, K::Type)::{Vertex container} = Dict{E, K}() The container is indexed by edges of g

BidirectionalGraph <: Graph

required:

inneighbors, inedges assumptions:
- no repetition of vertices
- no need to be sorted

not required:

all_neighbors = union(inneigbors, outneighbors)
neighbors = all_neighbors
indegree = length(inneighbors)
degree = length(all_neighbors)

IsSimple <: AbstractGraph

If true, additional requirement that no two edges can have same src and dst Self-loops allowed because of implementation details and history of Graphs.jl (but at most one self-loop per vertex)

weights(g:AbstractGraph)::AbstractMatrix ? not required:
get_edge(g, u, v) = get_edges(g, u, v)

IsDirected <: AbstractGraph

(must be implemented ? allow mixed graphs ?) If false:

it is assumed that an each edge is an out_edge of both dest and src. edges output edges only for one direction. (so length(edges) = ne(g))
it must be a subtype of Bidirectional graph ?
it is assumed inedges = outedges

RangeBasedGraph <: AbstractGraph

it is assumed vertices(g::RangeBasedGraph) = 1:nv(g)

adjacency_matrix(g::AbstractGraph)::AbstractMatrix ? A[i,j] should have positive value (or true value?) only if there is an edge between i and j
weights(g::AbstractGraph) ?

Mutability

(distinguish vertex and edge mutability ?)

IsSimplyMutable

graph structure able to represent the structure of any unweighted simple graph required:

GraphType()::{GraphType<:IsSimplyMutable} returns an empty graph.
add_vertex!(g::AbstractGraph)::Vertex return created vertex should generally succede (graph is sufficiently general to handle any number of vertices), can fail only due to overflow. (or allow more general failure, with clean exception thrown ?)
add_edge!(g::AbstractGraph, e::AbstractEdge)::Bool allowed to return false if an edge with same extremities already exist If the edge does not already exist, it should generally succede.
rem_vertex!(g, v) should always succede adjacent edges are deleted ? (or undefined behavior as in boost, use in cojonction with clear vertex?) vertex and edge identifiers can no longer be valid
rem_edge!(g, e) should always succede edge identifiers can no longer be valid, but vertices will be

not required:

add_vertices!(g, l) = add_vertex!(g, e) for e in edges(g)
add_edges!, rem_vertices!, rem_edges!
GraphType(g::AbstractGraph)::{GraphType<:IsSimplyMutable} cast a graph g to an instance of GraphType
zero(GraphType<:IsSimplyMutable) = GraphType() ?
GraphType(n::Integer)::{GraphType<:IsSimplyMutable} returns a graph with n vertices and no edges.

IsMutable <: IsSimplyMutable

graph structure able to represent the structure of any unweighted multigraph

add_edge should generally succede

IsWeightMutable

graph structure able to modify all it's weight (but not necessarily able to change its structure)

set_weight!(g::AbstractGraph, e::WeightedEdge{T, U}, w::U)

IsVertexStable

If mutated, vertex identifiers are guaranteed to still be valid (we can compare vertices gathered before and after a mutation, a vertex container gathered before a mutation will still be valid after)

IsEdgeStable

If mutated, edge identifiers are guaranteed to still be valid (we can compare edges gathered before and after a mutation, an edge container gathered before a mutation will still be valid after)

gdalle commented 2 years ago

Hi @etiennedeg, and thanks for this first draft! Here are my thoughts:

Why Vertex and not AbstractVertex?
Why allow for DataEdge and not, say, DataVertex?
What is the idea behind get_vertex/edge_container?
I would keep a fallback get_weight for general graphs, and I think it should throw an error on non-existent edges instead of returning zero (because edges with zero weights can exist too)
I'm not sure I understand the difference between BidirectionalGraph and graphs with the IsDirected trait
Should adjacency_matrix only be defined for RangeBasedGraph? Or should there be a natural enumeration of vertices for other types of graphs?
I don't get the nuance between IsSimplyMutable and IsMutable

Also, what should we use to implement traits? SimpleTraits.jl?

etiennedeg commented 2 years ago

* Why `Vertex` and not `AbstractVertex`?
I don't remember, AbstractVertex is probably better.

Why allow for DataEdge and not, say, DataVertex? Of course, I just put some examples, but if DataEdge is part of the API, DataVertex should also be

What is the idea behind get_vertex/edge_container? For some algorithm, like astar, we need to have a list indexed by vertices (here to store the heuristic distance at each vertex). If the vertices does not form a UnitRange, we can't use a list and we can require a Dict. get_vertex_contrainer should return the best suited container depending on the vertex type and the assumptions on the graph. For example, a grid graph could have vertices reprsented by CartesianCoordinates, and the container could be a matrix.

I'm not sure I understand the difference between BidirectionalGraph and graphs with the IsDirected trait BiDirectional comes from the boost graph library. For some graph implementations it is much more costly to call inneighbors than outneighbors, so it is good to have implementations that avoid inneighbors. Bidirectional guarantee that we call inneighbors (and probably assumes it is a bit efficient). Bidirectional is trivial for undirected graphs, so that why I wondered if we can force undirected graphs to be subtype of Bidirectional

Should adjacency_matrix only be defined for RangeBasedGraph? Or should there be a natural enumeration of vertices for other types of graphs? The point of allowing more general vertex type is to avoid maintaining a range based enumeration of vertices. If a graph is implemented in a way that it can provide a natural enumeration of vertices, then he should probably subtype RangeBasedGraph.

I don't get the nuance between IsSimplyMutable and IsMutable IsSimplyMutable is a type of graph that can represent the structure of any simple graph. For example, SimpleGraph is a simply mutable graph. IsMutable is a type of graph that can represent the structure of any graph (including multigraphs). SimpleGraph is not a mutable graph because it can't be used to represent a graph with multiple edges.

Also, what should we use to implement traits? SimpleTraits.jl?

This is what is used for the moment, are there better alternatives?

etiennedeg commented 2 years ago

Makes me remember we should also define the API for the vertex / edge containers

jirka-fink commented 2 years ago

Thanks for considering another API. Especially for weighted graphs need them. Here are my thoughts:

Should functions vertices and edges be required? There are algorithms like DFS, Dijkstra, A* and various state exploration methods (e.g. alpha-beta pruning) for which relevant parts of a graph are generated on fly. These algorithms can be used on infinite graphs, like various lattices, or huge graphs, like state space of chess.

weight(e::WeightedEdge)::Float Does it mean that a weight of an edge must be a float? I would prefer to have the option to have graphs with integer weights.

RangeBasedGraph <: AbstractGraph

it is assumed vertices(g::RangeBasedGraph) = 1:nv(g)
adjacency_matrix(g::AbstractGraph)::AbstractMatrix ?
A[i,j] should have positive value (or true value?) only if there is an edge between i and j
weights(g::AbstractGraph) ?

Having vertices identified by the range 1:nv(g) simplifies a lot of algorithms. But why providing an adjacency_matrix should necessary? Why weight of an edge cannot be zero? What the function weight should be doing? I would suggest RangeBasedGraph be a trait which only imposes vertices to be 1:nv(g).

I am not sure if IsDirected <: AbstractGraph means abstract type and their inheritance or traits.

etiennedeg commented 2 years ago

Thank you for your feedback on the API.

Should functions vertices and edges be required? There are algorithms like DFS, Dijkstra, A* and various state exploration methods (e.g. alpha-beta pruning) for which relevant parts of a graph are generated on fly. These algorithms can be used on infinite graphs, like various lattices, or huge graphs, like state space of chess.

edges and vertices are allowed to return iterator so this is not much a problem. Another point where we need to be careful is the container, but we can use a dictionary to this effect. But it makes me realize that in this case, its dangerous to collect the output of vertices or edges. So maybe it is worth creating a Trait for infinite graphs. weight(e::WeightedEdge)::Float Does it mean that a weight of an edge must be a float? I would prefer to have the option to have graphs with integer weights.

I don't know why I didn't used the current SimpleWeightEdge{T<:Integer, U<:Real}, which would become SimpleWeightEdge{T<:Vertex, U<:Real}, and weight(e) would return U.

Having vertices identified by the range 1:nv(g) simplifies a lot of algorithms. But why providing an adjacency_matrix should necessary?

The adjacency matrix can be easily built by querying the neighbors for every vertices of the graph, so we can make a default implementation. Graph types that want a customized implementation for better performance can do so, but it having it add API does not add more burden on implementation, it just guarantees that such a function can be called. Why weight of an edge cannot be zero? What the function weight should be doing?

I don't see where I said this. Currently, SimpleWeightedGraph does not support zero weights, but this is due to the implementation, and that could eventually be lifted by this PR. In the API, I was wondering if get_edge must be implemented only for weighted graphs, or if we could define default weights for generic graphs (In which case we probably not need an abstract type or trait for weighted graphs) In the API I gave, weight(e) is only defined for AbstractWeightedEdge (but if we go the way of defauting weight for generic graphs, we must define it for AbstractEdge too. There is also weights(g) that return the weight matrix, get_weight(g, v) and get_weight(g, v). weight and weights are already part of the API, but we can still find a better name instead of get_weight if needed. For weights, there is still the question of which type must implement it (This is why I put question marks). As for the adjacency matrix, we probably need a RangeBasedGraph, but I think we also need the graph to be simple. Maybe we should not have this function in the API.

I am not sure if IsDirected <: AbstractGraph means abstract type and their inheritance or traits.

Here, IsDirected is a trait (as it already is), but you are right that I didn't precised what is a type and what is a trait. I think only AbstractGraph and BidirectionalGraph should be types.

jlapeyre commented 1 year ago

Also, what should we use to implement traits? SimpleTraits.jl?

SimpleTraits can seriously degrade the usefulness of stack traces, and @which. Maybe there is a way around this that I don't know about. I don't have another solution to offer except to use by-hand, non-macro traits.

gdalle commented 1 year ago

Hey @etiennedeg,

Thanks a lot for your work! Here are a few more remarks.

I really think this experiment belongs in a GraphsBase.jl package containing only the interface and a few basic implementations. It will make it easier for everyone to understand the progress, without it being cluttered by all the algorithms from Graphs.jl. See https://github.com/JuliaGraphs/Graphs.jl/issues/135

IntegerVertex will be associated with containers indexed by a UnitRange

Do we really need this concept of graphs indexed by a UnitRange anymore? If we rewrite all the algorithms to accept generic vertices, the only important thing is to have the right vertex_container, right?

weight(e::WeightedEdge)::Float

The weight of an edge shouldn't even be constrained to Real, you can do shortest path computations on elements of an arbitrary ordered monoid

DataEdge

Is this actually useful? I expect every user would implement their own DataEdge as a subtype of AbstractEdge anyway?

get_weight(g::AbstractGraph, v::Vertex) default to 1 ? Or only for WeightedGraphs ?

Why would vertices have a weight?

get_vertex_container(g::AbstractGraph{V, E}, K::Type)::{Vertex container over type} = Dict{V, K}()

What is K here?

weights(g:AbstractGraph)::AbstractMatrix ?

You put it for a subtype, why would this not be available for general AbstractGraphs? Also, in the rework of the algorithms, it would definitely be interesting to get rid of the need to allocate this matrix beforehand.

(must be implemented ? allow mixed graphs ?)

I think we can safely exclude mixed graphs, IIRC they induce some pretty nasty things algorithmically.

it must be a subtype of Bidirectional graph ?

What additional information does IsDirected give?

adjacency_matrix(g::AbstractGraph)::AbstractMatrix ?

If the vertices are comparable there is always a canonical ordering, so we can always define an adjacency matrix (albeit with an expensive sort step)

(distinguish vertex and edge mutability ?)

I think this is only useful if we want very strict tests that all trait interfaces are satisfied by a concrete implementation. AFAIK the way to perform such tests is not yet agreed upon, so maybe this is an unneeded layer of complexity

add_vertex!(g::AbstractGraph)::Vertex return created vertex

I would return the true or false that we have returned sofar, or the graph object itself to be coherent with things like push!

adjacent edges are deleted ? (or undefined behavior as in boost, use in cojonction with clear vertex?)

I think deletion of adjacent edges is reasonable

graph structure able to modify all it's weight (but not necessarily able to change its structure)

In your proposal, there is no distinction between the vertex / the edge and its metadata. I kinda like it, but this raises the question: what happens when a user just wants to change the vertex or edge metadata (more generic than just the weight). Right now the only solution I see is to remove and reinsert

IsVertexStable / IsEdgeStable

Can't we just make it a requirement everywhere? This is one of the most common requests anyway, people are pissed about vertex deletion shuffling everything

gdalle commented 1 year ago

Makes me remember we should also define the API for the vertex / edge containers

It would be nice to also have a dict-like behavior, as in MetaGraphs(Next)

gdalle commented 1 year ago

There are algorithms like DFS, Dijkstra, A* and various state exploration methods (e.g. alpha-beta pruning) for which relevant parts of a graph are generated on fly. These algorithms can be used on infinite graphs, like various lattices, or huge graphs, like state space of chess.

Being able to handle infinite graphs out of the box would be a very cool asset

gdalle commented 1 year ago

SimpleTraits can seriously degrade the usefulness of stack traces, and @which

@jlapeyre can you give an example for that? Do you know if it also holds true for other traits implementations?

gdalle commented 1 year ago

These two issues were very detailed and relevant: https://github.com/JuliaGraphs/Graphs.jl/issues/35 and https://github.com/JuliaGraphs/Graphs.jl/issues/165

gdalle commented 1 year ago

While we're at it, we should probably be a bit more careful with the word Simple. A graph can be "simple" if it has no duplicated edges, but I think we should abandon the types SimpleEdge and SimpleGraph in favor of more explicit names like UnweightedEdge and AdjacencyListGraph

gdalle commented 1 year ago

Finally, a harder question: can we support hypergraphs, where "edges" are arbitrary subsets of the vertex set? I think not, and maybe that's okay

etiennedeg commented 1 year ago

IntegerVertex will be associated with containers indexed by a UnitRange

Do we really need this concept of graphs indexed by a UnitRange anymore? If we rewrite all the algorithms to accept generic vertices, the only important thing is to have the right vertex_container, right?

This is a good point. The question is if the use of a vector indexed by vertices was the only use of the assumption of vertices forming a UnitRange. I'm inclined to think that we indeed do not require this assumption anymore, but that would need a check of the codebase to verify that there is indeed no other optimization coming from this. I see multiple occurrences of :nv using github search tool, but after a quick glance, nearly all can be directly replaced by vertices(g) without needing the assumption of getting a range.

weight(e::WeightedEdge)::Float

The weight of an edge shouldn't even be constrained to Real, you can do shortest path computations on elements of an arbitrary ordered monoid

I agree

DataEdge

Is this actually useful? I expect every user would implement their own DataEdge as a subtype of AbstractEdge anyway?

This allows to provide a common interface to access metadata, like what was discussed here. I have no strong opinion on how to design metadata for graphs, and don't exactly know how it would look like so feel free to share how you would see an API for metadata.

get_weight(g::AbstractGraph, v::Vertex) default to 1 ? Or only for WeightedGraphs ?

Why would vertices have a weight?

Some algorithms may require vertex weights but we currently do no support it and these must be passed as an argument if an algorithm needs it. Maybe we can let users who want integer weights deal with it with more general metadata, but we could also provide a common interface for it, I don't know. Again, I don't have a strong opinion on it.

get_vertex_container(g::AbstractGraph{V, E}, K::Type)::{Vertex container over type} = Dict{V, K}()

What is K here?

This is the eltype of the container. For SimpleGraphs, we will return Vector{K}(undef, nv(g)). For example, if we want a visited list of visitors, we will take K = Bool, if we want a 'predecessor' list, we will take K = V.

weights(g:AbstractGraph)::AbstractMatrix ?

You put it for a subtype, why would this not be available for general AbstractGraphs? Also, in the rework of the algorithms, it would definitely be interesting to get rid of the need to allocate this matrix beforehand.

adjacency_matrix(g::AbstractGraph)::AbstractMatrix ?

If the vertices are comparable there is always a canonical ordering, so we can always define an adjacency matrix (albeit with an expensive sort step)

For these two points, these appears under RangeBasedGraph so I get the confusion but it was two question marks on if these two methods should be added in the interface for AbstractGraph. We don't need a canonical ordering for an adjacency matrix, we just need to be able to index a Tuple of AbstractVertex. However, I just saw that to subtype AbstractArray, you need indexing by 'Int', so it is only available for IntegerGraph which subtype V<:Int.

it must be a subtype of Bidirectional graph ?

What additional information does IsDirected give?

See https://github.com/JuliaGraphs/Graphs.jl/issues/146#issuecomment-1160083499 and Boost BidirectionalGraph. If we add BidirectionalGraph in the interface, then we allows AbstractGraph to not implement inneighbors for performance purposes. Of course, the codebase should contains as few algorithms requiring inneighbors, of specialize if there is really a performance benefit.

add_vertex!(g::AbstractGraph)::Vertex return created vertex

I would return the true or false that we have returned sofar, or the graph object itself to be coherent with things like push!

Ok, but I think there should be a way to add a vertex and also get a reference to the vertex just added, because it is not so trivial to retrieve it.

graph structure able to modify all it's weight (but not necessarily able to change its structure)

In your proposal, there is no distinction between the vertex / the edge and its metadata. I kinda like it, but this raises the question: what happens when a user just wants to change the vertex or edge metadata (more generic than just the weight). Right now the only solution I see is to remove and reinsert

You can change the metadata of a vertex / edge without changing its reference. It will still be the same vertex / edge. You should not make the lexicographic order or the indexation depends on the metadata.

IsVertexStable / IsEdgeStable

Can't we just make it a requirement everywhere? This is one of the most common requests anyway, people are pissed about vertex deletion shuffling everything

SimpleGraph is not vertex stable, if we impose vertex stability, we can no longer mutate SimpleGraph

Makes me remember we should also define the API for the vertex / edge containers

It would be nice to also have a dict-like behavior, as in MetaGraphs(Next)

What do you mean ? By an API for vertex / edge containers, I mean clarifying the question on whether we can iterate these containers, if the iterations should satisfy the lexicographic ordering or so...

There are algorithms like DFS, Dijkstra, A* and various state exploration methods (e.g. alpha-beta pruning) for which relevant parts of a graph are generated on fly. These algorithms can be used on infinite graphs, like various lattices, or huge graphs, like state space of chess.

Being able to handle infinite graphs out of the box would be a very cool asset

I don't feel like this should be in the interface, and I don't know of a wide utility. Almost no algorithm of the codebase will satisfy this assumption, and I don't think I want to rewrite / maintain algorithms specialized on this assumption. I think that if someone want's to deal with infinite graphs, he can make a wrapper around a mutable graph that will allocate during the exploration of the graph, and he can write his own algorithms for it.

While we're at it, we should probably be a bit more careful with the word Simple. A graph can be "simple" if it has no duplicated edges, but I think we should abandon the types SimpleEdge and SimpleGraph in favor of more explicit names like UnweightedEdge and AdjacencyListGraph

I'm not sure renaming SimpleGraph is a good idea, given how widely it is used. Frankly, I'm fine with calling it SimpleGraph though it allows loops.

Finally, a harder question: can we support hypergraphs, where "edges" are arbitrary subsets of the vertex set? I think not, and maybe that's okay

Definitely not in Graphs.jl. We could maybe propose an API in the future GraphBase.jl, but if we propose algorithms, that definitely needs a dedicated package.

gdalle commented 1 year ago

Hey @etiennedeg, I was just wondering how things are going? Want me to set up a GraphsBase.jl package to highlight your new interface and start getting feedback?

gdalle commented 1 year ago

That would need a check of the codebase to verify that there is indeed no other optimization coming from this. I see multiple occurrences of :nv using github search tool, but after a quick glance, nearly all can be directly replaced by vertices(g) without needing the assumption of getting a range.

Another reason why I want to put this in a GraphsBase.jl package first. As things stand, the Graphs.jl codebase as a whole will be violently incompatible with the new interface, so it doesn't make much sense to me to have both in the same place. Especially since the compatibility restoration effort will be large.

I have no strong opinion on how to design metadata for graphs, and don't exactly know how it would look like so feel free to share how you would see an API for metadata.

The main hurdle I see with metadata is that if vertices and edges carry their data with them (instead of just being identifiers), algorithms might end up allocating a lot more memory than needed when storing them (think the priority queue of Dijkstra). To me, the arbitrary vertex and edge types are required, but then there should also be associated structures for the data, that we don't need to include when doing things like collect(vertices(g)). What do you think?

Some algorithms may require vertex weights but we currently do no support it and these must be passed as an argument if an algorithm needs it. Maybe we can let users who want integer weights deal with it with more general metadata, but we could also provide a common interface for it, I don't know. Again, I don't have a strong opinion on it.

I think weights should be handled as part of the metadata. Maybe we could have default edge and vertex metadata, with weights set to 0 and 1 respectively?

This is the eltype of the container. For SimpleGraphs, we will return Vector{K}(undef, nv(g)). For example, if we want a visited list of visitors, we will take K = Bool, if we want a 'predecessor' list, we will take K = V.

Not entirely sure I understand, but in any case this should probably be a ::Type{K} type selector.

We don't need a canonical ordering for an adjacency matrix, we just need to be able to index a Tuple of AbstractVertex. However, I just saw that to subtype AbstractArray, you need indexing by 'Int', so it is only available for IntegerGraph which subtype V<:Int.

As long as we have a comparison operation, we can define a default integer indexing, even if it involves some allocations.

Ok, but I think there should be a way to add a vertex and also get a reference to the vertex just added, because it is not so trivial to retrieve it.

In this new interface, isn't the vertex itself the reference?

You can change the metadata of a vertex / edge without changing its reference. It will still be the same vertex / edge. You should not make the lexicographic order or the indexation depends on the metadata.

OK this is good.

SimpleGraph is not vertex stable, if we impose vertex stability, we can no longer mutate SimpleGraph

So we're not changing the SimpleGraph format? We're keeping vertices(g) = 1:n for this one?

What do you mean ? By an API for vertex / edge containers, I mean clarifying the question on whether we can iterate these containers, if the iterations should satisfy the lexicographic ordering or so...

I agree. I was mostly thinking of access to a single vertex, like defining what g[v, :possibly_a_symbol] should return.

I don't feel like this should be in the interface, and I don't know of a wide utility. Almost no algorithm of the codebase will satisfy this assumption, and I don't think I want to rewrite / maintain algorithms specialized on this assumption. I think that if someone want's to deal with infinite graphs, he can make a wrapper around a mutable graph that will allocate during the exploration of the graph, and we can write his own algorithms for it.

Fair.

I'm not sure renaming SimpleGraph is a good idea, given how widely it is used. Frankly, I'm fine with calling it SimpleGraph though it allows loops.

Is that why we're not changing SimpleGraph vertex stability, because of the wide use? We're tagging a breaking release anyway, so I don't think it's much of a stretch to make it safer if we don't lose performance in the process?

Definitely not in Graphs.jl. We could maybe propose an API in the future GraphBase.jl, but if we propose algorithms, that definitely needs a dedicated package.

Agreed

etiennedeg commented 1 year ago

I just made a draft for the Graphs 2.0, but I will keep the discussion on API here.

Here are some thoughts about this first shot:

I Implemented the AbstractVertex Trait, but did not used it that much. Traits.jl is horrible when dealing with multiple Traits, of when Traits are not of the first argument, maybe we can just allow Vertex to be anything? The only difference (I think) will be that instead of Error:V is not a Vertex, we will have isless(V, V) is not defined (this is the only assumption for vertices).

I wonder how breaking it is to change AbstractGraph{T} to AbstractGraph{T, E} The only breaking thing I noticed is when defining a subtype of AbstractGraph.

I defaulted the Traits IsSimple, IsMutable to false to avoid breakage, I suppose this is fine ?

When calling get_vertex_container, some code patterns no longer works. For example, container[list_of_vertices] .= false is no longer usable because Dict does not support broadcast (TIL). We will need a strong test suite to make sure our code is robust. I will try to implement an EvilGraph of the new interface, in the light of https://github.com/JuliaGraphs/Graphs.jl/pull/133 and https://github.com/cmcaine/EvilArrays.jl

Also, we will probably need some more convenience optional interface methods for working with containers, for example a get_vertex_container_filled(g, value) for initializing the containers, and maybe some for iterating in some ways that are not supported by all containers.

I don't know how to design the get_edge_container. I feel like the right container should be an array for SimpleGraphs. Do we make AbstractEdge indexable for arrays ? That sound appealing, is there some pitfalls ? Or do we make specifically a wrapper around an array to get something indexable by an edge ?

We need to settle the behavior for weights. As I currently implemented it:

AbstractEdge{V, U} have an optional weight interface function, returning a weight of type U (there is no AbstractWeightedEdge with this design choice, even if I let it in the current draft)
The weight type of a graph is gathered by dispatch with ::AbstractGraph{V, E<:AbstractEdge{V, U}} with {V, E, U} An other option would be to have:
AbstractGraph{V, E, U}', and haveweight(g, e)` as an interface function.

The first one is not necessarily restrictive for WeightedGraphs, since the edges can be created on the fly, and weights of a graph can still have a global storage as a matrix. Unless you see a fundamental flaw with one of these models, I don't have a strong opinion on it.

I used the code pattern:

for e in outedges(g, current)
    neigbhor = src(e) == current ? dst(e) : src(e)
   # code
end

But it feels messy and suboptimal. Do we add an enumerate_outedges interface function for (e, neighbor) in enumerate_outedges(g, current) ?

The main hurdle I see with metadata is that if vertices and edges carry their data with them (instead of just being identifiers), algorithms might end up allocating a lot more memory than needed when storing them (think the priority queue of Dijkstra). To me, the arbitrary vertex and edge types are required, but then there should also be associated structures for the data, that we don't need to include when doing things like collect(vertices(g)).

For non primitive datatypes, collection only holds references, the data will be store always in only one place, so there is not more allocations.

I think weights should be handled as part of the metadata. Maybe we could have default edge and vertex metadata, with weights set to 0 and 1 respectively?

I think so for vertex weights, as these are not of much uses, and their default value is not very clear. In the current draft, I defined default weights for all edges (default weight of 1), but I'm not sure for the moment if it is the best solution.

Not entirely sure I understand, but in any case this should probably be a ::Type{K} type selector.

We can probably use a type selector here

We don't need a canonical ordering for an adjacency matrix, we just need to be able to index a Tuple of AbstractVertex. However, I just saw that to subtype AbstractArray, you need indexing by 'Int', so it is only available for IntegerGraph which subtype V<:Int.

As long as we have a comparison operation, we can define a default integer indexing, even if it involves some allocations.

There is no problem in returning an object that is indexed by two vertices, or even by an edge. However, if we want to get an adjacency matrix to do linear algebra with it, an AbstractArray is definitely needed. We can indeed return an adjacency matrix indexed by a default integer indexing, but I don't think that should belongs to the interface.

So we're not changing the SimpleGraph format? We're keeping vertices(g) = 1:n for this one? Is that why we're not changing SimpleGraph vertex stability, because of the wide use? We're tagging a breaking release anyway, so I don't think it's much of a stretch to make it safer if we don't lose performance in the process?

The 2.0 will definitely be breaking, but we should still limit at our best the breakage. Furthermore, the non-vertex stability is part of the performance of the library, there is a balance proposed to the user: fast graph with no vertex stability and slow mutation, or slower graph with vertex stability and fast mutation.

I agree. I was mostly thinking of access to a single vertex, like defining what g[v, :possibly_a_symbol] should return.

That should be discussed, this looks a bit like:

g[u] = adjacency list of neighbors of u
g[u][v] = attributes of edge uv
g[u][v]["weight"] = attribute "weight" of edge uv

of NetworkX. This is a totally different syntax than what we use now, I feel very mixed about it...

gdalle commented 1 year ago

Thank you so much for your work 🙏 Gonna take a look tonight or tomorrow! Any thoughts on putting this in a GraphsBase.jl?

gdalle commented 1 year ago

Traits.jl is horrible when dealing with multiple Traits, of when Traits are not of the first argument, maybe we can just allow Vertex to be anything?

Besides I seem to remember John Lapeyre saying they mess up the stack traces. And they make it harder for beginners to implement their own graph types. My preference would be to use an abstract type or nothing at all, and document expectations thoroughly instead.

I wonder how breaking it is to change AbstractGraph{T} to AbstractGraph{T, E}

Besides the subtyping I don't see another issue.

I defaulted the Traits IsSimple, IsMutable to false to avoid breakage, I suppose this is fine ?

Can we make it so that users don't have to define these traits themselves? Not sure which traits implementation you chose but there must be at least one where we can check the existence of methods. For instance, IsMutable could be as simple as hasmethod add_vertex! or hasmethod add_edge!

When calling get_vertex_container, some code patterns no longer works. For example, container[list_of_vertices] .= false is no longer usable because Dict does not support broadcast (TIL).

I still don't understand why we can't just put this stuff in a vector every time we need it? As long as vertices are comparable we can build a unit range as index.

I don't know how to design the get_edge_container. I feel like the right container should be an array for SimpleGraphs.

What shape of array? If we allow multiple edges, then even a matrix is not enough.

AbstractGraph{V, E, U}', and have weight(g, e)` as an interface function.

Why not AbstractGraph{V,E} with weight(g, e)? If weight is type-stable, then the containers for weights will be inferred without issue.

Do we make AbstractEdge indexable for arrays ?

Do you mean A[e] := A[src(e), dst(e)] or something like that?

neighbor = src(e) == current ? dst(e) : src(e)

Wait, if src(e) != current, how can e be an outedge?

In the current draft, I defined default weights for all edges (default weight of 1), but I'm not sure for the moment if it is the best solution.

There is a tension between storing edge weights as part of the vertex / edge object or separately. I think it is conceptually simpler if it is part of the object, and we have an interface weight(g, v) and weight(g, e) with default values 0 and 1. The only question is, what type should the 0 and 1 have, but I don't think it matters very much since Int64 are pretty chill when it comes to promotion.

We can indeed return an adjacency matrix indexed by a default integer indexing, but I don't think that should belongs to the interface.

Why not? It wouldn't take much to order vertices on the fly, it's an n log n operation.

Furthermore, the non-vertex stability is part of the performance of the library, there is a balance proposed to the user: fast graph with no vertex stability and slow mutation, or slower graph with vertex stability and fast mutation.

I thought you said this would probably not impact the performance? I'd be up to add some benchmarks once this is stabilized

This is a totally different syntax than what we use now, I feel very mixed about it...

OK, let's leave it aside for now and make everything explicit through function calls. That's a non-breaking functionality we can add later anyway

gdalle commented 1 year ago

I'll start reviewing the PR this week

etiennedeg commented 1 year ago

Traits.jl is horrible when dealing with multiple Traits, of when Traits are not of the first argument, maybe we can just allow Vertex to be anything?

Besides I seem to remember John Lapeyre saying they mess up the stack traces. And they make it harder for beginners to implement their own graph types. My preference would be to use an abstract type or nothing at all, and document expectations thoroughly instead.

I'm not sure I would be as categorical as you, I find the Trait to be suitable for IsDirected, and it also makes sense for the other Traits I proposed, like IsMutable or IsSimple, but I would be ok dropping the AbstractVertex Trait.

I defaulted the Traits IsSimple, IsMutable to false to avoid breakage, I suppose this is fine ?

Can we make it so that users don't have to define these traits themselves? Not sure which traits implementation you chose but there must be at least one where we can check the existence of methods. For instance, IsMutable could be as simple as hasmethod add_vertex! or hasmethod add_edge!

Sound like a great idea

When calling get_vertex_container, some code patterns no longer works. For example, container[list_of_vertices] .= false is no longer usable because Dict does not support broadcast (TIL).

I still don't understand why we can't just put this stuff in a vector every time we need it? As long as vertices are comparable we can build a unit range as index.

We can indeed return an adjacency matrix indexed by a default integer indexing, but I don't think that should belongs to the interface.

Why not? It wouldn't take much to order vertices on the fly, it's an n log n operation.

This is an nlog(n) operation, while many algorithm needing a container could be linear. Moreover, to store the mapping from vertices to indices, we will need a dictionary, so it will not provide any speed up (except if we have many container, so the same index gathered once can be used for all the containers, or if we heavily rely on iterating through the containers more than using random lookups). However, you are right that providing the mapping is conceptually simpler, because we don't have to make the code generic for any kind of container (and that can also lead to speedups). We have a tradeoff between get_vertex_container and a (maybe) slower but easier management with mapping to indices. I'm incline to ask more knowledgeable people for advice for the best design choice, as this one seems to be quite crucial.

I don't know how to design the get_edge_container. I feel like the right container should be an array for SimpleGraphs.

What shape of array? If we allow multiple edges, then even a matrix is not enough.

That's why I talk about SimpleGraph and not AbstractGraph

AbstractGraph{V, E, U}', and have weight(g, e)` as an interface function.

Why not AbstractGraph{V,E} with weight(g, e)? If weight is type-stable, then the containers for weights will be inferred without issue.

I don't really like to rely on the automatic inferrability of julia, especially since it can change between new releases. I also suspect there are some use case where we really need to gather the eltype.

Do we make AbstractEdge indexable for arrays ?

Do you mean A[e] := A[src(e), dst(e)] or something like that?

Yes

neighbor = src(e) == current ? dst(e) : src(e)

Wait, if src(e) != current, how can e be an outedge?

My understanding is that for an undirected graph, the order of src and dst does not matter, so if u-v is an edge, both Edge(u, v) and Edge(v, u) should be outedges of u.

In the current draft, I defined default weights for all edges (default weight of 1), but I'm not sure for the moment if it is the best solution.

There is a tension between storing edge weights as part of the vertex / edge object or separately. I think it is conceptually simpler if it is part of the object, and we have an interface weight(g, v) and weight(g, e) with default values 0 and 1.

Do we already have this as an interface ? I don't think so... As I said, I have no strong opinion, so I'm ok with weigh(g, e) The only question is, what type should the 0 and 1 have, but I don't think it matters very much since Int64 are pretty chill when it comes to promotion.

We can have a parametric type for edge weights for AbstractGraph. For SimpleGraph, this parametric type will not exist and will be an Int (which default to 1), as is currently returned when calling weights(g)

Furthermore, the non-vertex stability is part of the performance of the library, there is a balance proposed to the user: fast graph with no vertex stability and slow mutation, or slower graph with vertex stability and fast mutation.

I thought you said this would probably not impact the performance? I'd be up to add some benchmarks once this is stabilized

If the user keep a SimpleGraph, this will not affect the performance. If the user have a use case where he needs to be free of the vertex stability, then he will use a graph type offering type stability (and pay the associated cost). The user will only have more choice. I still think benchmarks will be necessary because the added genericity may cause unintended slowdowns.

gdalle commented 1 year ago

I'm not sure I would be as categorical as you, I find the Trait to be suitable for IsDirected, and it also makes sense for the other Traits I proposed, like IsMutable or IsSimple, but I would be ok dropping the AbstractVertex Trait.

Sounds good to me. Traits to indicate properties of a graph, but not to constrain individual vertex or edge objects.

Sound like a great idea

The easiest implementation of traits in my view is https://github.com/jolin-io/WhereTraits.jl, what do you think?

We have a tradeoff between get_vertex_container and a (maybe) slower but easier management with mapping to indices. I'm incline to ask more knowledgeable people for advice for the best design choice, as this one seems to be quite crucial.

I understand the dilemma, although I would personally err in favor of simplicity.

I don't really like to rely on the automatic inferrability of julia, especially since it can change between new releases. I also suspect there are some use case where we really need to gather the eltype.

That's fair. But in the spirit of symmetry one would be tempted to treat vertex weights and edge weights equally? Or do we just agree that edge weights are everywhere and vertex weights are not, so we put only the former in the type?

My understanding is that for an undirected graph, the order of src and dst does not matter, so if u-v is an edge, both Edge(u, v) and Edge(v, u) should be outedges of u.

I mean for an undirected graph even the terminology src and dst is a bit misleading. But I'm not sure an UndirectedEdge type would bring anything?

Do we already have this as an interface ? I don't think so... As I said, I have no strong opinion, so I'm ok with weigh(g, e)

In that case we need to give a little thought to the constructor, so that the type parameter associated to edge weights is automatically defined somehow?

gdalle commented 1 year ago

This discussion is becoming hard to follow even for us two. I think instead of a single PR it would be nice to get GraphsBase.jl started, and then debate these topics in plenty of small issues based on an existing first draft. What do you say @etiennedeg?

gdalle commented 1 year ago

OK I'll create the repo with all the CI stuff and then let you copy your code :)

gdalle commented 1 year ago

Et voilà: https://github.com/JuliaGraphs/GraphsBase.jl

I set it up with rather strict quality tests, if Aqua or JET cause you troubles let me know

henrik-wolf commented 1 year ago

Thank you for starting the effort @etiennedeg ! also, +1 for starting a new Package. (for what it's worth... 😄)

From reading this thread I am a bit confused about how we want to treat edges. But I will probably start that discussion once we have a more structured place for that. (Thanks @gdalle for doing the tedious job of setting up a package.)

Regarding traits, I want to add a few observations to the mix:

The easiest implementation of traits in my view is https://github.com/jolin-io/WhereTraits.jl, what do you think?

From what I understand, this feels like the easiest route, but it is very heavy in terms of dependencies... (In addition to that, it as well depends on Graphs.jl and MetaGraphs.jl which... seem like a problem?) (funnily, due to the Graphs.jl dependency, WhereTraits.jl currently depends indirectly on SimpleTraits.jl.)

How bad would it be to just not use any traits package, and just build them by hand? From what I understand, GeoInterface.jl seems to get quite far with this manual approach.

Although I like the name GraphsBase, I would like to do some bikeshedding about whether Graph(s)Interface might describe the package a bit better.

gdalle commented 1 year ago

From what I understand, this feels like the easiest route, but it is very heavy in terms of dependencies... (In addition to that, it as well depends on Graphs.jl and MetaGraphs.jl which... seem like a problem?) (funnily, due to the Graphs.jl dependency, WhereTraits.jl currently depends indirectly on SimpleTraits.jl.)

Holy cow that is a good point. Let's stick with SimpleTraits for now ^^

gdalle commented 1 year ago

Although I like the name GraphsBase, I would like to do some bikeshedding about whether Graph(s)Interface might describe the package a bit better.

Not just an interface, because it will also define the strucs for SimpleGraph and probably a more general MetaGraph