gonum / gonum

Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more
https://www.gonum.org/
BSD 3-Clause "New" or "Revised" License
7.53k stars 535 forks source link

graph/simple: Feature Request: Complete, cycle, wheel, and star-making functions #1582

Open mattConn opened 3 years ago

mattConn commented 3 years ago

Feature

A useful addition to the graph/simple package could be some classic graph-making functions. The graph-making functions I have in mind would initialize and return "complete", "cycle", "wheel" and "star" graphs.

Having these functions in the package would offer a built-in convenience. These graphs are also useful as building blocks for other graphs. Lastly, for a new user of Gonum and/or student of Graph Theory, these functions could make for a good starting point.

Implementation

Each of these functions would consist of setting edges via loop and return a concrete, mutable graph. For example, a complete graph-making function might look like:

func MakeUndirectedCompleteGraph(start, length int) *simple.UndirectedGraph {
    g := simple.NewUndirectedGraph()

    for i := start; i < start+length; i++ {
        for j := i + 1; j < start+length; j++ {
            g.SetEdge(g.NewEdge(simple.Node(i), simple.Node(j)))
        }
    }

    return g
}

This would yield a graph with nodes [start, start+length), so that the programmer can make and join similar graphs without collision. The code would be similar for cycle, wheel, and star functions, each having identical type signatures.

I would like to contribute to this feature should it be deemed useful.

kortschak commented 3 years ago

I would prefer if this were closer to the API in graph/graphs/gen and did not return a concrete type but rather took a destination graph.Builder. The example above would look like this:

func Complete(dst graph.Builder, n int) {
    _, isDirected := dst.(graph.Directed)
    for i := 0; i < n; i++ {
        for j := i + 1; j < n; j++ {
            u := g.NewNode()
            g.AddNode(u)
            v := g.NewNode()
            g.AddNode(v)
            g.SetEdge(g.NewEdge(u, v))
            if isDirected {
                g.SetEdge(g.NewEdge(v, u))
            }
        }
    }
}

The documentation needs to say that the graph being constructed will be a subgraph of the destination if there are already nodes/edges present.

This decouples the graph construction from graph/simple. The signatures for the different functions would look different, for example, star and wheel may take a parameter specifying the central node (or return the id of the central node if there is no parameter).

I would also suggest that "path(n)", "tree(d,b)" and "randomtree(n)" (in some flavour) may be useful to add.

mattConn commented 3 years ago

The documentation needs to say that the graph being constructed will be a subgraph of the destination if there are already nodes/edges present.

That sounds good.

Decoupling these functions from simple via Builder implementation sounds like a good idea. However, a Builder does not allow specification of a node ID, which would make any kind of join or union impossible between graphs made with these functions due to node collision. A workaround could be initializing the source graph with a starter node with a chosen ID before passing to these functions.

I like central node arguments for star and wheel. I also like path, tree, and randomtree functions joining the bunch.

kortschak commented 3 years ago

However, a Builder does not allow specification of a node ID, which would make any kind of join or union impossible between graphs made with these functions due to node collision.

OK. So a problem statement is probably worth having here. The "What are you trying to do?" in the feature request template gets at this.

mattConn commented 3 years ago

Should I amend my initial comment to address this? If I may elaborate here in the meantime: I am interested in controlling the node ID's of the graphs made by these functions. For example, Complete(n) may look like Complete(m,n), which would make a graph of n nodes, with ID's ranging from m to n-1. Also, I did mention this concern/interest in my initial comment.

kortschak commented 3 years ago

I'm wondering why the node ranges need to be specified. This is what the "What are you trying to do?" gets at (to avoid the XY problem).

mattConn commented 3 years ago

I see what you mean. I'll keep that in mind for future issues/feature requests.

With a specified node range, multiple graphs may be made with the same structure without them being equal by nodes and edges. Without specified node range, this may be achieved outside of the function, however, I considered it an added convenience to allow the function to handle this itself, as well as a way to signal to the programmer that these are the values that will populate their graph, and that this is the way for them to control this.

kortschak commented 3 years ago

With a specified node range, multiple graphs may be made with the same structure without them being equal by nodes and edges.

Sorry, this is unclear to me. You mean by specifying separate ranges? Why is this important? Understanding the use cases here would be helpful (step up one level of the what are you trying to do ladder - What is the ultimate need that leads to the request?).

Without specified node range, this may be achieved outside of the function, however, I considered it an added convenience to allow the function to handle this itself, as well as a way to signal to the programmer that these are the values that will populate their graph, and that this is the way for them to control this.

Go idiom differs from python idiom in this regard; we generally avoid multitudinous knobs where python seems to foster their use.

Note that the graph generation API defers all knowledge of node IDs to the graphs themselves.

mattConn commented 3 years ago

For the ultimate need of these functions, I had considered them as starting points or building blocks for a larger graph-oriented program, using common graph structures. Specifying node range would allow them to be more readily interoperable with each other.

Go idiom differs from python idiom in this regard; we generally avoid multitudinous knobs where python seems to foster their use.

This difference in idiom might better inform my opinion on including node range in these functions. I believe you are getting at the idea of giving these functions the smaller set of concerns, as in "make common graph structures" versus "make common graph structures with the intent of joining them". This would, however, leave the question of how to specify a central node for wheels and stars.

kortschak commented 3 years ago

There is relevant discussion in #230.

There are two possible ways forward that I can see. Either the discussion in #230 is reopened to allow for a NewNodeID as described in that issue. This would allow nodes with specific IDs to be added. This allows graph fragments to be constructed and then have functions to perform unions/intersections on these (this is another issue that needs to be addressed - there are node and edge-based operations in the set). The alternative is to perform the construction each time in the destination without copying from a template. In the second case for example if we wanted to create a 5-start connected to another 5-start by a 2-path we would have something like this:

root := gen.Star(dst, nil, 5) // nil specifies that we don't care where the root is.
end := gen.Path(dst, dst.Node(root), 2)
gen.Star(dst, dst.Node(end), 5) // root the star from the end of the path.

The issue that I have with this approach (without the NewNodeID function is that if gen.Star(dst, node, n) is passed a node with an ID that doesn't exist then the returned node ID will not match the node's that was passed in.

Another alternatives include offloading the work onto the creator of the concrete graph type and have them ensure that node IDs match what is needed. This feels pretty unfriendly.

Finally, it is possible to use another identifier to ensure that unions and intersections behave as wanted (this is how the RDF graph types work). This is also unfriendly.

I think that possibly this is arguing that NewNodeID is needed.

@mewmew do you want to add anything here?

kortschak commented 3 years ago

I suspect that the comments https://github.com/gonum/gonum/issues/230#issuecomment-455677945, https://github.com/gonum/gonum/issues/230#issuecomment-455688934, https://github.com/gonum/gonum/issues/230#issuecomment-455812282, https://github.com/gonum/gonum/issues/230#issuecomment-455812560, https://github.com/gonum/gonum/issues/230#issuecomment-455812819, https://github.com/gonum/gonum/issues/230#issuecomment-455812959 and https://github.com/gonum/gonum/issues/230#issuecomment-455832529 maybe the answer.

A formal interface can be added to graph/graphs/gen and the relevant method added to graphs in simple and multi.

type GraphIdentityBuilder interface {
    // NodeWithID returns a Node with the given ID if possible. A
    // nil graph.Node will be returned if no graph.Node exists or can be created.
    // If a non-nil graph.Node is returned that is not already in the graph
    // NodeWithID will return true for new and the graph.Node must be
    // added to the graph before use.
    NodeWithID(id int64) (n graph.Node, new bool)

    graph.Builder
}

(name needs thought).

mewmew commented 3 years ago

I think that possibly this is arguing that NewNodeID is needed.

@mewmew do you want to add anything here?

I think you've covered most aspects of the discussion already, including the relevant comments of #230. We decided to close #230 as there was no direct need or use case for specifying the ID of a node during graph generation (rather the node ID was considered a low-level implementation detail); and resolved to revisit this issue should such a need or use case arise. Given the use case of merging two or more graphs, perhaps now is indeed the time to revisit #230.

As for bike-shedding, the method name NodeWithID seems fine. Wish I had a better suggestion for the interface name GraphIdentityBuilder, as identity makes me think of other associations than node ID.

Also, welcome @mattConn to the Gonum community :) Glad to see you interested in joining the graph development of Gonum!

Cheers, Robin

kortschak commented 3 years ago

Thanks, Robin.

As for bike-shedding ... Wish I had a better suggestion for the interface name GraphIdentityBuilder, as identity makes me think of other associations than node ID.

Yes. It's not an easy name to find, NodeIDGraphBuilder?

mewmew commented 3 years ago

I prefer NodeIDGraphBuilder over GraphIdentityBuilder. Still not perfect, but perhaps good enough.

mattConn commented 3 years ago

I too like NodeIDGraphBuilder. And thank you for the welcome @mewmew! I take it this means the resolution of #230 would allow for the development of this feature?

kortschak commented 3 years ago

Adding the implementations to the concrete types would allow this to be implemented. Addition of the interface to graph/graphs/gen would happen in conjunction with the functions being discussed here. Currently there is nothing in graph/testgraph to test a NodeWithID method, and possibly it shouldn't be added. I need to think about that. By the time these are added, we may have a good name.

kortschak commented 3 years ago

Now that #1584 is merged we can start thinking about signatures here. If the need to specify node set membership is important, I'm beginning to thing either a slice of node IDs or an ID iterator (we don't have these) would be appropriate.

For the slice-based approach I'd think something like this:

func Star(dst NodeIDGraphBuilder, center int64, leaves ...int64)

func Wheel(dst NodeIDGraphBuilder, center int64, cycle ...int64)

func Complete(dst NodeIDGraphBuilder, nodes ...int64)

func Path(dst NodeIDGraphBuilder, nodes ...int64)

This has the unfortunate cost of having to allocate an n-long slice of nodes.

The iterator approach would look similar, but require an integer iterator type. This approach can have implicit ranges, and so doesn't require the O(n) allocation. A similar approach would be to have an n int parameter for the count and a func(int) int64 to give the IDs (i.e. a lightweight iterator).

func Star(dst NodeIDGraphBuilder, center int64, leaves int, ids func(int) int64)

func Wheel(dst NodeIDGraphBuilder, center int64, cycle int, ids func(int) int64)

func Complete(dst NodeIDGraphBuilder, n int, ids func(int) int64)

func Path(dst NodeIDGraphBuilder, n int, ids func(int) int64)

The tree and random tree cases don't fit these approaches, but both can be built from Star.

mattConn commented 3 years ago

I like the lightweight iterator approach, as it solves the space issue while still giving the programmer control over the how they populate the graph.

For stars and wheels, how would the iterator be "aware" of the central node? Say, if the iterator ranges over [a,b] and the center is in this range. Would the iterator skip center when ranging over [a,b]? Or should the graph-making function itself check for this collision? I think the former could make sense, especially when the iterator is a one-off function literal designed just for making this graph.

mattConn commented 3 years ago

Update on central node collision: SetEdge handles it (if it was ever actually an issue). There is just the matter of checking for self-edges. Some code below where the iterator maps [0,leaves] to nodes. NodeWithID has been great to have around.

func Star(dst NodeIDGraphBuilder, center int64, leaves int, ids func(int) int64) {
    c, _ := dst.NodeWithID(center)

    var n graph.Node
    for i := 0; i <= leaves; i++ {
        n, _ = dst.NodeWithID(ids(i))
        if c != n { // guard against self edge
            dst.SetEdge(dst.NewEdge(c, n))
        }
    }

}
kortschak commented 3 years ago

The alternative is to have a sentence in the docs that say, "The ids function must not return an ID matching center." We tend to be pretty brutal; it's better to crash than have an incorrect result. So it would be written:

func Star(dst NodeIDGraphBuilder, center int64, leaves int, ids func(int) int64) {
    c, _ := dst.NodeWithID(center)
    for i := 0; i < leaves; i++ {
        n, _ := dst.NodeWithID(ids(i))
        dst.SetEdge(dst.NewEdge(c, n))
    }
}

Note that your code gives 1 too many leaves.

kortschak commented 3 years ago

Actually, looking at the functions in gen, the following is fine

func Star(dst NodeIDGraphBuilder, center int64, leaves int, ids func(int) int64) error {
    c, _ := dst.NodeWithID(center)
    for i := 0; i < leaves; i++ {
        id := ids(i)
        if id == central {
            return fmt.Errorf("gen: leaf %d ID matches central node ID: %d", i, center)
        }
        n, _ := dst.NodeWithID(id)
        dst.SetEdge(dst.NewEdge(c, n))
    }
    return nil
}

func Wheel(dst NodeIDGraphBuilder, center int64, cycle int, ids func(int) int64) error {
    err := Star(dst, center, cycle, ids)
    if err != nil {
        return err
    }
    for i := 0; i < cycle; i++ {
        u, _ := dst.NodeWithID(ids(i))
        v, _ := dst.NodeWidID(ids((i+1)%cycle))
        dst.SetEdge(dst.NewEdge(u, v))
    }
    return nil
}
mattConn commented 3 years ago

Those look good. Star makes itself useful from the start. It looks like it is left up to the iterator to dodge the center after all, I think that's fair.

If these are final then that would leave

Wheel could then be a call to star followed by a call to cycle.

kortschak commented 3 years ago

Wheel could then be a call to star followed by a call to cycle.

Yes, if Cycle is

func Cycle(dst NodeIDGraphBuilder, cycle int, ids func(int) int64) error {
    for i := 0; i < cycle; i++ {
        uid := ids(i)
        vid := ids((i+1)%cycle)
        if uid == vid {
            return fmt.Errorf("gen: adjacent node IDs equal at %d: %d", i, uid)
        }
        u, _ := dst.NodeWithID(uid)
        v, _ := dst.NodeWithID(vid)
        dst.SetEdge(dst.NewEdge(u, v))
    }
    return nil
}

However, we end up doing more work than necessary in NodeWithID lookups (ignore me sometimes).

The code here does more than it should but the faster code is likely not that much faster and less simple.

The tree functions need more thought for the API.

mattConn commented 3 years ago

Should we put this code in a branch? It's starting to come along.

kortschak commented 3 years ago

SGTM

kortschak commented 3 years ago

A thing to consider is that the current code doesn't make checks for non-bijective behaviour in the `ids func(int) int64) parameter (except for neighbours).

kortschak commented 3 years ago

It's also worth considering what an ID iterator would be defined as

type IDer interface {
    Len() int
    ID(int) int64
}

which maybe nicer than the func approach we have above.

mattConn commented 3 years ago

A thing to consider is that the current code doesn't make checks for non-bijective behaviour in the `ids func(int) int64) parameter (except for neighbours).

Why would the graph function check for this? I can imagine random node generation would present an issue if this is important. To be sure, you do mean a bijection between node ID's and integers?

It's also worth considering what an ID iterator would be defined as ...

The IDer interface does make code a lot cleaner, as evidenced in #1590. You've pretty much got everything in there.

kortschak commented 3 years ago

Why would the graph function check for this?

Programmers make mistakes.

To be sure, you do mean a bijection between node ID's and integers?

Yes.

mattConn commented 3 years ago

Will you be merging #1590?

kortschak commented 3 years ago

Yes, just waiting on an approved reviewer to approve.

ping @gonum/developers

mattConn commented 3 years ago

Thanks for working with me on this, Dan and @mewmew. I’ll close this as #1590 has been merged.

kortschak commented 3 years ago

The tree functions are not there yet, so I'll reopen.

mattConn commented 3 years ago

Mind if I give the tree function a go? Most likely I would use the Star function per your suggestion near the beginning of this issue.

kortschak commented 3 years ago

SGTM

mattConn commented 3 years ago

I forgot to mention that once I finish my review of #1524 I will come back to this.