dominikbraun / graph

A library for creating generic graph data structures and modifying, analyzing, and visualizing them.
https://graph.dominikbraun.io
Apache License 2.0
1.8k stars 94 forks source link

Stores of cloned graphs should be decoupled and independent of the original graph's store #113

Closed dominikbraun closed 1 year ago

dominikbraun commented 1 year ago

At the moment, a cloned graph uses the same Store instance as the original graph. Proof:

func TestClonedGraph(t *testing.T) {
    g := graph.New(graph.IntHash)

    _ = g.AddVertex(1)
    _ = g.AddVertex(2)
    _ = g.AddVertex(3)

    h, _ := g.Clone()

    _ = h.AddVertex(4)

    _, err := g.Vertex(4)
    if !errors.Is(err, graph.ErrVertexNotFound) {
        t.Errorf("expected error %v, got %v", graph.ErrVertexNotFound, err)
    }
}

The test will fail because 4 can be found in the original graph g, despite the fact that4 has only been added to h.

dominikbraun commented 1 year ago

The fundamental problem is that while instantiating a new memoryStore store (the default Store implementation) is easy, graph doesn't know how to instantiate custom Store implementations.

I've been thinking about reflect.New(reflect.TypeOf(g.store)) or a similar approach, but this won't work for more complex storages that perhaps require a database setup or something similar. Thus, this isn't the way.

Another approach would be to add a CloneWithStore method where the user passes the ready-to-use Store instance. This would work fine for the users themselves, however, functions that clone the original graph (such as TransitiveReduction) wouldn't have a Store instance available.

A viable solution might be to accept a Store factory function, and invoke that function when needed. But again, for something like databases, this probably wouldn't work either, because a new storage might need some manual and non-idempotent setup.

In short: There is no cut-and-dry solution. Instead of striving for a one-fits-all approach, I'm pursuing an approach where cloned graphs always end up in a new in-memory storage, regardless of the storage of the original graph. The algorithms work faster in memory anyway. I might add a method for transferring the cloned graph returned by an algorithm function to another storage in the future.