dominikbraun / graph

A library for creating generic graph data structures and modifying, analyzing, and visualizing them.
https://graph.dominikbraun.io
Apache License 2.0
1.77k stars 95 forks source link

Topological sort has a significant performance penalty #170

Open yyjlincoln opened 4 months ago

yyjlincoln commented 4 months ago

I'm using a Graph of type graph.Graph[Resource, Resource] with a hash function that returns the original pointer to build a directed, acylic resource graph. Due to the nature of current implementation, each graph node contains exactly one edge to its parent resource, all the way to a root node.

With 4043 total resources (so v=4043, e=4042 as the root node does not connect to any parent), running inside a Docker on a M2 Pro mac, a Topological Sort using graph.TopologicalSort takes >1.5s (1.876445292s as a typical value), with a significant memory usage of >200M. Upon profiling the heap with pprof, the following is yield:

Memory snapshot for a single request that requires Topological Sorting:

$ go tool pprof mem.pprof
      flat  flat%   sum%        cum   cum%
  332.92MB 96.06% 96.06%   336.85MB 97.19%  github.com/dominikbraun/graph.TopologicalSort[go.shape.struct { Domain string; Instance int64 },go.shape.struct { Domain string; Instance int64 }]

I thought the excessive memory usage might have been caused by the fact that I'm using a struct and basically no hashing (which is, of course, more memory intensive); so I changed to graph.Graph[*Resource, *Resource]. While the below snapshot did not capture the exact memory usage at the moment, the memory allocation from the start of the program after a few request is:

$ go tool pprof --alloc_space mem.pprof
> top1
      flat  flat%   sum%        cum   cum%
  339.98GB 87.45% 87.45%   342.07GB 87.99%  github.com/dominikbraun/graph.TopologicalSort[go.shape.*go.shape.struct { Domain string; Instance int64 },go.shape.*uint8]

...well exceeding 300GB.

Meanwhile, a naive implementation by keeping track of inEdgesCount for each vertex (using adjacentMap provided by graph) takes <20ms, with negligible memory footprint.

Using graph (as "github.com/dominikbraun/graph") v0.23.0.

dominikbraun commented 4 months ago

Thanks for reporting this!