SheepTester-forks / CurricularAnalytics.py

CurricularAnalytics.py is a Python port of CurricularAnalytics.jl, a toolbox for studying and analyzing academic program curricula.
https://pypi.org/project/curricularanalytics/
GNU Affero General Public License v3.0
3 stars 3 forks source link

all_paths optimizations #6

Closed SheepTester closed 1 year ago

SheepTester commented 1 year ago
    def _centralities(self) -> List[int]:
        return [
            sum(
                len(path)
                for path in all_paths(self.graph)
                # conditions: path length is greater than 2, target course must be in the path, the target vertex
                # cannot be the first or last vertex in the path
                if (i in path and len(path) > 2 and path[0] != i and path[-1] != i)
            )
            for i in range(len(self.courses))
        ]

This recalculates all_paths(self.graph) for every vertex in the graph, even though it should be the same for each of them.

Metric Time
Blocking factor 1.5215 s
Delay factor 1.7919 s
Complexity 4.27298 s
Centrality 64.6228 s
Extraneous requisites 64.5413 s

I didn't measure longest paths in the same test, but in a separate test,

Metric % of time
Curriculum construction 2.2
Longest paths 2.9
Extraneous requisites 6.2
Total complexity 2.2
Basic metrics 86.3
SheepTester commented 1 year ago

extraneous_requisites

nx.find_cycle(self.graph) takes 14% of its execution time. Can we skip this check? Does nx.weakly_connected_components throw an error if the graph is cyclical?

Queue also appears slow, but this might just be a microoptimization.

Line Time per hit
queue: Queue[int] = Queue() 10.4
queue.put(neighbor) 3.1
x = queue.get() 3.1
self.graph.neighbors(x) (reference) 0.7

In addition to nx.weakly_connected_components, nx.has_path is also relatively slow (43.1% of time, 11.2e-6 s per hit). But while a better graph algorithm could be devised, I think it's fine if I leave this alone for now.

longest_paths, all_paths

25.8% and 36.9% of the time is also spent on nx.find_cycle(self.graph) in longest_paths and all_paths, respectively.

Like extraneous_requisites, Queue is also used in all_paths and is relatively slow.

g.in_edges(x[0]) and g.in_degree(u) are relatively slow as well, but that might be inevitable.

Suggested changes

  1. Don't check for cycles and instead assume the graph is acyclical. Document this assumption.
  2. Use deque instead of Queue
  3. Cache the result of all_paths(self.graph)