fisharebest / webtrees

Online genealogy
https://webtrees.net
GNU General Public License v3.0
427 stars 292 forks source link

Performance improvement: Dijkstra's algorithm #960

Open ric2016 opened 8 years ago

ric2016 commented 8 years ago

When calculating relationships between two individuals, Dijkstra's algorithm is used (via Dijkstra.php).

It calculates the distances from the source node to all other nodes, which is unnecessary: Once the target node is processed in the queue, we know the final shortest distance to the target node. As soon as the next node in the queue has a greater distance(*), we can clear the queue and abort the processing (we're not interested in the actual distances to other nodes).

This makes the relationship calculation noticeably faster in large trees, especially when recursion is used.

(*) In general, if we want to obtain all paths with the shortest distance, rather than just any such path, we cannot abort directly when the target node is reached, because there may be zero-weight edges from other nodes with the same distance, which we still have to process.

In the actual graph used for relationship calculations, there are no zero-weight edges, so this is actually unnecessary, but makes no big difference wrt performance anyway.

ric2016 commented 8 years ago

Just to clarify: This is a general optimization for all uses of the algorithm (whenever the distance to a single target node is to be determined) - independent of the variants discussed in issue #913.

fisharebest commented 2 years ago

This is a general optimization for all uses of the algorithm

I'm not 100% sure it is.

Suppost we want to find the shortest path from A to B.

A--10--B
|      |
1      1
|      |
C---1--D

If we stopped as soon as we processed B, we'd find A-B (10), which is longer than A-C-D-B (3).

I think your shortcut will only work if all paths have the same length. In which case, we need to find a different algorithm.

Perhaps a breadth-first-search?

ric2016 commented 1 year ago

If we stopped as soon as we processed B

We do not stop as soon as we have a distance for B, but as soon as the next node in the queue has a larger distance. the initial A-B (10) is only an upper bound for the target distance.

In more detail: We stop only if the next node in the queue has a greater distance. The queue already implements a kind of breadth-first search in the sense that all (relevant) neighbors of a node are put on the queue. Therefore the final path, if shorter than the one already found, must be through one of the nodes in the queue. Therefore, if all distances of the queue nodes are already larger than the upper bound of the distance to the target node, we won't find a shorter path.

In your example:

First iteration of 'processNextNodeInQueue':

$distance[A] = 0
$distance[B] = INF
$distance[C] = INF
$distance[D] = INF
$maxDistanceToTarget = INF
$queue = [A => 0]

Second iteration of 'processNextNodeInQueue':

$distance[A] = 0
$distance[B] = 10
$distance[C] = 1
$distance[D] = INF
$maxDistanceToTarget = 10
$queue = [C => 1, B => 10]

Check whether to abort: No, $maxDistanceToTarget isn't smaller than distance of next node in queue. Continuing the algorithm, the correct path with distance 3 will be determined.