Engelberg / ubergraph

An all-purpose Clojure graph data structure that implements Loom protocols and more.

Add a parameter to compose cost fn #64

Open souenzzo opened 1 year ago

souenzzo commented 1 year ago

Costs are always composed with +.

I want to compose them with *.

It should be exposed in ubergraph.alg/shortest-path by adding an extra attr to search-specification:

:compose-cost-fn (fn [current-cost new-cost] => next-cost) - Calculates the next cost, given the current cost and the node cost. Defaults to clojure.core/+

https://github.com/Engelberg/ubergraph/blob/v0.8.1/src/ubergraph/alg.clj#L232

Engelberg commented 1 year ago

Can you give me an example of a problem that benefits from this?

One potential concern: if costs were multiplied and any weights were numbers between 0 and 1, the shortest-path algorithm wouldn't find the shortest path.

souenzzo commented 1 year ago

Hello @Engelberg. Thanks for the quick response.

The use case

I have a graph where each edge has a "probability" and I want to find the path with the highest probability. The probabilities are numbers between 0.0 and 1.0, and they compose by multiplication: an edge of 0.5 followed by another edge of 0.5 gives 0.25.

Current solution

I ended up fixing it with math: cost-fn returns (- (Math/log probability)). Math/log is the natural log, so a probability of 0.5 becomes a cost of about 0.69. Composing -log(0.5) with -log(0.5) by sum gives about 1.39, which is -log(0.25).

I'm not sure if this solution is totally right, but it is working for our use-case.
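A minimal sketch of why this transformation is sound, in plain Clojure with no ubergraph dependency. The names neg-log-cost, path-probabilities, summed-cost and recovered-probability are hypothetical and chosen for illustration; only Math/log and the idea of a per-edge cost-fn come from the thread.

```clojure
;; Hypothetical helper: turns a probability in (0, 1] into an additive cost.
(defn neg-log-cost [p]
  (- (Math/log p)))

;; Two edges, each with probability 0.5.
(def path-probabilities [0.5 0.5])

;; What an additive shortest-path search would accumulate along the path.
(def summed-cost (reduce + (map neg-log-cost path-probabilities)))

;; Undo the transformation to recover the probability of the whole path.
(def recovered-probability (Math/exp (- summed-cost)))

(println (neg-log-cost 0.5))     ; about 0.693
(println summed-cost)            ; about 1.386
(println recovered-probability)  ; about 0.25
```

Because -log is strictly decreasing, the path with the smallest summed cost is the path with the largest product of probabilities.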

Final solution

If we decide to add compose-cost-fn, we will need to use it for the initial state too, possibly following the Clojure convention where an operator called with no arguments returns its identity value: (+) => 0, (*) => 1. https://github.com/Engelberg/ubergraph/blob/v0.8.1/src/ubergraph/alg.clj#L297-L299
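The identity convention can be sketched as follows. This is only an illustration of the proposal: path-cost is a hypothetical helper, not part of ubergraph, and it says nothing about how the library's search is actually implemented.

```clojure
;; Hypothetical helper: folds edge costs with a caller-supplied operator.
(defn path-cost [compose-cost-fn edge-costs]
  ;; Calling the operator with no arguments yields its identity value,
  ;; which serves as the cost of the empty path: (+) => 0, (*) => 1.
  (reduce compose-cost-fn (compose-cost-fn) edge-costs))

(println (path-cost + [1 2 3]))    ; 6
(println (path-cost * [0.5 0.5]))  ; 0.25
(println (path-cost + []))         ; 0
(println (path-cost * []))         ; 1
```

Any user-supplied compose function would need a zero-argument arity for this to work, which is an extra requirement on callers.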

Engelberg commented 1 year ago

Your probability use-case is interesting, and your solution of taking the additive inverse of the log is a clever way to transform a "find the greatest product" problem into a "find the smallest sum" problem.

If you had compose-cost-fn of *, I don't think it would help here, at least not directly.

First, the shortest-path algorithm would be trying to find the smallest product, not the greatest. Second, the algorithm wouldn't even succeed at finding the smallest product because each successive edge in the graph is lowering the cost, and this algorithm assumes the total cost of the path stays the same or increases with each edge. (So, for example, with additive costs, negative numbers won't work and currently the library catches that and notifies the user they should be using Bellman-Ford instead, which can handle negative weights).
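To illustrate the second point, here is a small sketch in plain Clojure (no ubergraph calls; the example probabilities and var names are made up) comparing the running total of a path under multiplication with the running total under the negative-log sum.

```clojure
;; Made-up edge probabilities along a single path.
(def probabilities [0.9 0.5 0.5])

;; Running totals as each edge is appended to the path.
(def running-product
  (reductions * 1 probabilities))

(def running-neg-log-sum
  (reductions + 0 (map #(- (Math/log %)) probabilities)))

(println running-product)      ; shrinks at every step
(println running-neg-log-sum)  ; grows at every step
```

Only the second sequence is non-decreasing, which is the property described above as an assumption of the algorithm.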

So, even if * were available to you, you'd still have to do some sort of transformation, like taking 1/probability for each edge, so that the numbers are all 1 or greater and you're minimizing the cost. And if you have to do a transformation anyway, why not do the transformation you've chosen?
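A rough sketch of the reciprocal idea, again in plain Clojure with made-up data and hypothetical names (paths, product, most-probable, least-reciprocal): minimizing the product of 1/p picks the same path as maximizing the product of p.

```clojure
;; Two made-up candidate paths, each a list of edge probabilities.
(def paths {:a [0.9 0.9]   ; path probability about 0.81
            :b [0.5]})     ; path probability 0.5

(defn product [xs] (reduce * 1 xs))

;; Path with the greatest product of probabilities.
(def most-probable
  (key (apply max-key #(product (val %)) paths)))

;; Path with the smallest product of reciprocals; (/ p) is 1/p, always >= 1 here.
(def least-reciprocal
  (key (apply min-key #(product (map / (val %))) paths)))

(println most-probable least-reciprocal)  ; :a :a
```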

The negative log solution also has another advantage: multiplying lots of probabilities will quickly tend towards 0, making it difficult to compare two long paths accurately due to limitations of floating point numbers, whereas the logs should put them into a range where addition is more stable.
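The floating-point point can be seen with a short experiment. This is a sketch with made-up paths of 2000 edges each; product and neg-log-sum are hypothetical helpers.

```clojure
(defn product [ps] (reduce * ps))
(defn neg-log-sum [ps] (reduce + (map #(- (Math/log %)) ps)))

;; Two long paths: every edge has probability 0.5 on one, 0.4 on the other.
(def path-a (repeat 2000 0.5))
(def path-b (repeat 2000 0.4))

;; Both products underflow to 0.0, so the paths cannot be told apart.
(println (product path-a) (product path-b))

;; The sums stay in a comfortable range: roughly 1386 and 1833.
(println (neg-log-sum path-a) (neg-log-sum path-b))
```

With the sums, path-a is still clearly the cheaper (more probable) path, even though the raw products are indistinguishable as doubles.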

I think your current solution may well be the best solution.