fillipe-gsm / python-tsp

Library to solve Traveling Salesperson Problems with pure Python code
MIT License
174 stars 28 forks source link

solve_tsp_local_search would sometimes return reversed list #62

Open MysteriousShadow opened 1 month ago

MysteriousShadow commented 1 month ago

Just using the examples on the readme here, the exact version solve_tsp_dynamic_programming returns [0, 1, 3, 2] as the permuation.

However, when using the heuristics version solve_tsp_local_search, it would sometimes return [0, 2, 3, 1].

In fact, solve_tsp_local_search can return either [0, 1, 3, 2] or [0, 2, 3, 1]. It seems to be randomly choosing which one to return.

Is this intended? I'd rather the output be deterministic. I have large data sets, and the exact method takes too much time and memory.

fillipe-gsm commented 1 month ago

Hey, @MysteriousShadow ,

So, in this specific example, these two optimal solutions you wrote have the same total distance.

The heuristic methods implemented here have a random component that makes them explore the possible routes in different ways every time you run them. This is intrinsic to these methods, and in fact this is one of the factors that allow them to (typically) get good solutions in reasonable time.

This of it in this way: imagine a mountain with two far-away valleys of the same height, and you wish to find its lowest point. The deterministic methods will start at the same point and follow the same path, so it will always end up in the same valley. The heuristics here, due to their random nature, may start from different points and follow different paths each time, and thus may end up in different valleys.

If you want to get deterministic results, you can do one of the following.

Suggestion 1: Force a specific random seed

This is the same as fixing the same starting point and path:

import random
# Set this before all optimizations
random.seed(1)  # or any other number you prefer

The following figure shows optimizations for the same example you used without setting a seed. We may get different routes as expected, but in this case they always have the same total distance, so both quality as optimal solution.

image

When setting a random specific seed before each optimization, the results are always the same now.

image

Suggestion 2: use a deterministic heuristic

If your problem instances are too large for the exact methods, you may use deterministic heuristics. Typically constructive heuristics fall into this class, such as "nearest neighbors", "christofides" etc.

Unfortunately this library does not have such methods implemented, so you may need to resort to other libraries such as OR-Tools.

With that said, I highly recommend against forcing specific seeds. The random nature is very important for making the heuristics good, and it is possible that you pick a seed that generates a very bad path and you always get a bad solution.

If you still prefer this route, then I suggest going with a multi-seed view: run the optimization for multiple seeds (say 5 or 10) and pick the one with best distance. In this way you get a mix of the benefits of both approaches.