deweydb closed this issue 11 months ago.
Hey @deweydb, I am aware of this limitation, and the problem is likely the very large resulting distance matrix, with shape 41,000 x 41,000. To be honest, for problems of this size, if you are really hoping for good solutions you may need to start looking at more specialized solvers (such as LKH-3 and Concorde). This one is written in pure Python, and even without the out-of-memory issue you may start feeling its limitations.
With that said, I have three suggestions (in increasing order of labor) that you can look into. Please brace yourself as this will likely be a long read :)
If the problem is that your computer crashes due to lack of memory, the most straightforward suggestion is to increase this resource. And don't worry, I am not asking you to spend money on another computer or more RAM; you can probably get away with creating a swap file.
Swap files are not persistent like swap partitions (although they can be made persistent if you wish), and they let you emulate a very large RAM of, say, 60GB, as long as you have enough disk space. I have used this approach before when processing OpenStreetMap data that required over 40GB of RAM when I had only 8GB available, and it worked, so it may be a quick first attempt at your problem.
Note: the guide I linked is for Linux-based distros (and probably BSDs). If you are on Windows or macOS, you can look for something similar. Note 2: a swap file is much slower than regular RAM, and this, combined with the huge size of your problem, may mean you need to let the solver run for quite some time to produce reasonable results.
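On a typical Linux distro, the setup looks roughly like this (just a sketch: the path `/swapfile` and the 60G size are examples, and on some filesystems you may need `dd` instead of `fallocate`):

```shell
# Create a 60 GB swap file (adjust size to your free disk space)
sudo fallocate -l 60G /swapfile
sudo chmod 600 /swapfile   # restrict permissions, required by swapon
sudo mkswap /swapfile      # format the file as swap space
sudo swapon /swapfile      # enable it immediately
swapon --show              # verify it is active
```

Without an entry in `/etc/fstab` this swap file is gone after a reboot, which is usually fine for a one-off run like yours.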
Suggesting another tool might not sound like the best advice, but for a problem this large another solver may simply be more appropriate. For instance, tsp-solver2. Since it only handles symmetric matrices, you only need the lower triangular portion of the distance matrix, so the RAM requirements can be roughly halved. Note: in all cases the setup for these solvers is different from this one (especially OR-Tools, which in my opinion is overly verbose).
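To give an idea of the savings, here is a sketch in plain NumPy (not tsp-solver2's actual input format) comparing the full symmetric matrix with a condensed form that stores only the pairs with i < j. With 41,000 points the full float64 matrix needs roughly 13 GB, and the condensed form about half that:

```python
import numpy as np

# Small hypothetical example: 1,000 random 2-D points
rng = np.random.default_rng(0)
points = rng.random((1000, 2))
n = len(points)

# Full symmetric matrix: n * n entries
full = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Condensed form: only the n * (n - 1) / 2 pairs with i < j
i, j = np.triu_indices(n, k=1)
condensed = np.linalg.norm(points[i] - points[j], axis=-1)

print(full.nbytes, condensed.nbytes)  # condensed uses about half the memory
```

SciPy's `pdist` returns exactly this condensed form, so it may be a convenient way to build the triangular input.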
If none of the previous approaches work, there is one last heuristic I have seen used in large-scale vehicle routing problems (but it translates well to the TSP) that should work with any solver, including this one.
Say you have your points as below (they are all connected, but the edges are not shown to prevent clutter):
If the number of points is too large, instead of solving the TSP for the entire dataset, you can split them into smaller groups or clusters:
With that, you start by solving a TSP between the clusters, which should be a very small problem; depending on the number of clusters you can even use an exact algorithm. To build the inter-cluster distance matrix, the first step is to determine an input/output point for each cluster. There are multiple ways of doing that. One possibility is to find, for each cluster, the point whose maximum distance to each of the other clusters is minimal (a min-max approach).
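As a rough sketch of that min-max selection (the helper name `gateway_points` is hypothetical, and I approximate the distance to a cluster by the distance to its centroid, which is just one possible choice):

```python
import numpy as np

def gateway_points(points, labels):
    """For each cluster, pick the member whose maximum distance to the
    other clusters' centroids is minimal (the min-max idea above)."""
    cluster_ids = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in cluster_ids])
    gateways = {}
    for k, c in enumerate(cluster_ids):
        members = points[labels == c]
        # distances from each member to every *other* cluster's centroid
        others = np.delete(centroids, k, axis=0)
        d = np.linalg.norm(members[:, None, :] - others[None, :, :], axis=-1)
        # min-max: minimize the worst-case distance to the other clusters
        gateways[c] = members[np.argmin(d.max(axis=1))]
    return gateways
```

The inter-cluster distance matrix is then just the pairwise distances between these gateway points, which is tiny compared to the original problem.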
And then, you solve a TSP sub-problem for each cluster. These are supposed to be much smaller, so you should no longer face computer crashes. In each case, you use the input/output point determined previously as the starting node, so for each cluster you just reorder its points.
Finally, all you need to do is concatenate the paths and remove the node repetitions, and you get the full path for the original problem as shown below:
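The concatenation step could look something like this (a minimal sketch; `cluster_order` and `subpaths` are hypothetical names, with each sub-path a list of node ids, possibly a closed loop that repeats its first node at the end):

```python
def stitch(cluster_order, subpaths):
    """Concatenate per-cluster paths in the inter-cluster visiting order,
    skipping any node that was already visited."""
    seen = set()
    route = []
    for c in cluster_order:
        for node in subpaths[c]:
            if node not in seen:
                seen.add(node)
                route.append(node)
    return route

# Usage: three clusters, each sub-path a closed loop on its gateway node
subpaths = {0: [1, 2, 3, 1], 1: [4, 5, 6, 4], 2: [7, 8, 7]}
print(stitch([0, 1, 2], subpaths))  # [1, 2, 3, 4, 5, 6, 7, 8]
```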
This was just a high-level description of the solution. The actual code would take a little work but I can try to help you if you'd like.
See if any of this is useful for you.
Thank you very much, this was extremely helpful, and I greatly appreciate the time and effort you put into sharing all this information with me.
Since there is no action item to be performed here, I will close this issue.
I tried to run this with 41,000 data points and my computer ran out of RAM and crashed. Is there any advice you can share on how to run this with bigger datasets?