We propose a framework for solving combinatorial optimization problems of
which the output can be represented as a sequence of input elements. As an
alternative to the Pointer Network, we parameterize a policy by a model based
entirely on (graph) attention layers, and train it efficiently using REINFORCE
with a simple and robust baseline based on a deterministic (greedy) rollout of
the best policy found during training. We significantly improve over
state-of-the-art results for learning algorithms for the 2D Euclidean TSP,
reducing the optimality gap for a single tour construction by more than 75% (to
0.33%) and 50% (to 2.28%) for instances with 20 and 50 nodes respectively.
Kool, W. W. M., Welling, M.
We propose a framework for solving combinatorial optimization problems of which the output can be represented as a sequence of input elements. As an alternative to the Pointer Network, we parameterize a policy by a model based entirely on (graph) attention layers, and train it efficiently using REINFORCE with a simple and robust baseline based on a deterministic (greedy) rollout of the best policy found during training. We significantly improve over state-of-the-art results for learning algorithms for the 2D Euclidean TSP, reducing the optimality gap for a single tour construction by more than 75% (to 0.33%) and 50% (to 2.28%) for instances with 20 and 50 nodes respectively.
https://arxiv.org/abs/1803.08475