ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
https://rl4.co
MIT License

[Minor] stochastic prize collecting travelling salesman problem (SPCTSP) feature embedding problem #31

Closed cbhua closed 1 year ago

cbhua commented 1 year ago

Definition of the SPCTSP [1]:

In the SPCTSP, the expected node prize is known upfront, but the real collected prize only becomes known upon visitation.

i.e., for each node we know the expected prize $\rho_i$ upfront; when the salesman visits the node, the real prize is drawn as $\rho_i^*\sim\mathrm{Uniform}(0, 2\rho_i)$, after which the PCTSP rules apply.
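As a sanity check, the stochastic prize rule above can be sketched in PyTorch (the function name is illustrative, not part of the rl4co API):

```python
import torch

def sample_real_prizes(expected_prize: torch.Tensor) -> torch.Tensor:
    """Draw realized prizes rho_i* ~ Uniform(0, 2 * rho_i) per node.

    expected_prize: tensor of shape (batch, num_nodes) holding the known
    expectations rho_i; the realized prize is only revealed upon visitation.
    """
    # torch.rand_like samples Uniform(0, 1); scaling by 2*rho_i gives
    # Uniform(0, 2*rho_i), which has mean rho_i as required.
    return 2 * expected_prize * torch.rand_like(expected_prize)

# Example: expectations for a batch of 2 instances with 4 nodes each
rho = torch.rand(2, 4)
real = sample_real_prizes(rho)
# Realized prizes lie in [0, 2*rho_i) and equal rho_i in expectation
assert torch.all(real >= 0) and torch.all(real <= 2 * rho)
```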

In this case, different from the PCTSP, the encoder embedding should be built only from the observable information: the node locations and the expected prize of each node (the realized prizes are not known at encoding time).
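A minimal sketch of what such a feature embedding could look like (a hypothetical module for discussion, not rl4co's actual implementation): concatenate each node's (x, y) location with its expected prize and project to the embedding dimension. Depot handling is omitted here for brevity.

```python
import torch
import torch.nn as nn

class SPCTSPInitEmbedding(nn.Module):
    """Hypothetical SPCTSP init embedding: the observable per-node features
    are the (x, y) location and the expected prize rho_i."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # 3 input features per node: x, y, expected prize
        self.project = nn.Linear(3, embed_dim)

    def forward(self, locs: torch.Tensor, expected_prize: torch.Tensor) -> torch.Tensor:
        # locs: (batch, num_nodes, 2); expected_prize: (batch, num_nodes)
        feats = torch.cat([locs, expected_prize.unsqueeze(-1)], dim=-1)
        return self.project(feats)  # (batch, num_nodes, embed_dim)

embedding = SPCTSPInitEmbedding(embed_dim=16)
out = embedding(torch.rand(2, 5, 2), torch.rand(2, 5))
```

In a full environment the depot would typically get its own learned projection, as in the PCTSP embeddings of Kool et al.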

The attention-learn-to-route repository does not seem to implement an environment for this problem, so I would like to discuss it before opening a pull request.

[1] Kool, Wouter, Herke Van Hoof, and Max Welling. "Attention, learn to solve routing problems!." arXiv preprint arXiv:1803.08475 (2018).

fedebotu commented 1 year ago

Interesting! I think the SPCTSP may not be a top priority on the todo list, but it would nonetheless be good to have. Maybe we can discuss with @alstn12088?

alstn12088 commented 1 year ago

This is an interesting problem, and some works have reported misleading results on it. Because of the stochasticity, each instance gives only a single chance: no sampling or test-time adaptation methods are applicable. This makes it similar to offline model-based (combinatorial) optimization.

fedebotu commented 1 year ago

Closing now, as we will be using a version based on Kool et al. 2019. Feel free to reopen if you find any issues!