N-Wouda / ALNS

Adaptive large neighbourhood search (and more!) in Python.
https://alns.readthedocs.io/en/latest/
MIT License

Integrate with MABWiser library #132

Closed: N-Wouda closed this issue 1 year ago

N-Wouda commented 1 year ago

In https://github.com/openjournals/joss-reviews/issues/5028, @skadio wrote:

Inspired by your alpha-UCB policy in ALNS, let me add one potential integration opportunity for the future.

Similar to the UCB policy, other Bandit policies can be considered when selecting operators, for instance Thompson Sampling.

In that regard, the MABWiser library, which offers a comprehensive set of policies, would work neatly as the selection operator. Disclaimer: the library is built within our group in the AI Center at Fidelity.

The idea is to enable the "select" parameter to be a MABWiser object, which implements a variety of bandit policies.

So the current UCB policy

select = AlphaUCB(scores=[5, 2, 1, 0.5], alpha=0.05, num_repair=num_repair, num_destroy=num_destroy)

becomes

select = AlphaUCB(scores=[5, 2, 1, 0.5], MABWiser(arms=num_repair * num_destroy, LearningPolicy.UCB1(alpha=0.05)))

Under the hood, ALNS would leverage the expected-reward and arm-prediction computations from MABWiser.

In principle, the user can then select any available bandit policy. In that sense, AlphaUCB becomes BanditALNS(scores, MABWiser).
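A minimal sketch of how that selection loop could lean on MABWiser's predict / partial_fit calls; the integer arm encoding and the reward value below are illustrative, not a settled API:

from mabwiser.mab import MAB, LearningPolicy

num_destroy, num_repair = 3, 2

# One bandit arm per (destroy, repair) operator pair, encoded as an integer.
arms = list(range(num_destroy * num_repair))
mab = MAB(arms=arms, learning_policy=LearningPolicy.UCB1(alpha=0.05))

# Warm start with one zero-reward observation per arm so that predict() is defined.
mab.fit(decisions=arms, rewards=[0] * len(arms))

# Selection: MABWiser picks the arm with the highest upper confidence bound.
arm = mab.predict()
d_idx, r_idx = divmod(arm, num_repair)

# ... apply destroy operator d_idx and repair operator r_idx, evaluate the candidate ...

# Update: feed the observed reward for this arm back into the bandit.
mab.partial_fit(decisions=[arm], rewards=[1])

# predict_expectations() exposes the expected reward per arm, if needed.
print(mab.predict_expectations())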

Btw, it might be worthwhile to consider Thompson Sampling (TS) bandits, which are a common counterpart to UCB policies and often outperform them.

It would look like this:

select = BanditALNS(scores=[1, 1, 1, 0], MABWiser(arms=num_repair * num_destroy, LearningPolicy.ThompsonSampling()))

The reward for TS is binary: here I used 1 for best, better, and accepted solutions, and 0 for rejected solutions. This 0/1 scoring scheme for TS would be easier to tune than UCB's scores, where it is not clear how to pick reasonable defaults.
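As a concrete, purely illustrative example of that 0/1 mapping with MABWiser's Thompson Sampling policy:

from mabwiser.mab import MAB, LearningPolicy

BEST, BETTER, ACCEPT, REJECT = range(4)  # the four ALNS evaluation outcomes
scores = [1, 1, 1, 0]                    # binary reward: 1 unless rejected

arms = list(range(6))                    # e.g. 3 destroy x 2 repair operator pairs
mab = MAB(arms=arms, learning_policy=LearningPolicy.ThompsonSampling())
mab.fit(decisions=arms, rewards=[0] * len(arms))  # warm start

# After applying the operator pair behind `arm` and classifying the candidate:
arm, outcome = mab.predict(), ACCEPT
mab.partial_fit(decisions=[arm], rewards=[scores[outcome]])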

Finally, both UCB and TS are non-contextual policies. Moving beyond them, one could also consider contextual policies, where a set of features describing the current state (neighborhood) helps decide the arm selection. I am not aware of any research that considers contextual bandit policies for ALNS; this would be super cool! MABWiser supports all of these, and more.
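For completeness, a contextual variant with MABWiser's LinUCB could look roughly as follows; the state features are made up purely for illustration:

from mabwiser.mab import MAB, LearningPolicy

arms = list(range(6))
mab = MAB(arms=arms, learning_policy=LearningPolicy.LinUCB(alpha=1.0))

# Hypothetical state features: normalised gap to the best solution,
# iterations since the last improvement, current acceptance temperature.
warmup_contexts = [[0.0, 0.0, 1.0]] * len(arms)
mab.fit(decisions=arms, rewards=[0] * len(arms), contexts=warmup_contexts)

context = [[0.12, 0.30, 0.85]]       # the current search state, as one row
arm = mab.predict(contexts=context)  # context-dependent arm selection
mab.partial_fit(decisions=[arm], rewards=[1], contexts=context)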

I think this could be a great future opportunity, so I am opening this issue so that we do not forget about it in the meantime.

skadio commented 1 year ago

Quick update on this direction

Paul, a CS student from Brown University, is about to finalize their senior thesis enabling the ALNS + MABWiser integration. For some reason, I cannot tag their GitHub alias here.

We can open a PR for your review in a few weeks. Basically, it follows the Operator Selection base class interface and introduces a BanditSelector that can be backed by any bandit policy from MABWiser. So UCB is just one of them; many others would now be available to any user who chooses the BanditSelector as the selection scheme.
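To give a rough idea, a BanditSelector along those lines might look like the sketch below; the class and hook names are approximate and may well differ from the eventual PR:

from mabwiser.mab import MAB, LearningPolicy


class BanditSelector:  # would follow ALNS's operator selection base class
    def __init__(self, learning_policy, num_destroy, num_repair):
        self.num_repair = num_repair
        arms = list(range(num_destroy * num_repair))  # one arm per operator pair
        self.mab = MAB(arms=arms, learning_policy=learning_policy)
        self.mab.fit(decisions=arms, rewards=[0] * len(arms))  # warm start

    def __call__(self, rnd_state, best, curr):
        # Selection hook: return the (destroy, repair) indices to apply next.
        return divmod(self.mab.predict(), self.num_repair)

    def update(self, candidate, d_idx, r_idx, outcome):
        # Feedback hook: 0/1 reward schema from above, with outcome 3 == rejected.
        reward = 0 if outcome == 3 else 1
        arm = d_idx * self.num_repair + r_idx
        self.mab.partial_fit(decisions=[arm], rewards=[reward])


select = BanditSelector(LearningPolicy.ThompsonSampling(), num_destroy=3, num_repair=2)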

N-Wouda commented 1 year ago

@skadio great, thanks for the update! I'm looking forward to the PR in a few weeks.