ai4co / rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)
https://rl4.co
MIT License

Major modeling refactoring #165

Closed fedebotu closed 4 months ago

fedebotu commented 4 months ago

Description

This PR is for a major, long-due refactoring to the RL4CO codebase :smile:

Motivation and Context

So far, we have mostly overfitted RL4CO to the autoregressive Attention Model structure (encoder-decoder). However, several models do not necessarily follow this structure, such as DeepACO. Implementing such a model requires changes to the structure, which then stops being standardized, and it could be hard for newcomers to implement a different model type. For this reason, some rethinking of the library on the modeling side is necessary!

[!TIP] Note that in RL4CO we refer to the model as the RL algorithm and the policy as the neural network that, given an instance, returns a sequence of actions $\pi_0, \pi_1, \dots, \pi_N$, i.e. the solution. In other words: the model is a LightningModule that trains the policy, which is a nn.Module.
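For context, here is a minimal sketch of that split (placeholder classes and a placeholder batch/output format, not the actual RL4CO interfaces):

```python
import torch
import torch.nn as nn
from lightning import LightningModule


class MyPolicy(nn.Module):
    """Hypothetical policy: maps a batch of instances to actions and their log-likelihood."""

    def forward(self, batch):
        # encode the instances, decode actions pi_0, ..., pi_N, return rewards / log-probs
        raise NotImplementedError


class MyModel(LightningModule):
    """Hypothetical model: the RL algorithm (e.g. REINFORCE) that trains the policy."""

    def __init__(self, policy: nn.Module):
        super().__init__()
        self.policy = policy

    def training_step(self, batch, batch_idx):
        out = self.policy(batch)  # roll out the policy on a batch of instances
        # a REINFORCE-style loss: maximize reward-weighted log-likelihood
        loss = -(out["reward"] * out["log_likelihood"]).mean()
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.policy.parameters(), lr=1e-4)
```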

New structure

With the new structure, the aim is to categorize NCO approaches (which are not necessarily trained with RL!) into the following: 1) constructive, 2) improvement, 3) transductive.


1) Constructive (policy)

1a) Autoregressive (AR)

Autoregressive approaches use a decoder that outputs log probabilities over the next action given the current partial solution. These approaches generate a solution step by step, similarly to e.g. LLMs. They typically have an encoder-decoder structure (e.g. AM). Some models may not have an encoder at all and simply re-encode at each step (e.g. BQ-NCO).
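To make the step-by-step construction concrete, here is a rough sketch of an AR rollout (hypothetical `encoder`, `decoder`, and environment interface; not the RL4CO API):

```python
import torch


def autoregressive_rollout(encoder, decoder, instance, env):
    """Sketch of AR construction: choose one action per step until the solution is complete."""
    embeddings = encoder(instance)  # encode the instance once (models like BQ-NCO skip this
                                    # and re-encode the partial solution at every step)
    state = env.reset(instance)
    actions, log_probs = [], []
    while not state.done:
        logits = decoder(embeddings, state)               # scores over the next action
        log_p = torch.log_softmax(logits, dim=-1)         # log probabilities
        action = log_p.exp().multinomial(1).squeeze(-1)   # sample (or argmax for greedy)
        state = env.step(state, action)
        actions.append(action)
        log_probs.append(log_p.gather(-1, action.unsqueeze(-1)).squeeze(-1))
    # solution pi_0, ..., pi_N and its total log-likelihood
    return torch.stack(actions, -1), torch.stack(log_probs, -1).sum(-1)
```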

1b) NonAutoregressive (NAR)

The difference between AR and NAR approaches is that NAR approaches only use an encoder (they encode in one shot) and generate, for example, a heatmap, which can then be decoded either simply by using it as a probability distribution or by running some search method on top of it (e.g. DeepACO).
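For intuition only, a sketch of the NAR idea (hypothetical names; here the heatmap is decoded by sampling, whereas e.g. DeepACO would instead run an ant-colony search on top of it):

```python
import torch


def nonautoregressive_rollout(encoder, instance, env):
    """Sketch of NAR construction: encode once into a heatmap, then decode without the network."""
    heatmap = encoder(instance)  # e.g. (num_nodes, num_nodes) edge scores, produced in one shot
    state = env.reset(instance)
    actions = []
    while not state.done:
        scores = heatmap[state.current_node]                        # scores of candidate next nodes
        scores = scores.masked_fill(~state.action_mask, float("-inf"))
        probs = torch.softmax(scores, dim=-1)
        action = probs.multinomial(1).squeeze(-1)                   # or feed the heatmap to a search method
        state = env.step(state, action)
        actions.append(action)
    return torch.stack(actions, -1)
```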


2) Improvement (policy)

These methods differ from constructive NCO in that they can improve solutions over time, similarly to how local search algorithms work. This is different from decoding strategies or similar techniques in constructive methods, since these policies are explicitly trained to perform improvement operations.
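As a rough sketch of how such a policy could be used at inference time (hypothetical interface, not the final RL4CO improvement API):

```python
def improvement_rollout(policy, instance, initial_solution, env, n_steps=100):
    """Sketch of improvement NCO: start from a solution and let the policy refine it."""
    solution = initial_solution
    best_solution, best_cost = solution, env.cost(instance, solution)
    for _ in range(n_steps):
        # the policy looks at the instance *and* the current solution, then
        # proposes an improvement operation (e.g. a 2-opt or swap move)
        move = policy(instance, solution)
        solution = env.apply_move(solution, move)
        cost = env.cost(instance, solution)
        if cost < best_cost:
            best_solution, best_cost = solution, cost
    return best_solution, best_cost
```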

Note: You may have a look here for the basic constructive NCO policy structure! ;)


3) Transductive (model)

[!TIP] Read the definition of inductive vs transductive RL. In inductive RL, we train to generalize to new instances. In transductive RL, we train (or fine-tune) to solve only specific ones.

Transductive models are learning algorithms that optimize on a specific instance: they improve solutions by updating the policy parameters $\theta$, which means that we run optimization (backprop) during online testing. Transductive learning can be performed with different policies: for example, EAS updates (a part of) the AR policy parameters to obtain better solutions, and there may be other ways (or papers out there I don't know of) to optimize at test time.
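Conceptually, this amounts to running a small training loop on the test instance itself (a sketch with placeholder names, not the actual EAS implementation; EAS in particular only updates a small set of added parameters rather than the whole policy):

```python
import torch


def transductive_search(policy, instance, env, n_iters=50, lr=1e-3):
    """Sketch of transductive NCO: optimize (a subset of) policy parameters on one instance."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)  # backprop happens at test time
    best_reward = float("-inf")
    for _ in range(n_iters):
        actions, log_likelihood = policy.rollout(instance)  # sample candidate solutions
        reward = env.get_reward(instance, actions)
        loss = -(reward * log_likelihood).mean()            # REINFORCE-style objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        best_reward = max(best_reward, reward.max().item())
    return best_reward
```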


| Category | Input | Output | Description |
| --- | --- | --- | --- |
| Constructive | Instance | Solution | Amortized policy generates solutions from scratch. Can be categorized into Autoregressive (AR) and NonAutoregressive (NAR) approaches. |
| Improvement | Instance, Current Solution | Improved Solution | Policies trained to improve existing solutions iteratively, akin to local search algorithms. Different from constructive methods as they focus on refining solutions rather than generating them from scratch. |
| Transductive | Instance, (Parameters) | Solution, (Updated Parameters) | Updates policy parameters during online testing to improve solutions. Can use various policies for optimization, such as EAS updates for AR policies. |

In practice, here is what the structure looks like right now:

rl4co/
└── models/
    ├── common/
    │   ├── constructive/
    │   │   ├── base.py 
    │   │   ├── autoregressive/
    │   │   │   ├── encoder.py
    │   │   │   ├── decoder.py
    │   │   │   └── policy.py
    │   │   └── nonautoregressive/
    │   │       ├── encoder.py
    │   │       ├── decoder.py
    │   │       └── policy.py
    │   ├── improvement/
    │   │   └── base.py # TBD
    │   └── transductive/
    │       └── base.py
    ├── nn # generic neural network
    ├── rl # generic RL models
    └── zoo # literature

Changelog

Types of changes

TODO

Extra


Special thanks to @LTluttmann for your help and feedback~

Do you have some ideas / feedback on the above PR? CC: @Furffico @henry-yeh @ahottung @bokveizen Also tagging @yining043 for the coming improvement methods

fedebotu commented 4 months ago

Talking to a few people, it seems that the naming "Transductive" is preferred over "Search", since "search" is too broad in scope and the line between what each algorithm specifically does is a bit blurred. "Transductive" means "directly optimize the parameters for a specific instance", which conveys the meaning more easily!

bokveizen commented 4 months ago

> Talking to a few people, it seems that the naming "Transductive" is preferred over "Search", since "search" is too broad in scope and the line between what each algorithm specifically does is a bit blurred. "Transductive" means "directly optimize the parameters for a specific instance", which conveys the meaning more easily!

Yep! I remember you mentioned this before, and that was what I used :-)

fedebotu commented 4 months ago

While working on the metaclasses, I noticed that the NonAutoregressive[...] classes are directly callable. We should modify this so that the GNN model belongs to the zoo and is called from there.

cbhua commented 4 months ago

A quick abstract look at the current RL4CO structure (diagram: rl4co_quick_look).

fedebotu commented 4 months ago

> A quick abstract look at the current RL4CO structure (diagram: rl4co_quick_look).

Nice! Careful though, because "Transductive" models are RL algorithms that "fine-tune" policies on specific instances, like EAS.

fedebotu commented 4 months ago

[!IMPORTANT] Thanks for your revisions! We are planning to merge the PR into main tomorrow - if you have any additional comments / modifications / bugfixes, please let us know!