fedebotu closed this 4 months ago
Talking to some people, it seems that the naming "Transductive" works better than "Search", since "search" is too broad in scope and the line between what each algorithm specifically does is a bit blurred. "Transductive" means "directly optimize the parameters for a specific instance", which conveys the meaning more easily!
Yep! I remember you mentioned this before, and that was what I used :-)
I noticed while doing the metaclasses that NonAutoregressive[...]
things are directly callable. We should modify this so that the GNN model belongs to the zoo and is called from there
A quick abstract look at the current RL4CO structure.
Nice! Careful though, because "Transductive" methods are RL algorithms that "finetune" policies on specific instances, like EAS
[!IMPORTANT] Thanks for your revisions! We are planning to merge the PR into `main` tomorrow - if you have some additional comments / modifications / bugfixes, please let us know!
Description
This PR is for a major, long-due refactoring to the RL4CO codebase :smile:
Motivation and Context
So far, we had mostly overfitted RL4CO to the autoregressive Attention Model structure (encoder-decoder). However, several models do not necessarily follow this, such as DeepACO. Implementing such a model requires changes in the structure, which then stops being standardized, and it could be hard for newcomers to implement a different model type. For this reason, some rethinking of the library on the modeling side is necessary!
New structure
With the new structure, the aim is to categorize NCO approaches (which are not necessarily trained with RL!) into the following: 1) constructive, 2) improvement, 3) transductive.
1) Constructive (policy)
1a) Autoregressive (AR)
Autoregressive approaches use a decoder that outputs log probabilities for the next action given the current partial solution. These approaches generate a solution step by step, similarly to e.g. LLMs. They have an encoder-decoder structure (i.e. AM). Some models may not have a separate encoder at all and just re-encode at each step (e.g. BQ-NCO).
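As a toy illustration (not the actual RL4CO API - the decoder call, masking, and greedy selection are all simplified stand-ins), an AR constructive rollout boils down to: mask infeasible actions, pick the next action, accumulate its log-probability:

```python
import math

def ar_rollout(decoder_logits, num_nodes):
    """Greedy autoregressive construction: pick one action per step until
    every node is visited, accumulating the solution's log-likelihood.
    `decoder_logits` stands in for a learned decoder (hypothetical)."""
    visited, tour, log_likelihood = set(), [], 0.0
    for _ in range(num_nodes):
        logits = decoder_logits(tour)  # "decoder" sees the partial solution
        # mask actions that are no longer feasible (already visited)
        masked = [l if i not in visited else float("-inf")
                  for i, l in enumerate(logits)]
        # log-softmax over the feasible actions
        m = max(masked)
        lse = m + math.log(sum(math.exp(l - m) for l in masked))
        log_probs = [l - lse for l in masked]
        action = max(range(num_nodes), key=lambda i: log_probs[i])
        log_likelihood += log_probs[action]
        visited.add(action)
        tour.append(action)
    return tour, log_likelihood

# toy "decoder" that always prefers lower-indexed nodes
tour, ll = ar_rollout(lambda partial: [-float(i) for i in range(5)], 5)
```

Swapping the greedy `max` for sampling gives the usual stochastic rollouts used during RL training.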
1b) NonAutoregressive (NAR)
The difference between AR and NAR approaches is that NAR approaches only use an encoder (they encode in one shot) and generate, for example, a heatmap, which can then be decoded either by simply using it as a probability distribution or by running some search method on top (e.g. DeepACO).
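A minimal sketch of NAR-style decoding, assuming the model has already produced an edge heatmap; greedy edge-following here is a simplified stand-in for the sampling or search decoders mentioned above:

```python
def nar_decode(heatmap, start=0):
    """Greedily follow the highest-scoring feasible edge in the heatmap
    until every node is visited (toy sketch; e.g. DeepACO instead runs
    ant-colony search on top of the heatmap)."""
    n = len(heatmap)
    tour, visited = [start], {start}
    while len(tour) < n:
        cur = tour[-1]
        # pick the best unvisited successor of the current node
        nxt = max((j for j in range(n) if j not in visited),
                  key=lambda j: heatmap[cur][j])
        tour.append(nxt)
        visited.add(nxt)
    return tour

# the strong 0 -> 1 -> 2 -> 3 edges dominate the greedy decode
heat = [[0, 9, 1, 1],
        [1, 0, 9, 1],
        [1, 1, 0, 9],
        [9, 1, 1, 0]]
```

Note the model itself is called only once (to produce `heat`); all the work after that is decoding.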
2) Improvement (policy)
These methods differ from constructive NCO in that they can obtain better solutions similarly to how local search algorithms work - they improve the solutions over time. This is different from decoding strategies and similar techniques in constructive methods, since these policies are trained to perform improvement operations.
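For intuition, a single classic 2-opt move is the kind of operation an improvement policy learns to select; in this sketch the move is found by brute force, whereas a trained improvement policy would instead predict which pair (i, j) to reverse:

```python
import math

def two_opt_step(tour, dist):
    """One improvement move: try every segment reversal and keep the one
    that most reduces tour length (classic 2-opt, not a learned policy)."""
    n = len(tour)
    def length(t):
        return sum(dist[t[i]][t[(i + 1) % n]] for i in range(n))
    best, best_len = tour, length(tour)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            # reverse the segment tour[i..j]
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            cand_len = length(cand)
            if cand_len < best_len:
                best, best_len = cand, cand_len
    return best, best_len

# four points on a unit square; the tour [0, 2, 1, 3] has a crossing
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
```

Iterating such steps over time is exactly the "improve solutions over time" behaviour, applied from a complete starting solution rather than building one from scratch.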
Note: You may have a look here for the basic constructive NCO policy structure! ;)
3) Transductive (model)
Transductive models are learning algorithms that optimize on a specific instance: they improve solutions by updating the policy parameters $\theta$, which means that we are running optimization (backprop) during online testing. Transductive learning can be performed with different policies: for example, EAS updates (a part of) an AR policy's parameters to obtain better solutions, but I guess there are other ways (or papers out there I don't know of) to optimize at test time.
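To make "optimizing parameters during testing" concrete, here is a toy EAS-flavoured sketch: the only trainable parameters are per-instance logits $\theta$, updated by gradient descent on the expected cost at inference time. This is hypothetical illustration code, not the RL4CO EAS implementation (the real method updates a subset of a trained AR policy's parameters and samples rollouts):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def transductive_adapt(instance_costs, steps=200, lr=0.5):
    """Per-instance logits theta are the only trainable parameters;
    we descend the exact gradient of the expected cost *at test time*,
    so optimization happens during inference for this one instance."""
    theta = [0.0] * len(instance_costs)
    for _ in range(steps):
        probs = softmax(theta)
        expected = sum(p * c for p, c in zip(probs, instance_costs))
        # d E[cost] / d theta_k = p_k * (c_k - E[cost])
        for k, (p, c) in enumerate(zip(probs, instance_costs)):
            theta[k] -= lr * p * (c - expected)
    return max(range(len(theta)), key=lambda k: theta[k])

# toy "instance": action 2 has the lowest cost, so adaptation finds it
best_action = transductive_adapt([3.0, 2.0, 0.5, 4.0])
```

The point of the sketch is the control flow: gradient updates happen inside the test loop, which is what separates transductive methods from constructive or improvement policies with frozen weights.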
In practice, here is what the structure looks like right now:
Changelog

- `embedding_dim` -> `embed_dim` (see PyTorch)
- `env_name` as a mandatory parameter
- `evaluate`, which simply takes in an action if provided and gets its log probs
- removed `evaluate_action`, since it can be simply done via the above!
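A rough sketch of what such an `evaluate` call could compute (hypothetical signature, not the exact RL4CO one): given the per-step decoder logits and a provided action sequence, it returns the summed log-probability of those actions, which is why a separate `evaluate_action` is no longer needed:

```python
import math

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]

def evaluate(step_logits, actions):
    """Given decoder logits for each construction step and a provided
    action sequence, return the summed log-probability of those actions
    (illustrative stand-in for the new `evaluate` behaviour)."""
    return sum(log_softmax(logits)[a]
               for logits, a in zip(step_logits, actions))

# two steps with uniform logits -> log(1/3) per chosen action
ll = evaluate([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1, 2])
```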
TODO
Extra
- `policy.encoder` + `value_head` (this way any model should be able to have a critic)

Special thanks to @LTluttmann for your help and feedback~
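The shared-encoder critic idea could look like the following sketch (illustrative names, not the RL4CO implementation): whatever node embeddings a `policy.encoder` produces are mean-pooled and linearly projected to a scalar value:

```python
import random

class ValueHead:
    """Tiny critic head: mean-pool node embeddings from any encoder,
    then linearly project to a scalar state value (hypothetical sketch
    of the shared-encoder + value_head idea)."""
    def __init__(self, embed_dim, seed=0):
        rng = random.Random(seed)
        # random linear weights stand in for learned parameters
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
        self.b = 0.0

    def __call__(self, node_embeddings):
        n, d = len(node_embeddings), len(node_embeddings[0])
        # mean-pool over nodes -> one graph-level embedding
        pooled = [sum(e[k] for e in node_embeddings) / n for k in range(d)]
        # linear projection to a scalar value estimate
        return sum(wk * pk for wk, pk in zip(self.w, pooled)) + self.b

# embeddings as produced by some policy.encoder (faked here)
emb = [[1.0, 2.0], [3.0, 4.0]]
value = ValueHead(embed_dim=2)(emb)
```

Because the head only consumes embeddings, it is agnostic to which encoder produced them - which is what lets any model gain a critic.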
Do you have some ideas / feedback on the above PR? CC: @Furffico @henry-yeh @ahottung @bokveizen Also tagging @yining043 for the coming improvement methods