felixchalumeau closed this 3 years ago
Since pruning was never the strong point of Constraint Programming, we concluded that, whatever RL-based technique is used, the gain from selecting the right nodes will be offset by the relatively long inference time of our agent. This makes it unsuitable for optimality proofs, which require exploring a large number of nodes in any case.
We therefore decided to focus only on finding relatively good first solutions in as little time as possible.
Finding the best solution as fast as possible and proving optimality are not the same challenge at all, and it is very hard for a single agent to understand that it should "switch mode". It would hence be very interesting to try working with two agents: one learning to find the best solution as quickly as possible, and one learning to prove optimality as quickly as possible.
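
To make the two-agent idea concrete, here is a minimal sketch in Python on a toy 0/1 knapsack solved by branch and bound. Everything in it is an assumption made for illustration: `find_policy` and `prove_policy` are hand-written stand-ins for the two learned agents, the `solve` function is hypothetical and not taken from any library, and the rule "switch agents as soon as a first solution exists" is only one possible way to decide when to change mode.

```python
from typing import Callable, List, Optional, Tuple

# A policy returns the order in which to try the values (1 = take, 0 = skip)
# for item i. Both policies below are hypothetical stand-ins for learned agents.
Policy = Callable[[int, List[int], List[int]], List[int]]


def find_policy(i: int, weights: List[int], values: List[int]) -> List[int]:
    # "Find a good solution fast": greedily try to take the item first.
    return [1, 0]


def prove_policy(i: int, weights: List[int], values: List[int]) -> List[int]:
    # "Prove optimality fast": try skipping first (an arbitrary choice here).
    return [0, 1]


def solve(
    weights: List[int],
    values: List[int],
    capacity: int,
    finder: Policy,
    prover: Policy,
) -> Tuple[int, Optional[List[int]], int]:
    """Depth-first branch and bound that switches policy after the first solution.

    Returns (best value, best assignment, number of visited nodes).
    """
    n = len(weights)
    # suffix[i] = sum of values[i:], an optimistic bound on what is still reachable.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + values[i]

    best_value = -1
    best_choice: Optional[List[int]] = None
    nodes = 0

    def dfs(i: int, weight: int, value: int, choice: List[int]) -> None:
        nonlocal best_value, best_choice, nodes
        nodes += 1
        # Prune when even the optimistic bound cannot beat the incumbent.
        if value + suffix[i] <= best_value:
            return
        if i == n:
            best_value, best_choice = value, choice[:]
            return
        # Assumed switching rule: use the finder until an incumbent exists.
        policy = finder if best_choice is None else prover
        for v in policy(i, weights, values):
            if v == 1 and weight + weights[i] > capacity:
                continue  # infeasible branch
            choice.append(v)
            dfs(i + 1, weight + v * weights[i], value + v * values[i], choice)
            choice.pop()

    dfs(0, 0, 0, [])
    return best_value, best_choice, nodes


if __name__ == "__main__":
    value, choice, nodes = solve([2, 3, 4, 5], [3, 4, 5, 6], 5, find_policy, prove_policy)
    print(value, choice, nodes)
```

In this sketch the search loop, not the agents, decides when to switch, so neither agent has to learn that the objective has changed. With learned agents, each policy call would cost one inference, which is why the number of visited nodes is returned: it gives a rough handle on the inference-time trade-off discussed above.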