What's the problem this feature will solve?
Benchmarks are currently selected at random during training. Starting with small, easy benchmarks and gradually moving to larger, more complex ones looks promising; the literature calls this curriculum learning.
IBM apparently also had great success with this approach in their AI transpiler (see https://arxiv.org/abs/2405.13196).
The same idea could be extended to the action space: start with a small set of simple actions and add more complex ones later in training.
Describe the solution you'd like
Implement curriculum learning in the RL training process.
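To make the request more concrete, here is a minimal sketch of one possible benchmark curriculum. It is not tied to the project's actual code: `CurriculumSampler`, `difficulty`, `start_fraction`, and `total_steps` are all hypothetical names, and using a single number (for example a size measure) as the difficulty of a benchmark is an assumption. The idea is to sort benchmarks by difficulty and let the pool that random selection draws from grow linearly with training progress.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class CurriculumSampler:
    """Hypothetical helper: random sampling from a pool that grows over training."""

    benchmarks: List[Any]                 # any benchmark objects
    difficulty: Callable[[Any], float]    # assumed difficulty proxy, lower = easier
    start_fraction: float = 0.2           # share of easiest benchmarks available at step 0
    total_steps: int = 100_000            # step at which all benchmarks are available
    rng: random.Random = field(default_factory=random.Random)

    def __post_init__(self) -> None:
        if not self.benchmarks:
            raise ValueError("at least one benchmark is required")
        # Easiest benchmarks first, so a prefix of the list is the current pool.
        self._sorted = sorted(self.benchmarks, key=self.difficulty)

    def pool_size(self, step: int) -> int:
        """Number of (easiest) benchmarks available at the given training step."""
        n = len(self._sorted)
        n_start = min(n, max(1, math.ceil(self.start_fraction * n)))
        progress = min(max(step / self.total_steps, 0.0), 1.0)
        # Linear growth from n_start to n; other schedules would also work.
        return n_start + int((n - n_start) * progress)

    def sample(self, step: int) -> Any:
        """Pick a benchmark uniformly at random from the current pool."""
        return self.rng.choice(self._sorted[: self.pool_size(step)])
```

In a training loop, the existing random selection would be replaced by something like `benchmark = sampler.sample(current_step)`. A linear schedule is only the simplest choice; a schedule driven by the agent's recent reward would be another option. The action-space variant mentioned above could follow the same pattern, with the growing pool holding allowed actions instead of benchmarks.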