Take a look at https://github.com/david-cortes/contextualbandits. Omar is already taking care of installing the package and its requirements in the Dockerfile. Go through the examples and implement a basic version of the RL env/agent. Most importantly, understand how the RL env and agent are set up, which methods are important, and how they work together. Start with the examples and work your way up from there. The specific method does not matter right now. Define a VERY simple and basic reward function. The focus is on launching batches of instances from within the RL env and on orchestrating the different components of the prototype.
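
To make the env/agent split concrete, here is a rough sketch of the loop in plain Python. It does not use contextualbandits at all: `BatchEnv`, `EpsilonGreedyAgent`, `run_batch`, and the 0/1 reward are all made-up placeholders, the batch execution is simulated, and the agent ignores the context entirely. The idea would be to swap the agent for one of the package's policies later and to replace the body of `run_batch` with the real call that starts the instances.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BatchEnv:
    """Hypothetical environment: one step = one batch of instances."""
    n_actions: int = 3
    batch_size: int = 4
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def observe(self):
        # One small feature vector (context) per instance in the next batch.
        return [[self.rng.random(), self.rng.random()]
                for _ in range(self.batch_size)]

    def run_batch(self, contexts, actions):
        # Placeholder for launching the real instances; here it is simulated.
        # VERY simple reward: 1.0 if the action matches a fixed rule, else 0.0.
        return [1.0 if a == self._best(c) else 0.0
                for c, a in zip(contexts, actions)]

    def _best(self, context):
        # Simulated ground truth: bucket the first feature into n_actions bins.
        return min(int(context[0] * self.n_actions), self.n_actions - 1)


class EpsilonGreedyAgent:
    """Stand-in agent: epsilon-greedy over per-action mean reward (no context)."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_actions
        self.means = [0.0] * n_actions

    def act(self, contexts):
        actions = []
        for _ in contexts:
            if self.rng.random() < self.epsilon:
                # Explore: pick a random action.
                actions.append(self.rng.randrange(self.n_actions))
            else:
                # Exploit: pick the action with the highest mean reward so far.
                actions.append(max(range(self.n_actions),
                                   key=lambda a: self.means[a]))
        return actions

    def update(self, contexts, actions, rewards):
        # Incremental running mean of the reward for each chosen action.
        for a, r in zip(actions, rewards):
            self.counts[a] += 1
            self.means[a] += (r - self.means[a]) / self.counts[a]


def run(n_rounds=50):
    """Orchestration loop: observe -> act -> run batch -> update."""
    env = BatchEnv()
    agent = EpsilonGreedyAgent(env.n_actions)
    history = []
    for _ in range(n_rounds):
        contexts = env.observe()
        actions = agent.act(contexts)
        rewards = env.run_batch(contexts, actions)
        agent.update(contexts, actions, rewards)
        history.append(sum(rewards) / len(rewards))
    return agent, history
```

Calling `run()` returns the agent and the mean reward per batch. The part that matters for the prototype is the shape of the loop in `run`, not the agent itself.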