Why

As a user of pyCMO,

I want to be able to specify different reward models for my scenarios,

so that I can train RL agents.

Acceptance Criteria

Given that we currently only export the player side's total score as the reward,

when we implement a way for users to specify a reward model,

then we get closer to being able to train RL agents.

One idea is to create a custom RewardHandler class that is passed into CMOEnv and calculates the reward from the current observation.
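
A rough sketch of what that could look like, to make the idea concrete. Everything here is an assumption for discussion: `Observation` and its fields, `ToyEnv`, `compute_reward`, and the two example handlers are hypothetical stand-ins, not existing pyCMO code, and the real CMOEnv observation will look different. The second handler also looks at the previous observation, which goes slightly beyond "current observation only" but seems useful for delta-style rewards.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Observation:
    """Hypothetical stand-in for the observation CMOEnv produces."""
    side_score: int
    units_alive: int


class RewardHandler(Protocol):
    """Anything callable with (previous, current) observations that returns a float."""

    def __call__(self, previous: Optional[Observation], current: Observation) -> float:
        ...


class TotalScoreReward:
    """Mirrors today's behaviour: the reward is the side's total score."""

    def __call__(self, previous: Optional[Observation], current: Observation) -> float:
        return float(current.side_score)


class ScoreDeltaReward:
    """Example custom model: score change per step, minus a penalty per lost unit."""

    def __init__(self, unit_loss_penalty: float = 0.0) -> None:
        self.unit_loss_penalty = unit_loss_penalty

    def __call__(self, previous: Optional[Observation], current: Observation) -> float:
        if previous is None:
            # First step of an episode: nothing to compare against yet.
            return 0.0
        score_delta = current.side_score - previous.side_score
        units_lost = max(previous.units_alive - current.units_alive, 0)
        return float(score_delta) - self.unit_loss_penalty * units_lost


class ToyEnv:
    """Minimal stand-in for CMOEnv, showing only where the handler would be called."""

    def __init__(self, reward_handler: Optional[RewardHandler] = None) -> None:
        # Default to the current behaviour so existing scenarios keep working.
        self.reward_handler = reward_handler or TotalScoreReward()
        self._previous: Optional[Observation] = None

    def compute_reward(self, observation: Observation) -> float:
        reward = self.reward_handler(self._previous, observation)
        self._previous = observation
        return reward


if __name__ == "__main__":
    env = ToyEnv(reward_handler=ScoreDeltaReward(unit_loss_penalty=5.0))
    print(env.compute_reward(Observation(side_score=100, units_alive=4)))
    print(env.compute_reward(Observation(side_score=130, units_alive=3)))
```

Using a callable protocol rather than a required base class would let users pass a plain function as well as a class instance, and defaulting to the total-score handler keeps the change backwards compatible.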