Svalorzen / AI-Toolbox

A C++ framework for MDPs and POMDPs with Python bindings
GNU General Public License v3.0

Is it possible to extract the expected reward value accrued up to a specific time step? #46

Closed: troyrock closed this issue 3 years ago

troyrock commented 3 years ago

Is it possible to extract the expected value of the reward accrued up to a specific time step? I'm modeling the effect of different drugs on MS patients and would like to be able to extract the accumulated reward (quality adjusted life years) after n time steps (years). Thank you.

Svalorzen commented 3 years ago

In general, you need to handle the rewards/statistics you want yourself. So when you write the loop that runs your environment & policy, you can simply keep a variable that accumulates the reward obtained from the environment at each timestep.
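A minimal sketch of what I mean, written generically: it assumes a generative model exposing `sampleSR(s, a)` (returning the next state and reward) and a policy exposing `sampleAction(s)`, which is the shape of those interfaces in AI-Toolbox; the function name and template setup here are just illustrative.

```cpp
#include <cstddef>
#include <tuple>

// Run `policy` on `model` for `timesteps` steps starting from state `s`,
// accumulating the reward obtained at each step.
template <typename Model, typename Policy>
double accumulatedReward(const Model & model, const Policy & policy,
                         size_t s, unsigned timesteps) {
    double total = 0.0;
    for (unsigned t = 0; t < timesteps; ++t) {
        const auto a = policy.sampleAction(s);  // pick an action for the current state
        double r;
        std::tie(s, r) = model.sampleSR(s, a);  // sample next state and reward
        total += r;                             // accrue the reward
    }
    return total;
}
```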

That said, there are a couple of classes you can use to make the process simpler. One is the Statistics class, which automatically computes average reward per timestep, and average cumulative reward, including standard deviation. A similar one, which you probably don't need and I'm just mentioning to be sure, is the Experience class, which you can use in model-based RL to learn environments, but also keeps track of things like average reward seen per transition.
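A hedged sketch of how the Statistics helper could be used to aggregate rewards over multiple runs; the exact constructor and `record` signatures may differ slightly between AI-Toolbox versions, so check `AIToolbox/Tools/Statistics.hpp` in your copy.

```cpp
#include <iostream>
#include <AIToolbox/Tools/Statistics.hpp>

int main() {
    constexpr unsigned timesteps = 50;   // e.g. 50 years
    constexpr unsigned runs = 100;       // e.g. 100 simulated patients

    AIToolbox::Statistics stats(timesteps);

    for (unsigned run = 0; run < runs; ++run) {
        for (unsigned t = 0; t < timesteps; ++t) {
            // Placeholder: in a real experiment this would be the reward
            // returned by the environment at timestep t of this run.
            const double reward = 0.0;
            stats.record(reward, t);     // log this run's reward at timestep t
        }
    }

    // Printing the Statistics object outputs, per timestep, the average
    // reward and the average cumulative reward with standard deviations.
    std::cout << stats;
}
```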

Svalorzen commented 3 years ago

Ah, one last thing. In case you are doing planning (say, with value iteration): if you plan for n timesteps, the resulting value function effectively contains, for every state, the expected reward accrued over n timesteps when following the optimal policy. This does not require running any experiments.
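A minimal sketch of reading those expected values out of value iteration. It assumes an `AIToolbox::MDP::Model` built elsewhere; the layout of the tuple returned by the solver (and whether you index the value function via `values[s]`) may vary between AI-Toolbox versions.

```cpp
#include <iostream>
#include <AIToolbox/MDP/Model.hpp>
#include <AIToolbox/MDP/Algorithms/ValueIteration.hpp>

void printExpectedValues(const AIToolbox::MDP::Model & model, unsigned horizon) {
    AIToolbox::MDP::ValueIteration solver(horizon);  // plan for `horizon` timesteps
    const auto solution = solver(model);

    // std::get<1>(solution) is the value function; values[s] is the expected
    // reward accrued over `horizon` timesteps starting from state s, under
    // the optimal policy.
    const auto & vf = std::get<1>(solution);
    for (size_t s = 0; s < model.getS(); ++s)
        std::cout << "V[" << s << "] = " << vf.values[s] << '\n';
}
```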

troyrock commented 3 years ago

Thanks for the response. I was able to get exactly what I wanted out of the system. I really appreciate it.