Closed troyrock closed 3 years ago
In general, you need to handle whatever rewards/statistics you want yourself. When you create a loop to run your environment and policy, you can simply keep a variable that accumulates the reward obtained from the environment at each timestep.
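A minimal sketch of that loop is below. The environment and policy interfaces here (`CoinFlipEnv`, `step`, the policy callable) are hypothetical stand-ins for illustration; substitute the actual classes from your library.

```python
import random

class CoinFlipEnv:
    """Toy stand-in environment: reward 1 with probability p, else 0."""
    def __init__(self, p=0.5, seed=0):
        self.p = p
        self.rng = random.Random(seed)

    def step(self, action):
        # Single absorbing state; only the reward is stochastic.
        reward = 1.0 if self.rng.random() < self.p else 0.0
        next_state = 0
        return next_state, reward

def run_episode(env, policy, n_steps):
    """Accumulate the reward obtained at each timestep."""
    state = 0
    total_reward = 0.0
    for _ in range(n_steps):
        action = policy(state)
        state, reward = env.step(action)
        total_reward += reward
    return total_reward

total = run_episode(CoinFlipEnv(p=1.0), lambda s: 0, n_steps=10)
print(total)  # 10.0, since with p=1.0 every step yields reward 1
```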
That said, there are a couple of classes you can use to make the process simpler. One is the Statistics class, which automatically computes the average reward per timestep and the average cumulative reward, including standard deviations. A similar one, which you probably don't need but I mention just to be safe, is the Experience class: it is mainly used in model-based RL to learn environments, but it also keeps track of things like the average reward seen per transition.
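If you end up rolling this yourself instead, the bookkeeping is small: collect one cumulative reward per episode, then report mean and standard deviation. The function below is a sketch of that idea, not the Statistics class's actual API.

```python
import statistics

def summarize_returns(returns):
    """Average cumulative reward across episodes, with standard deviation."""
    mean = statistics.mean(returns)
    # stdev needs at least two samples; report 0.0 for a single episode.
    std = statistics.stdev(returns) if len(returns) > 1 else 0.0
    return mean, std

mean, std = summarize_returns([10.0, 12.0, 8.0])
print(mean, std)  # 10.0 2.0
```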
Ah, one last thing. In case you are doing planning (say, with value iteration), if you plan for n timesteps, then the output value function effectively contains the expected return after n timesteps of following the optimal policy, for every state. This does not require running any experiments.
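To make that concrete, here is a small sketch of undiscounted value iteration on a toy deterministic MDP (the MDP and function are made up for illustration): after n sweeps starting from V = 0, each entry of V is the expected reward accumulated over n timesteps under the optimal policy from that state, with no simulation involved.

```python
def value_iteration(P, R, n_steps):
    """n sweeps of undiscounted value iteration from V = 0.

    P[s][a] = next state (deterministic for simplicity),
    R[s][a] = immediate reward.
    Returns the n-step optimal expected return for every state.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(n_steps):
        V = [max(R[s][a] + V[P[s][a]] for a in range(len(P[s])))
             for s in range(n_states)]
    return V

# Toy MDP: in state 0, action 0 earns 1 and stays put; action 1 earns 0
# and moves to absorbing state 1, which earns nothing forever.
P = [[0, 1], [1, 1]]
R = [[1.0, 0.0], [0.0, 0.0]]
print(value_iteration(P, R, 5))  # [5.0, 0.0]
```

Note that the optimal policy (keep taking action 0 in state 0) is found implicitly by the max; the value function directly gives the 5-step accumulated reward without ever running the environment.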
Thanks for the response. I was able to get exactly what I wanted out of the system. I really appreciate it.
Is it possible to extract the expected value of the reward accrued up to a specific timestep? I'm modeling the effect of different drugs on MS patients, and I would like to extract the accumulated reward (quality-adjusted life years) after n timesteps (years). Thank you.