blakeelias / pandemic_RL

Reinforcement learning for economically optimal pandemic response.
GNU General Public License v3.0

Feature/time varying policy #19

blakeelias closed this 3 years ago

blakeelias commented 3 years ago

The policy changes as a function of both the time horizon and how frequently interventions are possible.

With a population of 20 and a discount factor of 0.99 per time step, when interventions happen every time step (i.e. every 4 days):

Horizon = 24 time steps (i.e. 96 days): [image]

Horizon = 48: [image]

Horizon = 72: [image]

Horizon = 96 (i.e. 384 days): [image]
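Since one time step is 4 days, a horizon of H steps spans 4H days, and with a discount factor of 0.99 per step the reward earned at the end of the horizon is still weighted by roughly 0.99^H. The worked check below computes those weights for illustration; they are not taken from the PR's runs.

```latex
\begin{aligned}
H = 24 &: \quad 4H = 96 \text{ days}, \quad \gamma^{H} = 0.99^{24} \approx 0.786 \\
H = 48 &: \quad 4H = 192 \text{ days}, \quad \gamma^{H} = 0.99^{48} \approx 0.617 \\
H = 72 &: \quad 4H = 288 \text{ days}, \quad \gamma^{H} = 0.99^{72} \approx 0.485 \\
H = 96 &: \quad 4H = 384 \text{ days}, \quad \gamma^{H} = 0.99^{96} \approx 0.381
\end{aligned}
```

Because rewards near the end of even the longest horizon keep a substantial weight, one plausible reading is that truncating the horizon changes which interventions pay off, which would be consistent with the policy differing across these horizons.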

By comparison, when interventions happen every 8 time steps (i.e. once per 32 days):

Horizon = 24: [image]

Horizon = 48: [image]

Horizon = 72: [image]

Horizon = 96: [image]
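The dependence on both horizon and intervention frequency is what finite-horizon dynamic programming predicts. The sketch below is a minimal illustration, not the repository's code: it assumes a tabular MDP with a hypothetical one-step transition array `P[a, s, s2]` and reward array `R[s, a]`. Holding an action fixed for `action_period` steps is modelled by collapsing those steps into one macro step (the last epoch may be shorter, e.g. a horizon of 36 with interventions every 8 steps). Backward induction then yields one decision rule per epoch. The two-state `toy_mdp` is likewise hypothetical.

```python
import numpy as np


def _macro_step(P, R, gamma, length):
    """Collapse `length` steps under a fixed action into one step.

    Returns P_m[a] = P[a]^length and R_m[:, a] = sum_{j<length} gamma^j P[a]^j R[:, a].
    """
    A, S, _ = P.shape
    P_m = np.empty_like(P)
    R_m = np.zeros((S, A))
    for a in range(A):
        step = np.eye(S)  # P[a]^j, starting at j = 0
        for j in range(length):
            R_m[:, a] += gamma**j * (step @ R[:, a])
            step = step @ P[a]
        P_m[a] = step
    return P_m, R_m


def backward_induction(P, R, horizon, gamma, action_period=1):
    """Finite-horizon DP where the action can only change every `action_period` steps.

    P: shape (A, S, S), P[a, s, s2] = Pr(next state s2 | state s, action a).
    R: shape (S, A), expected one-step reward.
    Returns (V, policy): V[s] is the optimal value at t = 0 and
    policy[d, s] is the action chosen at the d-th decision epoch.
    """
    starts = list(range(0, horizon, action_period))  # decision times 0, k, 2k, ...
    lengths = [min(action_period, horizon - t) for t in starts]
    macros = {n: _macro_step(P, R, gamma, n) for n in set(lengths)}

    S = P.shape[1]
    V = np.zeros(S)  # terminal value after the last step
    policy = np.zeros((len(starts), S), dtype=int)
    for d in reversed(range(len(starts))):
        P_m, R_m = macros[lengths[d]]
        # Q[s, a] = reward over the epoch + discounted value of the next epoch.
        Q = R_m + gamma ** lengths[d] * np.einsum("ast,t->sa", P_m, V)
        policy[d] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy


def toy_mdp(cost):
    """Hypothetical 2-state, 2-action example (not from the repository).

    State 0 = uncontrolled, state 1 = controlled; action 0 = do nothing,
    action 1 = intervene. Intervening moves the system to state 1 and costs
    `cost` every step it is applied; state 1 earns a reward of 1 per step.
    """
    P = np.array([
        np.eye(2),                 # a = 0: state unchanged
        [[0.0, 1.0], [0.0, 1.0]],  # a = 1: move to state 1
    ])
    R = np.array([
        [0.0, -cost],              # state 0
        [1.0, 1.0 - cost],         # state 1
    ])
    return P, R


if __name__ == "__main__":
    P, R = toy_mdp(cost=2.25)
    for k in (1, 2):
        V, policy = backward_induction(P, R, horizon=6, gamma=1.0, action_period=k)
        print(f"action_period={k}: V={V}, actions in state 0 per epoch={policy[:, 0]}")
```

In this toy case, with interventions allowed every step the rule in state 0 is to intervene only while at least 4 steps remain, so the decision depends on time-to-go. With interventions only every 2 steps, the value from state 0 drops from 2.75 to 0.5 and only the first epoch intervenes. This is a qualitative analogue of the dependence on horizon and intervention frequency described above, not a reproduction of the PR's results.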

blakeelias commented 3 years ago

When the discount factor is set to 1.0 (no discounting):

Actions every time step:

Horizon = 24: [image]

Horizon = 36: [image]

Horizon = 48: [image]

Horizon = 72: [image]

Horizon = 96: [image]

When actions take place every 8 time steps (i.e. once per 32 days):

Horizon = 24: [image]

Horizon = 36: [image]

Horizon = 48: [image]

Horizon = 72: [image]

Horizon = 96: [image]
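Setting the discount factor to 1.0 remains well defined here because the horizon H is finite. Assuming the standard finite-horizon objective (the PR does not state the objective explicitly, and r_max, a bound on the per-step reward magnitude, is introduced here for illustration), the return and the backward recursion are:

```latex
\begin{aligned}
J_H(\pi) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{H-1} \gamma^{t}\, r_t\right],
  \qquad \gamma = 1 \;\Rightarrow\; \lvert J_H(\pi) \rvert \le H\, r_{\max}, \\
V_H(s) &= 0, \qquad
V_t(s) = \max_{a}\Big[\, r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_{t+1}(s') \Big].
\end{aligned}
```

Because the maximizing action at step t depends on V_{t+1}, which changes with the number of remaining steps H − t, the optimal policy can vary over time even when γ = 1.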