redknightlois closed this issue 5 years ago.
If, after the restore, you are looking at the actions selected during the Heatup phase, that might be the source of the issue. By default, Heatup does not run the agent's choose_action, so it never reaches the exploration policy code; it simply chooses actions at random. For Heatup to use the agent's decisions, set the flag AgentParameters.algorithm.heatup_using_network_decisions.
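To make the branch concrete, here is a minimal, self-contained Python sketch of the behavior described above. It is a simplified toy model, not Coach's actual source: AlgorithmParams, ToyAgent, network_values and guided_exploration are hypothetical names chosen for illustration. Only the flag name heatup_using_network_decisions comes from the thread. The sketch shows that with the flag off, Heatup returns a uniformly random action and never calls the exploration policy, while with the flag on, every Heatup step goes through choose_action and therefore through the (custom) exploration policy.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical, simplified model of the Heatup branch discussed above.
# These classes are illustrative only; they are NOT rl_coach's real classes.


@dataclass
class AlgorithmParams:
    # Mirrors the flag named in the thread; off by default, as described.
    heatup_using_network_decisions: bool = False


@dataclass
class ToyAgent:
    num_actions: int
    algorithm: AlgorithmParams
    exploration_policy: Callable[[List[float]], int]
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def network_values(self, state: int) -> List[float]:
        # Stand-in for the restored network: scores action (state % num_actions) highest.
        return [float(a == state % self.num_actions) for a in range(self.num_actions)]

    def choose_action(self, state: int) -> int:
        # Normal path: network output is handed to the exploration policy.
        return self.exploration_policy(self.network_values(state))

    def act(self, state: int, phase: str) -> int:
        if phase == "heatup" and not self.algorithm.heatup_using_network_decisions:
            # Default Heatup: uniform random action; neither choose_action
            # nor the exploration policy is ever called.
            return self.rng.randrange(self.num_actions)
        return self.choose_action(state)


def guided_exploration(values: List[float]) -> int:
    # Hypothetical "guided" policy: pick the highest-scoring action.
    return max(range(len(values)), key=lambda a: values[a])


# Usage: with the flag set, Heatup actions follow the network + exploration policy.
guided = ToyAgent(4, AlgorithmParams(heatup_using_network_decisions=True), guided_exploration)
print([guided.act(s, "heatup") for s in range(4)])  # → [0, 1, 2, 3]
```

In Coach itself, the equivalent would be setting heatup_using_network_decisions to True on the agent parameters' algorithm object before creating the graph manager; the toy model above only illustrates why, without it, a restored policy and a custom exploration policy have no effect on Heatup actions.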
That works.
I wrote my own exploration policy that performs some guided exploration. However, when I restore from a checkpoint and run Heatup, the actions look like the output of neither the restored policy nor the guided exploration I wrote. In fact, they look awkwardly random.
For reference, in case I did something wrong, this is the exploration search code:
And the startup code looks like:
Any idea?