I noticed that with on-policy algorithms, data collection is done in the run function in OnPolicyBaseRunner. However, in my experiments, my environment was not reset even after it returned done == True. Following this clue, I found that neither the run function nor any function it calls contains a reset procedure that handles this case.
Hi, environments are automatically reset inside the step function when done is True, so the runner does not need to call reset itself. You can take a look at harl/envs/env_wrappers.py to get familiar with the logic (here and here). Let me know if you have further issues. :)
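To illustrate the auto-reset pattern described above, here is a minimal sketch. It is not the code in harl/envs/env_wrappers.py; the names CountdownEnv, AutoResetEnv, collect, and the terminal_obs key are hypothetical and chosen only for this example. It assumes a single environment whose step returns (obs, reward, done, info).

```python
class CountdownEnv:
    """Hypothetical toy environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.resets = 0  # counts how many times reset() was called

    def reset(self):
        self.t = 0
        self.resets += 1
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


class AutoResetEnv:
    """Hypothetical wrapper that resets the wrapped env inside step()."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Keep the last observation of the finished episode, then
            # return the first observation of the new one.
            info = dict(info, terminal_obs=obs)
            obs = self.env.reset()
        return obs, reward, done, info


def collect(env, num_steps):
    """Runner-style loop: reset() is called once, never inside the loop."""
    obs = env.reset()
    trajectory = []
    for _ in range(num_steps):
        obs, reward, done, info = env.step(0)
        trajectory.append((obs, done))
    return trajectory


if __name__ == "__main__":
    env = AutoResetEnv(CountdownEnv(horizon=3))
    for obs, done in collect(env, 7):
        print(obs, done)
```

With a wrapper like this, the observation returned alongside done == True already belongs to the next episode, which is why a collection loop can run without an explicit reset call. If your custom environment is not being reset, one thing to check is whether it is actually being stepped through such a wrapper.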