PKU-MARL / HARL

Official implementation of HARL algorithms based on PyTorch.

Bug: The environment is not reset during training #46

Closed by handleandwheel 1 month ago

handleandwheel commented 3 months ago

With on-policy algorithms, data collection happens in the run function of OnPolicyBaseRunner. In my experiments, however, my environment is not reset even after it returns done == True. Following this clue, I could not find a reset procedure in the run function or in any function it calls that handles this case.

Ivan-Zhong commented 3 months ago

Hi, environments are automatically reset inside the step function when done is True. Take a look at harl/envs/env_wrappers.py to get familiar with the logic (here and here). Let me know if you have further issues. :)
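
For readers following along, the auto-reset pattern described in the reply can be sketched roughly as below. This is an illustration under stated assumptions, not the code in harl/envs/env_wrappers.py: CountdownEnv, AutoResetWrapper, collect, and the terminal_obs info key are hypothetical names, and the toy environment uses a simplified single-agent step signature.

```python
class CountdownEnv:
    """Hypothetical toy environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
        self.resets = 0  # counts how many times reset() was called

    def reset(self):
        self.t = 0
        self.resets += 1
        return self.t  # the observation is just the step counter

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


class AutoResetWrapper:
    """Hypothetical wrapper: resets the wrapped env inside step() when done."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Keep the terminal observation in info, then return the first
            # observation of the next episode in its place.
            info = dict(info, terminal_obs=obs)
            obs = self.env.reset()
        return obs, reward, done, info


def collect(env, num_steps):
    """Rollout loop with a single reset() up front and none inside the loop."""
    obs = env.reset()
    trajectory = []
    for _ in range(num_steps):
        obs, reward, done, info = env.step(action=0)
        trajectory.append((obs, done))
    return trajectory


if __name__ == "__main__":
    env = AutoResetWrapper(CountdownEnv(horizon=3))
    for obs, done in collect(env, num_steps=7):
        print(obs, done)
```

With this design the rollout loop never calls reset itself, which would explain why no reset appears in a run function: whenever done is True, the observation returned by that same step already belongs to the next episode.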