astooke / rlpyt

Reinforcement Learning in PyTorch
MIT License

AttributeError: 'info' object has no attribute 'timeout' with a custom gym env wrapped by GymEnvWrapper #55

Closed wiekern closed 4 years ago

wiekern commented 4 years ago

Hi, I am attempting to switch my development environment from Stable Baselines to this fancy PyTorch-based RL framework, but I encounter the error below when using the SAC algorithm.

```
Traceback (most recent call last):
  File ".\3axis_11states_pytorch.py", line 119, in <module>
    build_and_train()
  File ".\3axis_11states_pytorch.py", line 109, in build_and_train
    runner.train()
  File "C:\Users\wiekern\Desktop\IMA\latest_code\rl-agentgen3\03_RL_Agents\02_PyTorch_RL_Implementations\rlpyt\runners\minibatch_rl.py", line 229, in train
    n_itr = self.startup()
  File "C:\Users\wiekern\Desktop\IMA\latest_code\rl-agentgen3\03_RL_Agents\02_PyTorch_RL_Implementations\rlpyt\runners\minibatch_rl.py", line 75, in startup
    rank=rank,
  File "C:\Users\wiekern\Desktop\IMA\latest_code\rl-agentgen3\03_RL_Agents\02_PyTorch_RL_Implementations\rlpyt\algos\qpg\sac.py", line 80, in initialize
    self.initialize_replay_buffer(examples, batch_spec)
  File "C:\Users\wiekern\Desktop\IMA\latest_code\rl-agentgen3\03_RL_Agents\02_PyTorch_RL_Implementations\rlpyt\algos\qpg\sac.py", line 128, in initialize_replay_buffer
    timeout=examples["env_info"].timeout)
AttributeError: 'info' object has no attribute 'timeout'
```

I dug into the source code, which shows that SAC requires a "timeout" attribute in the env info, but I have no clue how to add it. It would be great if you could either provide a code snippet or give me some hints. Thanks!

astooke commented 4 years ago

Hi! Thanks for the question.

In the provided gym env wrapper, there are some lines starting here: https://github.com/astooke/rlpyt/blob/ba824f18f8598521ea4eb9f06e4f9fb129eb599a/rlpyt/envs/gym.py#L21 which determine whether to include the timeout key in the env_info. If you are using bootstrap_timelimit=True in the algorithm, it will expect this field to be present: when done=True, it needs to know whether that was due to reaching the time limit in the env. This is helpful in e.g. HalfCheetah (take a look at the original SAC paper, I think it's in there).
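
If you do want the timeout handling, one easy way to get that field is to wrap your env in gym's TimeLimit before passing it to GymEnvWrapper, since the check at those lines looks for a TimeLimit wrapper in the chain. A minimal sketch (MyCustomEnv and max_episode_steps=1000 are just placeholders for your setup):

```python
from gym.wrappers import TimeLimit
from rlpyt.envs.gym import GymEnvWrapper

def make_env():
    env = MyCustomEnv()  # placeholder: your custom gym.Env
    # GymEnvWrapper should detect the TimeLimit wrapper and then add a
    # "timeout" entry to env_info at every step.
    env = TimeLimit(env, max_episode_steps=1000)
    return GymEnvWrapper(env)
```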

If you don't care about this part of the algorithm, you can just set bootstrap_timelimit=False, and it won't look for timeout in the env_info.
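
For example, something like this (a sketch; the keyword goes straight to the SAC constructor along with your other algo args):

```python
from rlpyt.algos.qpg.sac import SAC

# With bootstrap_timelimit=False, the replay buffer setup no longer
# requires a "timeout" field in env_info.
algo = SAC(bootstrap_timelimit=False)
```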

Hope that helps!

wiekern commented 4 years ago

That helped me out, many thanks!