This fix covers the following:

**Agents folder.**
- Keep `agent.last_state.shape == (num_envs, state_dim)` so the shape of this tensor stays consistent.
- Add `agent.get_obj_critic_per()` to all algorithms to adapt them to the PER algorithm's updates (a sketch follows this list).
- Keep `logprob.shape == (batch_size, )` after summing over the action_dim dimension, e.g. `logprob = logprob.sum(dim=1)`.
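For context, here is a minimal sketch of what a PER-aware critic objective can look like. The buffer methods `sample_for_per` and `td_error_update` and the attribute names are assumptions for illustration, not ElegantRL's exact API; the key points are the per-sample (unreduced) TD loss, the importance-sampling weights, and writing the new priorities back to the sum tree.

```python
import torch

def get_obj_critic_per(self, buffer, batch_size: int):
    """Sketch of a PER critic objective (DDPG-style, illustrative names)."""
    with torch.no_grad():
        # states.shape == (batch_size, state_dim); is_weights are the
        # importance-sampling weights, is_indices the sum-tree leaf indices.
        states, actions, rewards, undones, next_states, is_weights, is_indices = \
            buffer.sample_for_per(batch_size)
        next_actions = self.act_target(next_states)
        next_qs = self.cri_target(next_states, next_actions)
        q_labels = rewards + undones * self.gamma * next_qs
    qs = self.cri(states, actions)
    # self.criterion is e.g. torch.nn.SmoothL1Loss(reduction='none'),
    # so td_errors keeps one value per sample instead of a scalar mean.
    td_errors = self.criterion(qs, q_labels)
    obj_critic = (td_errors * is_weights).mean()             # IS-weighted loss
    buffer.td_error_update(is_indices, td_errors.detach())   # refresh priorities
    return obj_critic, states
```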
**train folder.**
- Keep `agent.last_state.shape == (num_envs, state_dim)` during exploration (see the sketch below).
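As an illustration of the shape invariant above, a vectorized exploration step might look like the following; a batched `env.step` and the attribute names here are assumptions, not the repository's exact code:

```python
import torch

def explore_vec_env(agent, env, horizon_len: int):
    """Sketch: agent.last_state always stays (num_envs, state_dim),
    even when num_envs == 1, so downstream code needs no special cases."""
    states = torch.empty((horizon_len, env.num_envs, env.state_dim))
    state = agent.last_state                           # (num_envs, state_dim)
    for t in range(horizon_len):
        action = agent.act(state)                      # (num_envs, action_dim)
        state, reward, done, info = env.step(action)   # (num_envs, state_dim)
        states[t] = state
    agent.last_state = state  # keep the batched shape for the next call
    return states
```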
**env folder.**
**example folder.**
- Set `if_use_per=True` and rename `demo_DDPG_TD3_SAC.py` to `demo_PER_Prioritized Experience Replay.py` (a hypothetical usage sketch follows).
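A hypothetical way to run the PER demo; the import paths, `Config` arguments, and `train_agent` entry point are assumptions based on ElegantRL's general layout and may not match the repository exactly:

```python
import gym
from elegantrl.train.config import Config        # assumed import path
from elegantrl.train.run import train_agent      # assumed import path
from elegantrl.agents.AgentSAC import AgentSAC   # assumed import path

env_args = {'env_name': 'Pendulum-v1', 'state_dim': 3,
            'action_dim': 1, 'if_discrete': False}  # illustrative env metadata
args = Config(agent_class=AgentSAC, env_class=gym.make, env_args=env_args)
args.if_use_per = True  # switch the replay buffer to Prioritized Experience Replay
train_agent(args)
```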
**unit_test folder.**
- Keep `agent.last_state.shape == (num_envs, state_dim)`.
**Addition:**
- Use `states, actions, rewards` instead of `state, action, reward` as tensor names.
- Rename `get_returns` to `get_cumulative_rewards` (a sketch follows this list).
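A minimal sketch of what a `get_cumulative_rewards` helper might compute for vectorized envs; the signature and the bootstrap-from-last-value detail are assumptions:

```python
import torch

def get_cumulative_rewards(rewards, undones, last_value, gamma: float):
    """rewards, undones: (horizon_len, num_envs); last_value: (num_envs,),
    the value estimate of agent.last_state used to bootstrap the tail."""
    horizon_len = rewards.shape[0]
    returns = torch.empty_like(rewards)
    value = last_value
    for t in range(horizon_len - 1, -1, -1):  # backward accumulation
        value = rewards[t] + gamma * undones[t] * value
        returns[t] = value
    return returns  # (horizon_len, num_envs)
```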
After updating the vectorized env and the corresponding multiprocessing training module, support for the PER algorithm was broken.
Corresponding Pull Request: https://github.com/AI4Finance-Foundation/ElegantRL/pull/269
The related issue is as follows: PER produces NaN because multiprocessing was not adapted to PER.
With multiprocessing, there are `num_envs * num_workers` parallel subenvironments available for a learner to learn from. Previously, a single PER sumTree (a binary tree) served all of these subenvironments as they produced trajectories at the same time, which led to bugs. After the modification, each subenvironment's trajectory output corresponds to its own sumTree. This fixes the bug and also reduces the size of each sumTree (see the sketch below).
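To make the per-subenvironment arrangement concrete, here is a minimal sketch; `SumTree` and `PerBufferVec` are illustrative names, not ElegantRL's actual classes:

```python
import torch

class SumTree:
    """A sum tree stored as a flat array: leaves hold priorities,
    each internal node holds the sum of its two children."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = torch.zeros(2 * capacity - 1)

    def update(self, leaf_id: int, priority: float):
        node = leaf_id + self.capacity - 1         # array index of the leaf
        delta = priority - float(self.tree[node])
        self.tree[node] = priority
        while node > 0:                            # propagate the change to the root
            node = (node - 1) // 2
            self.tree[node] += delta

class PerBufferVec:
    """One sum tree per subenvironment: tree i only receives the trajectory of
    subenvironment i, so concurrent producers never interleave inside one tree,
    and each tree is num_envs * num_workers times smaller than a shared one."""
    def __init__(self, total_capacity: int, num_envs: int, num_workers: int):
        num_trees = num_envs * num_workers
        self.trees = [SumTree(total_capacity // num_trees) for _ in range(num_trees)]

    def update_priority(self, env_id: int, leaf_id: int, td_error: float):
        self.trees[env_id].update(leaf_id, abs(td_error) + 1e-6)  # keep priorities > 0
```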