The discrete-action example in example/demo_A2C_PPO.py raises an exception

churchillyik commented 1 year ago

Running python demo_A2C_PPO.py --gpu=0 --drl=0 --env=6 raises an exception:

File "elegantrl/train/evaluator.py", line 176, in get_cumulative_rewards_and_steps
    tensor_action = tensor_action.argmax(dim=1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I set a breakpoint on this line and printed the variables:

(Pdb) l
171         returns = 0.0  # sum of rewards in an episode
172         for steps in range(max_step):
173             tensor_state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
174             tensor_action = actor(tensor_state)
175             if if_discrete:
176 B->             tensor_action = tensor_action.argmax(dim=1)
177             action = tensor_action.detach().cpu().numpy()[0]  # not need detach(), because using torch.no_grad() outside
178             state, reward, done, _ = env.step(action)
179             returns += reward
180     
181             if if_render:
(Pdb) pp tensor_state
tensor([[ 0.0357, -0.0466,  0.0230, -0.0324]], device='cuda:0')
(Pdb) pp tensor_action
tensor([0], device='cuda:0')
(Pdb) pp actor
ActorDiscretePPO(
  (net): Sequential(
    (0): Linear(in_features=4, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=2, bias=True)
  )
  (soft_max): Softmax(dim=-1)
)

tensor_action has only one dimension (shape (1,)), so the argument dim=1 is out of range for it.
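To make the mismatch concrete, here is a minimal sketch that reproduces the same IndexError on a 1-D tensor and shows one possible shape-aware guard. The helper name to_discrete_action is hypothetical and is not part of ElegantRL. It also assumes that a 1-D output from the actor already holds chosen action indices, which is what the debugger output suggests but is not confirmed by the maintainers.

```python
import torch


def to_discrete_action(actor_output: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper (not ElegantRL API): return action indices of shape (batch,).

    Assumption: a 2-D tensor (batch, action_dim) holds per-action scores,
    while a 1-D tensor (batch,) already holds the selected action indices.
    """
    if actor_output.dim() == 2:
        # Scores or probabilities: pick the best action per batch row.
        return actor_output.argmax(dim=1)
    # Already an index per batch row: only normalize the dtype.
    return actor_output.long()


# Case from the debugger: the actor returned tensor([0]), shape (1,).
already_index = torch.tensor([0])
try:
    already_index.argmax(dim=1)  # dim=1 does not exist on a 1-D tensor
except IndexError as err:
    print("reproduced:", err)

action_a = to_discrete_action(already_index)

# Case the evaluator code expected: scores of shape (1, action_dim).
probs = torch.tensor([[0.3, 0.7]])
action_b = to_discrete_action(probs)
```

With this guard, both action_a and action_b have shape (1,), so the later `.numpy()[0]` indexing in the evaluator would yield a scalar action in either case. Whether the evaluator or the actor should be changed is a design decision for the maintainers.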

Yonv1943 commented 1 year ago

This was my mistake when I merged A2C and PPO. These two issues should be the same problem: https://github.com/AI4Finance-Foundation/ElegantRL/issues/306

I will update the code to fix both of them together. Thank you.

churchillyik commented 1 year ago

Also, in elegantrl/train/run.py, inside the Learner process, should this line:

    actions = torch.empty((horizon_len, num_seqs, action_dim), dtype=torch.float32, device=agent.device)

be changed to:

    actions = torch.empty((horizon_len, num_seqs, 1 if if_discrete else action_dim), dtype=torch.float32, device=agent.device)

with this line added before it:

    if_discrete = args.if_discrete
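The suggestion above can be sketched in isolation as follows. The function alloc_action_buffer and the example sizes are hypothetical and used only for illustration. The sketch assumes a discrete action is stored as a single index per step, which is the premise of the proposed change rather than confirmed ElegantRL behavior.

```python
import torch


def alloc_action_buffer(horizon_len: int, num_seqs: int, action_dim: int,
                        if_discrete: bool, device: str = "cpu") -> torch.Tensor:
    """Hypothetical helper: allocate the action buffer for a rollout.

    Assumption: a discrete action is stored as one index (last dim = 1),
    while a continuous action needs action_dim components.
    """
    last_dim = 1 if if_discrete else action_dim
    return torch.empty((horizon_len, num_seqs, last_dim),
                       dtype=torch.float32, device=device)


# Example sizes are made up for illustration.
discrete_buf = alloc_action_buffer(horizon_len=8, num_seqs=4, action_dim=2, if_discrete=True)
continuous_buf = alloc_action_buffer(horizon_len=8, num_seqs=4, action_dim=2, if_discrete=False)
print(discrete_buf.shape, continuous_buf.shape)
```

For an environment with two discrete actions, the discrete buffer's last dimension is 1 instead of 2, so each stored action is one index rather than a vector of length action_dim.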

DeeplearnerAlex commented 2 months ago

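> This was my mistake when I merged A2C and PPO. These two issues should be the same problem: #306
>
> I will update the code to fix both of them together. Thank you.

Bro, has this still not been updated?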

AI4Finance-Foundation / ElegantRL

The discrete-action example in example/demo_A2C_PPO.py raises an exception #309