DeepX-inc / machina


Dataparallel Option in PPO Causes Error #202

Closed takerfume closed 5 years ago

takerfume commented 5 years ago

I ran python run_ppo.py --data_parallel --cuda 0 and got the following error.

Traceback (most recent call last):
  File "run_ppo.py", line 160, in <module>
    optim_pol=optim_pol, optim_vf=optim_vf, epoch=args.epoch_per_iter, batch_size=args.batch_size, max_grad_norm=args.max_grad_norm)
  File "/home/taketoyoshida/machina/machina/algos/ppo_clip.py", line 128, in train
    clip_param, ent_beta, max_grad_norm)
  File "/home/taketoyoshida/machina/machina/algos/ppo_clip.py", line 38, in update_pol
    pol_loss = lf.pg_clip(pol, batch, clip_param, ent_beta)
  File "/home/taketoyoshida/machina/machina/loss_functional.py", line 49, in pg_clip
    _, _, pd_params = pol(obs, h_masks=h_masks)
  File "/home/taketoyoshida/anaconda3/envs/machina/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/taketoyoshida/machina/machina/pols/gaussian_pol.py", line 66, in forward
    log_std = log_std.expand_as(mean)
RuntimeError: The expanded size of the tensor (1) must match the existing size (3) at non-singleton dimension 1.  Target sizes: [256, 1].  Tensor sizes: [3]
rarilurelo commented 5 years ago

This is because the log_std output by the network has size one (a single shared value rather than one per sample), but data_parallel gathers the outputs from each replica and concatenates them, so the combined log_std no longer matches the shape of mean. The network should output log_std with the same size as mean so the gathered result still lines up.
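To make the failure mode concrete, here is a minimal sketch, not machina's actual code: ToyGaussianHead and every name below are illustrative stand-ins for a Gaussian policy head that keeps log_std as a single shared parameter and gets wrapped in torch.nn.DataParallel. With more than one GPU, the gathered log_std picks up one entry per replica and can no longer be expanded to match mean.

```python
import torch
import torch.nn as nn


class ToyGaussianHead(nn.Module):
    """Illustrative Gaussian policy head: per-sample mean, single shared log_std
    parameter with no batch dimension (the pattern that breaks under DataParallel)."""

    def __init__(self, obs_dim, action_dim):
        super().__init__()
        self.mean_layer = nn.Linear(obs_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))  # shape [action_dim]

    def forward(self, obs):
        mean = self.mean_layer(obs)   # shape [batch, action_dim]
        return mean, self.log_std     # log_std has no batch dimension


device = "cuda" if torch.cuda.is_available() else "cpu"
net = ToyGaussianHead(obs_dim=4, action_dim=1).to(device)
parallel_net = nn.DataParallel(net)   # splits the input batch across visible GPUs

obs = torch.randn(256, 4, device=device)
mean, log_std = parallel_net(obs)
# With N > 1 GPUs, each replica returns a log_std of shape [1] and DataParallel's
# gather step concatenates replica outputs along dim 0, so log_std comes back as
# shape [N] while mean is still [256, 1]. The expand below then fails exactly as
# in the traceback above (there N was 3); with 0 or 1 GPUs it broadcasts fine.
log_std = log_std.expand_as(mean)
```

One way to get the behavior rarilurelo describes (again a sketch, not machina's actual fix) is to expand log_std to the batch size inside forward, e.g. return `self.log_std.expand_as(mean)`, so each replica's log_std carries a batch dimension and the gather across replicas reassembles it consistently with mean.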

rarilurelo commented 5 years ago

#203