DeepX-inc / machina

Control section: Deep Reinforcement Learning framework
MIT License
279 stars 45 forks

Inappropriate mean in loss_functional with rnn #118

Closed rarilurelo closed 5 years ago

rarilurelo commented 5 years ago

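When an RNN is used, the loss is averaged over (timestep, batchsize). However, episodes must be padded to the same length to use an RNN, so the steps after termination are masked by out_masks. torch.mean still divides by every entry, padded steps included, so the loss is scaled down by the fraction of padding in the batch.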

pol_loss = torch.mean(pol_loss * out_masks)

Instead, the masked sum should be divided by the number of valid steps, like below.

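# average number of valid (unmasked) timesteps per episode
timestep = torch.mean(torch.sum(out_masks, dim=0))
# timestep * batchsize is the total number of valid steps
pol_loss = torch.sum(pol_loss * out_masks) / (timestep * batchsize)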
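
To make the difference concrete, here is a minimal sketch with toy numbers. The helper name `masked_mean` is hypothetical and not part of machina, and the tensors are assumed to have shape (timestep, batchsize), with out_masks equal to 1 for real steps and 0 for padding.

```python
import torch


def masked_mean(loss, out_masks):
    """Average `loss` over valid (unmasked) entries only.

    Hypothetical helper for illustration; assumes both tensors share
    the same shape, e.g. (timestep, batchsize).
    """
    n_valid = torch.sum(out_masks)
    # clamp guards against division by zero if every step is masked
    return torch.sum(loss * out_masks) / torch.clamp(n_valid, min=1.0)


# Two episodes padded to 3 timesteps: the first has length 3, the second length 1.
pol_loss = torch.tensor([[1.0, 4.0],
                         [2.0, 9.0],
                         [3.0, 9.0]])
out_masks = torch.tensor([[1.0, 1.0],
                          [1.0, 0.0],
                          [1.0, 0.0]])

naive = torch.mean(pol_loss * out_masks)  # masked sum divided by all 6 entries
fixed = masked_mean(pol_loss, out_masks)  # masked sum divided by the 4 valid entries
```

In this toy batch the masked sum is 10, so the naive mean divides by 6 entries while the masked mean divides by the 4 valid ones. The gap grows as episode lengths in a batch become more uneven.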
rarilurelo commented 5 years ago

198