Closed by rarilurelo 5 years ago
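When an RNN is used, the loss is averaged over both the timestep and batch dimensions. Episodes must be padded to the same length to be fed to an RNN, so the steps after an episode terminates are masked out by out_masks. However, torch.mean still divides by the total number of entries, including the masked ones, so the loss is underestimated whenever episodes are shorter than the padded length. The current computation is: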
pol_loss = torch.mean(pol_loss * out_masks)
Instead, the masked sum should be divided by the number of valid steps, like below.
timestep = torch.sum(out_masks, dim=0)
pol_loss = torch.sum(pol_loss * out_masks) / (timestep * batchsize)
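To make the difference concrete, here is a minimal sketch, not the repository's actual code. It assumes pol_loss and out_masks both have shape (timestep, batchsize) and that out_masks is 1.0 for real steps and 0.0 for padding. Under that assumption, torch.sum(out_masks, dim=0) is a per-episode vector of valid step counts, so the sketch sums the loss per episode before dividing in order to end up with a scalar. The function names masked_mean_loss and per_episode_mean_loss are hypothetical and only used for illustration.

```python
import torch


def masked_mean_loss(pol_loss, out_masks):
    """Average the loss over all valid (unmasked) steps in the batch."""
    # clamp avoids division by zero if every step is masked
    n_valid = torch.clamp(torch.sum(out_masks), min=1.0)
    return torch.sum(pol_loss * out_masks) / n_valid


def per_episode_mean_loss(pol_loss, out_masks):
    """Average over valid steps within each episode, then over the batch."""
    timestep = torch.clamp(torch.sum(out_masks, dim=0), min=1.0)  # shape (batchsize,)
    per_episode = torch.sum(pol_loss * out_masks, dim=0) / timestep
    return torch.mean(per_episode)


# Two episodes padded to 4 steps: the first has 4 valid steps, the second has 2.
pol_loss = torch.tensor([[1.0, 3.0]] * 4)
out_masks = torch.tensor([[1.0, 1.0],
                          [1.0, 1.0],
                          [1.0, 0.0],
                          [1.0, 0.0]])

naive = torch.mean(pol_loss * out_masks)            # divides by all 8 entries
masked = masked_mean_loss(pol_loss, out_masks)      # divides by the 6 valid steps
per_episode = per_episode_mean_loss(pol_loss, out_masks)
```

The two corrected variants weight episodes differently: masked_mean_loss gives every valid step equal weight, so longer episodes contribute more, while per_episode_mean_loss gives every episode equal weight. Which one is appropriate depends on the objective being optimized.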