I mean you only have
locations = torch.zeros(opt.num_steps, batch_size, 2)
but you never assign
locations[step, :] = location
so this part will have a bug:
loc_loss += (torch.abs(offset_norm[j][0].cpu() - locations[i][j][0]) + torch.abs(offset_norm[j][1].cpu() - locations[i][j][1]))
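A minimal sketch of the fix I have in mind (assuming the rollout loop uses a variable named `step` and the network outputs a sampled `location` of shape `(batch_size, 2)` each step; the surrounding loop structure is my guess from the snippet above):

```python
locations = torch.zeros(opt.num_steps, batch_size, 2)

for step in range(opt.num_steps):
    # ... forward pass producing `location` with shape (batch_size, 2) ...
    locations[step, :] = location  # the missing assignment

# only after the buffer is filled does loc_loss compare against real values:
for i in range(opt.num_steps):
    for j in range(batch_size):
        loc_loss += (torch.abs(offset_norm[j][0].cpu() - locations[i][j][0])
                     + torch.abs(offset_norm[j][1].cpu() - locations[i][j][1]))
```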
For the policy loss part, I agree with you. I think the accumulated reward is too small, so R is always lower than the state value.
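For reference, this is how I read the A2C return and advantage computation (a runnable toy sketch; all names and numbers are mine, not from the repo):

```python
import torch

gamma, num_steps, batch_size = 0.99, 3, 2
rewards = [torch.full((batch_size,), 0.01) for _ in range(num_steps)]  # tiny rewards
values  = [torch.full((batch_size,), 0.30) for _ in range(num_steps)]  # critic estimates V(s_t)

R = torch.zeros(batch_size)
for step in reversed(range(num_steps)):
    R = rewards[step] + gamma * R    # accumulated (discounted) return R_t
    advantage = R - values[step]     # A_t = R_t - V(s_t); negative when rewards are tiny
```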
oh~ you are right, the locations assignment is missing~ I will fix this bug~
Also, the predicted IoU should be fixed in the same way:
Predict_IoUs[step, :] = tIoU
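Roughly what I have in mind (a sketch mirroring the locations fix; `target_iou` and the extra loss term are my assumption, not existing code):

```python
Predict_IoUs = torch.zeros(opt.num_steps, batch_size)

for step in range(opt.num_steps):
    # ... forward pass producing the predicted tIoU for this step ...
    Predict_IoUs[step, :] = tIoU  # store the prediction, same pattern as locations

# a possible regression term against the ground-truth IoU (my assumption):
iou_loss = torch.abs(Predict_IoUs - target_iou).sum()
```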
Your reply is so quick. I was just about to open a pull request to fix those two bugs, but you have already fixed one of them.
Thanks for your suggestions~ you can open a pull request if you find other bugs~ Thanks~
In the code, you didn't actually add the location/IoU loss; you just create the variables without assigning any value. Actually, when I revise the loss function into correct code, I find the performance decreases a lot.
The policy loss function is always negative, and what we want is to minimize this loss, not have it keep increasing. I was wondering whether the reason is that the reward value is always lower than the state value (according to the A2C formula), or whether this is a bug in the code.
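To make the sign question concrete, here is a minimal runnable sketch of the A2C policy loss as I understand it (all names and numbers are my own toy example):

```python
import torch

log_probs = torch.log(torch.tensor([0.4, 0.3, 0.5]))  # log-probabilities, always <= 0
returns   = torch.tensor([0.05, 0.04, 0.03])          # small accumulated rewards R_t
values    = torch.tensor([0.20, 0.18, 0.15])          # critic estimates V(s_t)

advantage = returns - values                  # negative whenever R_t < V(s_t)
policy_loss = (-log_probs * advantage).sum()
print(policy_loss.item())  # negative: (-log_prob >= 0) * (advantage < 0), summed
```

So if the rewards really are too small relative to the critic's estimates, a negative policy loss is what the formula produces and is not necessarily a bug by itself.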