Open IgorAherne opened 1 year ago
Hey, I haven't had a look at the code for a while. Can you reference the part in the paper that made you think it's the max over the taus?
Hi Sebastian,
From page 5 of this paper: https://arxiv.org/pdf/1806.06923.pdf
These equations are a bit tough for me, but looking at equations 2 and 3 from here:
Hello,
Given that forward() returns a tuple:
return out.view(batch_size, num_tau, self.num_actions), taus
Should we use .max(1) instead of .max(2)? Currently it is:
Q_targets_next = Q_targets_next.detach().max(2)[0].unsqueeze(1) # (batch_size, 1, N)
Maybe it should be:
Q_targets_next = Q_targets_next.detach().max(1)[0].unsqueeze(1) # (batch_size, 1, num_actions)
In other words, should we take the maximum over the tau dimension for each action, rather than over the actions for each tau? Sorry if I misunderstood the process.
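To make the two options concrete, here is a minimal shape check. It assumes only what is quoted above: the first element returned by forward() has shape (batch_size, num_tau, num_actions). The sizes and the random tensor are made up for illustration and are not taken from the repository. The last part is a hedged sketch of a third option, as I read the IQN paper: pick the greedy action from the mean over taus, then keep that action's quantile values. It is not necessarily what this code base intends.

```python
import torch

# Illustrative sizes only (assumptions, not values from the repository).
batch_size, num_tau, num_actions = 4, 8, 3

# Stand-in for the first element of the tuple returned by forward():
# shape (batch_size, num_tau, num_actions).
quantiles = torch.randn(batch_size, num_tau, num_actions)

# Current code: .max(2) reduces over the action dimension,
# giving one value per tau sample.
max_over_actions = quantiles.detach().max(2)[0].unsqueeze(1)
print(max_over_actions.shape)  # torch.Size([4, 1, 8])

# Proposed change: .max(1) reduces over the tau dimension,
# giving one value per action.
max_over_taus = quantiles.detach().max(1)[0].unsqueeze(1)
print(max_over_taus.shape)  # torch.Size([4, 1, 3])

# Hedged third option: greedy action from the mean over taus,
# then gather that action's quantile values for every tau.
q_values = quantiles.detach().mean(dim=1)           # (batch_size, num_actions)
greedy_actions = q_values.argmax(dim=1)             # (batch_size,)
index = greedy_actions.view(batch_size, 1, 1).expand(batch_size, num_tau, 1)
greedy_quantiles = quantiles.detach().gather(2, index).transpose(1, 2)
print(greedy_quantiles.shape)  # torch.Size([4, 1, 8])
```

Note that .max(2) and the gather-based variant both keep the tau dimension, so they line up with the (batch_size, 1, N) comment in the current code, while .max(1) removes it.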