https://github.com/ShangtongZhang/DeepRL/blob/e427e8f73f7d6c6ae0283a7a4438b724725ec192/agent/QuantileRegressionDQN_agent.py#L67-L68
Just a couple of quick questions about these lines:
The `* self.quantile_weight` factor does not seem to matter here: multiplying every action value by the same positive constant does not change their relative order, so the greedy action is the same with or without it.
If `quantiles_next` has been softmaxed along `dim=-1` (is that right?), then the sum over that dimension is the same for every action, so all actions on this line would end up with identical values. Is that correct?
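To make the two points concrete, here is a minimal sketch in plain Python. It is not the repo's code: the numbers, the 3-actions-by-4-quantiles shape, and the helper names (`softmax`, `action_values`, `argmax`) are my own assumptions, and I am assuming the line in question reduces to something like `(quantiles_next * self.quantile_weight).sum(-1)`.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a flat list.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def action_values(quantiles, weight=1.0):
    # Assumed equivalent of (quantiles * weight).sum(-1) for a single state:
    # one scalar value per action (per row).
    return [sum(q * weight for q in row) for row in quantiles]

def argmax(values):
    # Index of the largest value (the greedy action).
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical raw network outputs: 3 actions x 4 quantiles.
raw = [
    [0.1, 0.4, 0.9, 1.5],
    [1.0, 1.2, 1.3, 2.0],
    [-0.5, 0.0, 0.2, 0.3],
]
quantile_weight = 1.0 / len(raw[0])

# Question 1: a positive constant factor leaves the greedy action unchanged.
greedy_unweighted = argmax(action_values(raw))
greedy_weighted = argmax(action_values(raw, quantile_weight))

# Question 2: if each row were softmaxed along the last dimension, every row
# would sum to 1, so every action would get the same value (quantile_weight).
softmaxed = [softmax(row) for row in raw]
q_softmaxed = action_values(softmaxed, quantile_weight)
```

In this sketch the greedy action is identical with and without the weight, and the softmaxed case collapses every action to the same value, which is why I am unsure whether a softmax is really applied to `quantiles_next`.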
Thanks for sharing your code.