In line 161 of neural_agent.py (in refine_actions), the cross-entropy loss is computed between the logits and a tensor of ones of length len(actions). Shouldn't the length be len(action_batch) instead, so the target matches the batch dimension of the logits? As written, the code runs without error only when the dqn-eval-batch-size and dqn-rank-size flags are set to the same value.
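To illustrate the shape constraint behind this report, here is a minimal NumPy sketch (all names are hypothetical, not the project's actual code): the target vector passed to a cross-entropy loss must have the same length as the batch dimension of the logits, so building it from a different collection's length only works when the two sizes happen to coincide.

```python
import numpy as np

def cross_entropy(logits, targets):
    # logits: (batch, num_classes); targets: (batch,) integer class indices.
    # The target length must match the logits' batch dimension -- this is
    # exactly the constraint that breaks when eval-batch-size != rank-size.
    assert logits.shape[0] == targets.shape[0], \
        "target length must equal logits batch dimension"
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean negative log-likelihood of the target classes.
    return -log_probs[np.arange(len(targets)), targets].mean()

# Logits for a batch of 4 actions (hypothetical stand-in for action_batch).
action_batch_logits = np.random.randn(4, 10)

# Correct: targets sized from the same batch the logits came from.
targets = np.ones(len(action_batch_logits), dtype=int)
loss = cross_entropy(action_batch_logits, targets)

# Buggy pattern: sizing the targets from some other collection (here,
# length 6) fails unless the two lengths happen to be equal.
mismatched_targets = np.ones(6, dtype=int)
try:
    cross_entropy(action_batch_logits, mismatched_targets)
except AssertionError as e:
    print("shape mismatch:", e)
```

The same rule holds for PyTorch's cross-entropy: the target tensor's first dimension must equal the logits' batch dimension, so deriving its length from len(action_batch) rather than len(actions) ties the target to the tensor it is actually scored against.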