Closed: thirteenflt closed this issue 1 month ago
https://github.com/OpenLLMAI/OpenRLHF/blob/main/openrlhf/models/actor.py#L176 returns log_probs[:, -num_actions:]. But the sequence should contain both left padding (added before generation) and right padding (added after generation). num_actions is the number of tokens in the output, which is fine.
The problem is that the output tokens are not necessarily the last num_actions entries of the sequence's log_probs, which is what log_probs[:, -num_actions:] selects.
Am I getting it wrong?
Oh, num_actions is the size of action_mask, not the per-sample count of output tokens. I did get it wrong.
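For anyone who lands here with the same confusion, here is a toy sketch of why the trailing slice lines up. It is plain Python with made-up numbers, not the OpenRLHF code. It assumes that every prompt is left-padded to a common width, that the generated part is right-padded to a common width, and that the log-probs are aligned with the sequence shifted by one token. `PAD`, `prompt_width`, and `slice_actions` are hypothetical names, and building the mask from `tok != PAD` is a simplification.

```python
# Toy illustration only: nested lists stand in for tensors, all values are made up.
PAD = 0  # hypothetical pad token id

# Each row: prompt left-padded to width 3, then generated tokens right-padded to width 4.
sequences = [
    [PAD, 11, 12, 21, 22, 23, 24],   # 2 prompt tokens, 4 generated tokens
    [13, 14, 15, 31, 32, PAD, PAD],  # 3 prompt tokens, 2 generated tokens
]
prompt_width = 3

# Assumed alignment: log_probs[i][j] scores sequences[i][j + 1],
# so each row has one entry fewer than the sequence.
log_probs = [
    [-0.1, -0.2, -0.3, -0.4, -0.5, -0.6],
    [-1.1, -1.2, -1.3, -1.4, -1.5, -1.6],
]


def slice_actions(sequences, log_probs, prompt_width):
    # One mask column per position after the prompt, including right padding.
    action_mask = [
        [1 if tok != PAD else 0 for tok in row[prompt_width:]] for row in sequences
    ]
    # num_actions is the mask width, shared by every row in the batch.
    num_actions = len(action_mask[0])
    # Same idea as log_probs[:, -num_actions:] on a tensor.
    action_log_probs = [row[-num_actions:] for row in log_probs]
    # The mask, not the slice, is what drops the right-padding positions.
    valid = [
        [lp for lp, m in zip(lps, mask) if m]
        for lps, mask in zip(action_log_probs, action_mask)
    ]
    return num_actions, action_mask, action_log_probs, valid


num_actions, action_mask, action_log_probs, valid = slice_actions(
    sequences, log_probs, prompt_width
)
print(num_actions)       # 4
print(action_mask)       # [[1, 1, 1, 1], [1, 1, 0, 0]]
print(action_log_probs)  # [[-0.3, -0.4, -0.5, -0.6], [-1.3, -1.4, -1.5, -1.6]]
print(valid)             # [[-0.3, -0.4, -0.5, -0.6], [-1.3, -1.4]]
```

The second row only generated two tokens, but the slice still takes four columns. The two extra columns are right padding, and the mask zeroes them out afterwards.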